c# - How to get text that has no tag with HtmlAgilityPack


I have the HTML file below:

  <div>
    <div style="margin-left:0.5em;">
      <div class="tiny" style="margin-bottom:0.5em;">
        <b><span class="h3color tiny">this review from: </span>you meet</b>
      </div>
      if know ron kaufman ...
      <br /><br />whether you're ceo....
      <br /><br />written in distinctive, ...
      <br /><br />my advice? don't 1 copy
      <div style="padding-top: 10px; clear: both; width: 100%;"></div>
    </div>
    <div style="margin-left:0.5em;">
      <div class="tiny" style="margin-bottom:0.5em;">
        <b><span class="h3color tiny">this review from: </span>my review</b>
      </div>
      became fan of ron kaufman after reading earlier book of years ago...
      <div style="padding-top: 10px; clear: both; width: 100%;"></div>
    </div>
  </div>

I want the review text that is not wrapped in any HTML tag. I am using the code below now:

  foreach (HtmlNode divReview in doc.DocumentNode.SelectNodes(@"//div[@style='margin-left:0.5em;']"))
  {
      if (divReview != null)
      {
          review.Add(divReview.Descendants("div")
              .Where(d => d.Attributes.Contains("style") &&
                          d.Attributes["style"].Value.Contains("padding-top: 10px; clear: both; width: 100%;"))
              .Select(d => d.PreviousSibling.InnerText.Trim())
              .SingleOrDefault());
      }
  }

which returns only "my advice? don't 1 copy". How can I get the whole text?

Update: even if I remove the "br" tags from the HtmlNode, the code above still returns only the "my advice? don't 1 copy" part. Any comments?

I've updated the code to this:

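  var allText = (reviewDiv.Descendants("div")
      // GetAttributeValue returns "" when a div has no style attribute, which avoids a null reference
      .First(div => div.GetAttributeValue("style", "") == "padding-top: 10px; clear: both; width: 100%;")
      // SelectNodes returns null when nothing matches, so fall back to an empty collection
      .SelectNodes("./preceding-sibling::text()") ?? new HtmlNodeCollection(null))
      .Select(text => text.InnerText);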

This should return an IEnumerable of strings holding the text nodes that precede the div with the intricate style.

Without a little more of the surrounding HTML it's hard to tell whether this is what you're after. I'm guessing that you have already selected a div, and that this div is the direct parent of the whole block of text (hence the reference to reviewDiv). Your HTML sample doesn't seem to contain that piece of HTML, so I'm making a few assumptions here.

With the following input:

  <div>
    <div class="tiny" style="margin-bottom:0.5em;">
      <b><span class="h3color tiny">this review from: </span>you meet</b>
    </div>
    if know ron kaufman ...
    <br /><br />whether you're ceo....
    <br /><br />written in distinctive, ...
    <br /><br />my advice? don't 1 copy
    <div style="padding-top: 10px; clear: both; width: 100%;"></div>
  </div>

it extracts this:

if know ron kaufman ...
whether you're ceo....
written in distinctive, ...
my advice? don't 1 copy

To build a single string I used: string extractedText = string.Join("", allText);
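The code in the question returns only the last paragraph because PreviousSibling yields just the one node immediately before the styled div, which is the text node after the final pair of br tags. As an alternative to the preceding-sibling XPath, here is a minimal sketch that walks the direct children of each review div and keeps only the text nodes. It assumes the review text always sits directly inside the div with style 'margin-left:0.5em;', as in the question's sample. ReviewExtractor and ExtractReviews are hypothetical names used for illustration, not part of HtmlAgilityPack.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper, named for illustration only.
public static class ReviewExtractor
{
    public static List<string> ExtractReviews(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var reviews = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var reviewDivs = doc.DocumentNode.SelectNodes("//div[@style='margin-left:0.5em;']");
        if (reviewDivs == null)
        {
            return reviews;
        }

        foreach (HtmlNode reviewDiv in reviewDivs)
        {
            // Keep only text nodes that are direct children of the review div.
            // The header div, the br elements and the trailing styled div are skipped.
            var paragraphs = reviewDiv.ChildNodes
                .Where(node => node.NodeType == HtmlNodeType.Text)
                .Select(node => HtmlEntity.DeEntitize(node.InnerText).Trim())
                .Where(text => text.Length > 0);

            // One string per review, one line per paragraph.
            reviews.Add(string.Join(Environment.NewLine, paragraphs));
        }

        return reviews;
    }
}
```

Calling ReviewExtractor.ExtractReviews(html) on the question's markup should give one entry per review div. Joining with Environment.NewLine rather than "" keeps the paragraph breaks that the br tags represented; whether that is wanted depends on how the text is used afterwards.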

