C# - How to get text that has no tag with HtmlAgilityPack
I have the HTML file below:
    <div>
      <div style="margin-left:0.5em;">
        <div class="tiny" style="margin-bottom:0.5em;">
          <b><span class="h3color tiny">this review from: </span>you meet</b>
        </div>
        if know ron kaufman ... <br /><br />whether you're ceo.... <br /><br />written in distinctive, ... <br /><br />my advice? don't 1 copy
        <div style="padding-top: 10px; clear: both; width: 100%;"></div>
      </div>
      <div style="margin-left:0.5em;">
        <div class="tiny" style="margin-bottom:0.5em;">
          <b><span class="h3color tiny">this review from: </span>my review</b>
        </div>
        became fan of ron kaufman after reading earlier book of years ago...
        <div style="padding-top: 10px; clear: both; width: 100%;"></div>
      </div>
    </div>
I want to get the review text that has no HTML tag around it. I'm using the code below right now:
    foreach (HtmlNode divReview in doc.DocumentNode.SelectNodes(@"//div[@style='margin-left:0.5em;']"))
    {
        if (divReview != null)
        {
            review.Add(divReview.Descendants("div")
                .Where(d => d.Attributes.Contains("style") &&
                            d.Attributes["style"].Value.Contains("padding-top: 10px; clear: both; width: 100%;"))
                .Select(d => d.PreviousSibling.InnerText.Trim())
                .SingleOrDefault());
        }
    }
This returns only "my advice? don't 1 copy". How can I get the whole text?
Update: even if I remove the "br" tags from the HtmlNode, the code above still returns only the "my advice? don't 1 copy" part! Any comments?
I've updated the code to this:
    var allText = (reviewDiv.Descendants("div")
                       .First(div => div.GetAttributeValue("style", "") == "padding-top: 10px; clear: both; width: 100%;")
                       .SelectNodes("./preceding-sibling::text()") ?? new HtmlNodeCollection(null))
                  .Select(text => text.InnerText);
This should return an IEnumerable<string> containing the text that precedes the div with the intricate style.
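PreviousSibling on its own cannot return the whole review, because each chunk of untagged text between the <br /> tags is a separate text node. As far as I can tell, removing the <br> elements does not merge those nodes in HtmlAgilityPack, which would explain the update in the question. The small diagnostic sketch below (TextNodeDiagnostics and TextChunksAfterBrRemoval are hypothetical names) removes every <br> from the first review div and returns the non-blank text nodes that remain as its direct children.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class TextNodeDiagnostics
{
    // Hypothetical helper: removes all <br> elements from the first review div and
    // returns the trimmed, non-blank text nodes left as its direct children.
    public static string[] TextChunksAfterBrRemoval(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode reviewDiv = doc.DocumentNode.SelectSingleNode("//div[@style='margin-left:0.5em;']");
        if (reviewDiv == null) return Array.Empty<string>();

        // ToList() copies the matches so the tree can be modified while looping.
        foreach (HtmlNode br in reviewDiv.Descendants("br").ToList())
        {
            br.Remove();
        }

        // Each chunk of bare text is still its own text node after the <br> removal.
        return reviewDiv.ChildNodes
            .Where(n => n.NodeType == HtmlNodeType.Text)
            .Select(n => n.InnerText.Trim())
            .Where(s => s.Length > 0)
            .ToArray();
    }
}
```

For the first review in the question's HTML this yields four separate chunks, and the PreviousSibling of the padding div is only the last one, "my advice? don't 1 copy".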
Without a little more of the surrounding HTML it's hard to tell exactly what you're after. I'm guessing you have already selected the div, and that this div is the direct parent of the whole block of text (given the reference to reviewDiv). Your HTML sample doesn't seem to contain that piece of HTML, so I'm making a few assumptions here.
With the following input:
    <div>
      <div class="tiny" style="margin-bottom:0.5em;">
        <b><span class="h3color tiny">this review from: </span>you meet</b>
      </div>
      if know ron kaufman ... <br /><br />whether you're ceo.... <br /><br />written in distinctive, ... <br /><br />my advice? don't 1 copy
      <div style="padding-top: 10px; clear: both; width: 100%;"></div>
    </div>
It extracts this:
if know ron kaufman ...
whether you're ceo....
written in distinctive, ...
my advice? don't 1 copy
To build a single string I used:

    string extractedText = string.Join("", allText);
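Applied to the full HTML from the question, the same idea can be sketched as a helper that returns one string per review. ReviewExtractor and ExtractReviews are hypothetical names. Instead of the preceding-sibling::text() XPath, this variant walks the parent's ChildNodes up to the padding div, which keeps the chunks in document order explicitly. FirstOrDefault and GetAttributeValue avoid exceptions for divs without a style attribute or reviews without the padding div. Trimming each chunk and joining with a single space is an assumption about the output format you want.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public static class ReviewExtractor
{
    // Style of the empty div that closes each review body (copied from the question's HTML).
    const string MarkerStyle = "padding-top: 10px; clear: both; width: 100%;";

    // Hypothetical helper: returns the untagged text of every review, one string per review.
    public static List<string> ExtractReviews(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var reviews = new List<string>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection reviewDivs = doc.DocumentNode.SelectNodes("//div[@style='margin-left:0.5em;']");
        if (reviewDivs == null) return reviews;

        foreach (HtmlNode reviewDiv in reviewDivs)
        {
            // GetAttributeValue returns "" for divs without a style attribute instead of throwing.
            HtmlNode marker = reviewDiv.Descendants("div")
                .FirstOrDefault(d => d.GetAttributeValue("style", "") == MarkerStyle);
            if (marker == null) continue;

            // Bare text nodes that are siblings of the marker and come before it, in document order.
            IEnumerable<string> chunks = marker.ParentNode.ChildNodes
                .TakeWhile(n => n != marker)
                .Where(n => n.NodeType == HtmlNodeType.Text)
                .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
                .Where(s => s.Length > 0);

            reviews.Add(string.Join(" ", chunks));
        }

        return reviews;
    }
}
```

Swap the " " separator for Environment.NewLine if you would rather keep one line per <br />-separated paragraph.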