Inside an innermost tag: how to get all the formatting in effect on the text?
My requirement is to extract the news content from different news websites (approximately 250). The news content is somewhere in the body, and I can locate the first paragraph of it based on the Google snippets/meta info. To get the remaining paragraphs, I walk up the HTML tree until I reach a division or a table body, but that pulls in some unwanted text that is not related to the news item. What I have noticed is that, on most of these pages, all the relevant paragraphs of a news item are styled or formatted in a similar way. So is there a way to capture all the styling applied to the first paragraph, and then use that formatting information to filter out the unwanted text?
I am using HTML Agility Pack and XPath for this. Thank you.
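A minimal sketch of the idea in the question, using Python's standard-library html.parser as a stand-in for HTML Agility Pack (this is not HAP's API). For every text chunk it records the formatting-related ancestors (formatting tags, plus any element carrying class, style, face, size or color attributes), takes that signature from the chunk containing the snippet text, and keeps only chunks whose signature starts with it. The tag list, attribute list, prefix-matching rule and the helper name extract_article are assumptions for illustration. It only sees formatting written into the markup, not rules coming from external CSS stylesheets. In HAP, the same signature could presumably be built by walking the paragraph node's ParentNode chain or selecting the XPath ancestor::* axis.

```python
from html.parser import HTMLParser

# Assumption: these tags affect how text looks; purely structural containers
# (html, body, div, td, ...) only enter the signature if they carry styling attributes.
FORMATTING_TAGS = {"p", "span", "font", "b", "strong", "i", "em", "u", "small", "big", "h1", "h2", "h3"}
# Assumption: attributes that carry presentation information in the markup.
FORMAT_ATTRS = ("class", "style", "face", "size", "color")
# Void elements have no closing tag, so they are never pushed onto the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}


class FormattingCollector(HTMLParser):
    """Records each text chunk with the formatting markup enclosing it."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open elements as (tag, attrs dict)
        self.chunks = []  # (text, signature) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; unmatched end tags are ignored.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.stack and self.stack[-1][0] in ("script", "style"):
            return  # script/style contents are code, not visible text
        self.chunks.append((text, self.signature()))

    def signature(self):
        """The formatting in effect: formatting ancestors and their styling attributes."""
        sig = []
        for tag, attrs in self.stack:
            if tag in FORMATTING_TAGS or any(a in attrs for a in FORMAT_ATTRS):
                sig.append((tag,) + tuple(attrs.get(a) or "" for a in FORMAT_ATTRS))
        return tuple(sig)


def extract_article(html, seed_text):
    """Hypothetical helper: keep text formatted like the chunk containing seed_text."""
    collector = FormattingCollector()
    collector.feed(html)
    collector.close()
    seed_sig = next((sig for text, sig in collector.chunks if seed_text in text), None)
    if seed_sig is None:
        return []
    # Prefix match, so inline <b>/<i> runs inside a matching paragraph are kept.
    return [text for text, sig in collector.chunks if sig[:len(seed_sig)] == seed_sig]
```

On a page where the story paragraphs sit in div class="story" / p class="body-text", calling extract_article(html, "First paragraph") should return the story's text chunks (inline bold runs come back as separate chunks) while skipping an ad block nested inside the story div and a sidebar that reuses the same paragraph class, because the enclosing div's class is part of the signature. Prefix matching is what lets inline formatting inside a matching paragraph through; if the snippet text itself sits inside a b or span, the seed signature becomes too specific and fewer paragraphs will match.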
You could look at my answer to the following question on SO: Advanced HTML Agility Pack usage. It seems to be somewhat related to yours.