Extract Text from URL
The problem is that I need to extract just the text content of a given URL. I should stress that I need only the text content. There are many methods on the internet which return all the content of a web page without separating out the text.
I need the code in C#.
Thanks for any answer.
Well, you need to use some parsing technique to get the text. For example, you can use XPath or a regular expression to extract the text from the HTML of the given URL.
With HTML there is no such thing as "just text". Text you see on a webpage is rendered according to how the markup is defined.
You could manually strip all HTML tags between the <body></body> tags; then you'd have something like all the text on the page. This will be error-prone, however.
Most solutions you'll find online will opt for a regular expression (something like Regex.Replace(str, "<(.|\n)*?>", string.Empty);), but if you use that you're likely to shoot yourself in the foot one day.
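To make the trade-off concrete, here is a minimal sketch of the regex route using only the .NET base class library. The class name PageText and its two methods are hypothetical names chosen for illustration; nothing here comes from the thread. It removes comments and script/style blocks before stripping tags, which avoids the most common failure of the one-line regex, but it is still an approximation and not a real HTML parser.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

// Hypothetical helper (not from the thread): a rough, regex-based text extractor.
public static class PageText
{
    // Downloads the page at the given URL and returns its approximate visible text.
    public static async Task<string> FromUrlAsync(string url)
    {
        using (var client = new HttpClient())
        {
            string html = await client.GetStringAsync(url);
            return FromHtml(html);
        }
    }

    // Strips markup from an HTML string. This is an approximation: it assumes
    // no attribute value contains a literal '>' and ignores CSS-hidden content.
    public static string FromHtml(string html)
    {
        const RegexOptions opts = RegexOptions.IgnoreCase | RegexOptions.Singleline;

        // Drop comments, then script/style elements together with their contents,
        // so that JavaScript and CSS source do not end up in the output.
        string text = Regex.Replace(html, @"<!--.*?-->", " ", opts);
        text = Regex.Replace(text, @"<(script|style)\b[^>]*>.*?</\1\s*>", " ", opts);

        // Replace every remaining tag with a space so adjacent words stay separated.
        text = Regex.Replace(text, @"<[^>]+>", " ", opts);

        // Decode entities such as &amp; into the characters they represent.
        text = WebUtility.HtmlDecode(text);

        // Collapse runs of whitespace into single spaces.
        return Regex.Replace(text, @"\s+", " ").Trim();
    }
}
```

From an async method you could call it as string text = await PageText.FromUrlAsync(url); with url being whatever page you want to read. Because of the assumptions noted in the comments, a dedicated HTML parser queried with XPath, as the first answer suggests, is likely the safer choice for pages you do not control.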