Extract Text from URL

The problem is that I need to extract just the text content of a given URL. I should stress that I need only the text content. There are many methods on the internet that return the entire content of a web page without separating out the text.

I need the code in C#.

Thanks for any answers.


Well, you need to use some parsing technique to get the text. For example, you can use XPath or a regular expression to get the text from the given URL's HTML.
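A minimal sketch of the XPath route, using only the .NET base library (`System.Net.Http`, `System.Xml.Linq`, `System.Xml.XPath`). It assumes the page is well-formed XHTML with no default XML namespace, which many real pages are not; the names `XPathTextExtractor`, `FetchTextAsync` and `ExtractText` are hypothetical. The XPath query selects every text node under `<body>` that is not inside `<script>` or `<style>`.

```csharp
using System.Collections;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using System.Xml.Linq;
using System.Xml.XPath;

// Sketch only. ASSUMPTION: the page is well-formed XHTML without a default
// namespace; ordinary HTML will usually make XDocument.Parse throw.
public static class XPathTextExtractor
{
    private static readonly HttpClient Http = new HttpClient();

    // Downloads the page at 'url' and returns its text content.
    public static async Task<string> FetchTextAsync(string url)
    {
        string html = await Http.GetStringAsync(url);
        return ExtractText(html);
    }

    // Selects all text nodes under <body> that are not inside <script>/<style>,
    // trims them, drops empty ones, and joins the rest with single spaces.
    public static string ExtractText(string xhtml)
    {
        XDocument doc = XDocument.Parse(xhtml);
        var nodes = (IEnumerable)doc.XPathEvaluate(
            "//body//text()[not(ancestor::script) and not(ancestor::style)]");
        var pieces = nodes.OfType<XText>()
                          .Select(t => t.Value.Trim())
                          .Where(s => s.Length > 0);
        return string.Join(" ", pieces);
    }
}
```

Call it from an async method, e.g. `string text = await XPathTextExtractor.FetchTextAsync("https://example.com/");` (the URL is only a placeholder). Because most real-world pages are not valid XML, a tolerant HTML parser such as the third-party HtmlAgilityPack, which also accepts XPath queries through `SelectNodes`, is the usual substitute for `XDocument` here.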


With HTML there is no such thing as "just text". Text you see on a webpage is rendered according to how the markup is defined.

You could manually strip all HTML tags between the <body></body> tags; then you'd have something like all the text on the page. This will be error-prone, however.
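The manual approach can be sketched as follows. This is an illustrative, deliberately naive version (`BodyTagStripper` and `StripBody` are hypothetical names): it cuts out the section between `<body ...>` and `</body>` and then drops every character between `<` and `>`. It also shows why the approach is error-prone: it knows nothing about comments, entities such as `&amp;`, or a `>` inside an attribute value, and it keeps the source of `<script>` blocks as if it were text.

```csharp
using System;
using System.Text;

// Naive sketch of "strip all tags between <body> and </body>".
// Does NOT handle comments, entities, scripts/styles, or '>' in attributes.
public static class BodyTagStripper
{
    public static string StripBody(string html)
    {
        // Find "<body ...>" and "</body>" case-insensitively.
        int open = html.IndexOf("<body", StringComparison.OrdinalIgnoreCase);
        int tagEnd = open >= 0 ? html.IndexOf('>', open) : -1;
        int close = html.IndexOf("</body>", StringComparison.OrdinalIgnoreCase);

        // Keep only the body's inner markup; fall back to the whole input.
        string body = (tagEnd >= 0 && close > tagEnd)
            ? html.Substring(tagEnd + 1, close - tagEnd - 1)
            : html;

        // Copy characters that are outside any '<' ... '>' pair.
        var sb = new StringBuilder(body.Length);
        bool inTag = false;
        foreach (char c in body)
        {
            if (c == '<') inTag = true;
            else if (c == '>') inTag = false;
            else if (!inTag) sb.Append(c);
        }
        return sb.ToString();
    }
}
```

For example, `StripBody("<body><script>var s = 1;</script><p>Hi</p></body>")` returns `var s = 1;Hi`: the script source leaks into the "text".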

Most solutions you'll find online will opt for a regular expression (something like Regex.Replace(str, "<(.|\n)*?>", string.Empty);), but if you use that you're likely to shoot yourself in the foot one day.
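Wrapped in a method, the regex quoted above looks like this (`RegexTagStripper.StripTags` is a hypothetical name). The lazy `*?` makes each match stop at the first `>`, which is exactly where it goes wrong: a `>` inside an attribute value ends the match early.

```csharp
using System.Text.RegularExpressions;

// The regex one-liner from the answer, wrapped in a method.
// Removes each "<...>" run, lazily matching up to the first '>'.
public static class RegexTagStripper
{
    public static string StripTags(string html) =>
        Regex.Replace(html, "<(.|\n)*?>", string.Empty);
}
```

On simple markup it works: `StripTags("<p>Hello <b>world</b></p>")` returns `Hello world`. But `StripTags("<p>Before<img alt=\"a > b\">After</p>")` returns `Before b">After`, leaving a fragment of the `<img>` tag in the output.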
