
Web page (HTML) scraping using C#

This is just a general question. Currently I am scraping web pages using regex, but it is sometimes too difficult to work out the right regular expression. Is XSL/XPath an alternative to regex in C#?

Also, I would like to know if there are more advanced techniques for webpage scraping other than the two listed above. Thanks.


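You may take a look at SgmlReader or Html Agility Pack, which are HTML parsing libraries for .NET. Both cope with malformed real-world HTML and let you query the parsed document with XPath instead of regex.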
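To illustrate the XPath approach, here is a minimal sketch using Html Agility Pack (installed from the HtmlAgilityPack NuGet package). The class name `LinkExtractor`, the method `ExtractLinks`, and the sample markup are made up for this example; only the `HtmlDocument`, `SelectNodes`, `GetAttributeValue`, and `HtmlEntity.DeEntitize` calls come from the library.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class LinkExtractor
{
    // Hypothetical helper: returns (href, text) pairs for every <a> that has an href.
    public static List<KeyValuePair<string, string>> ExtractLinks(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // the parser tolerates unclosed or badly nested tags

        var results = new List<KeyValuePair<string, string>>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return results;
        }

        foreach (HtmlNode a in anchors)
        {
            string href = a.GetAttributeValue("href", string.Empty);
            // InnerText keeps entities such as &amp; so decode them before use.
            string text = HtmlEntity.DeEntitize(a.InnerText).Trim();
            results.Add(new KeyValuePair<string, string>(href, text));
        }

        return results;
    }

    public static void Main()
    {
        // Sample markup invented for illustration; note the unclosed <p>.
        const string html =
            "<html><body><p>Links: <a href=\"/docs\">Docs</a> " +
            "<a href='/faq'>FAQ &amp; help</a> <a>no href</a></body></html>";

        foreach (KeyValuePair<string, string> link in ExtractLinks(html))
        {
            Console.WriteLine(link.Key + " -> " + link.Value);
        }
    }
}
```

The XPath expression `//a[@href]` states the intent directly (every anchor that has an `href` attribute), which is usually easier to read and maintain than an equivalent regular expression over raw markup.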


An easy way to gather data from a web page is WebsiteParser. It is based on Html Agility Pack, and you simply describe the properties you want using attributes and CSS selectors.

The project is hosted on GitHub.
