Web page (HTML) scraping using C#
This is just a general question. I am currently scraping web pages with regex, but it is sometimes too difficult to work out the right regular expression. So I am wondering: is XSL/XPath an alternative to regex in C#?
Also, I would like to know whether there are more advanced techniques for web page scraping besides these two. Thanks.
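Take a look at SgmlReader or Html Agility Pack, which are HTML parsing libraries for .NET. Both let you use XPath instead of regex: Html Agility Pack builds a tolerant DOM you can query with `SelectNodes`/`SelectSingleNode`, and SgmlReader turns messy HTML into XML that you can load into the standard `System.Xml` classes.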
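To show why XPath is easier than regex for this kind of extraction, here is a minimal sketch using only the built-in `System.Xml.Linq` and `System.Xml.XPath` classes. It assumes the markup is already well-formed XHTML (for example, the output of SgmlReader). Real-world HTML usually is not, which is exactly why the libraries above exist. The page structure (`ul id="links"`) and the names `XPathScrapeSketch` and `ExtractLinks` are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathScrapeSketch
{
    // Hypothetical helper: returns (text, href) pairs for every <a> inside <ul id="links">.
    public static List<(string Text, string Href)> ExtractLinks(string xhtml)
    {
        // XDocument.Parse throws XmlException if the markup is not well-formed,
        // so this only works on clean XHTML, not typical real-world HTML.
        XDocument doc = XDocument.Parse(xhtml);

        // One XPath expression replaces a hand-written regex for <a href="...">.
        // If the page declares xmlns="http://www.w3.org/1999/xhtml", these unprefixed
        // names will not match unless you pass a namespace resolver.
        return doc.XPathSelectElements("//ul[@id='links']/li/a")
                  .Select(a => (a.Value, (string)a.Attribute("href")))
                  .ToList();
    }

    public static void Main()
    {
        // Hypothetical page content standing in for a downloaded document.
        const string page =
            "<html><body><ul id=\"links\">" +
            "<li><a href=\"/first\">First</a></li>" +
            "<li><a href=\"/second\">Second</a></li>" +
            "</ul></body></html>";

        foreach (var (text, href) in ExtractLinks(page))
            Console.WriteLine($"{text} -> {href}");
    }
}
```

With Html Agility Pack the XPath string stays the same; only the parsing step changes, and the parser no longer rejects unclosed or badly nested tags.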
An easy way to gather data from a web page is WebsiteParser. It is built on Html Agility Pack, and you simply describe your properties using attributes and CSS selectors.
The source is on GitHub.