Best practice to parse HTML (not XML) to XElement?
I have this code:
var url = textBox1.Text;
WebClient wc = new WebClient();
var page = wc.DownloadString(url);
XElement doc = XElement.Parse(page);
It fails with an exception about unexpected characters. Obviously, the HTML I'm trying to parse in such a dumb way is not strict XML. What's the next easiest way to parse arbitrary HTML into something IQueryable?
What I actually want is to grab a table inside the page and the paging links, then parse them on my own with LINQ.
Have a look at the HTML Agility Pack:
http://www.codeplex.com/htmlagilitypack
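A minimal sketch of how that might look for the table and paging links from the question, assuming the HtmlAgilityPack package is referenced. The class name, the command-line URL, and both XPath expressions are illustrative guesses, since the real page structure is not shown:

```csharp
using System;
using System.Linq;
using System.Net;
using HtmlAgilityPack;

static class PageScraper
{
    static void Main(string[] args)
    {
        // Hypothetical input: the URL comes from the command line
        // instead of textBox1.Text.
        var url = args[0];

        string page;
        using (var wc = new WebClient())
        {
            page = wc.DownloadString(url);
        }

        // HtmlDocument tolerates malformed markup that XElement.Parse rejects.
        var doc = new HtmlDocument();
        doc.LoadHtml(page);

        // Assumption: the wanted table is the first <table> in the page.
        var table = doc.DocumentNode.SelectSingleNode("//table");
        if (table != null)
        {
            // SelectNodes returns null (not an empty list) when nothing matches.
            var rows = table.SelectNodes(".//tr");
            if (rows != null)
            {
                // One list of trimmed cell texts per row, built with LINQ.
                var cells = rows
                    .Select(tr => tr.Elements("td")
                                    .Select(td => td.InnerText.Trim())
                                    .ToList())
                    .ToList();
                Console.WriteLine("Rows found: " + cells.Count);
            }
        }

        // Assumption: paging links are ordinary anchors; filter further
        // with LINQ once the real markup is known.
        var anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors != null)
        {
            foreach (var href in anchors.Select(a => a.GetAttributeValue("href", "")))
            {
                Console.WriteLine(href);
            }
        }
    }
}
```

The null checks matter because SelectNodes and SelectSingleNode return null when the XPath matches nothing.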
The best way that I can think of is to search for the table tags and parse everything inside them, and to do the same for the tags containing the paging links. Hopefully, narrowing it down to that should make a manual parser easier to write.
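A rough sketch of the tag-searching idea, using only the standard library. The helper name FirstElement and the sample HTML string are made up for illustration. It does not handle nested tables, and a tag name that merely starts with the same letters would also match:

```csharp
using System;

static class ManualSlice
{
    // Returns the text from the first "<tag" through the first matching
    // closing "</tag>" after it, or null if either one is missing.
    static string FirstElement(string html, string tag)
    {
        int start = html.IndexOf("<" + tag, StringComparison.OrdinalIgnoreCase);
        if (start < 0) return null;

        string close = "</" + tag + ">";
        int end = html.IndexOf(close, start, StringComparison.OrdinalIgnoreCase);
        if (end < 0) return null;

        return html.Substring(start, end + close.Length - start);
    }

    static void Main()
    {
        // Hypothetical sample page; note the unclosed <br> and upper-case tags.
        string html = "<html><body><p>intro<br><TABLE><tr><td>1</td></tr></TABLE>"
                    + "<a href=\"?page=2\">2</a></body></html>";

        Console.WriteLine(FirstElement(html, "table"));
        // prints: <TABLE><tr><td>1</td></tr></TABLE>
    }
}
```

The extracted fragment is much smaller than the full page, but it may still not be well-formed XML, so it would need cleaning before being handed to XElement.Parse.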