Using C#, how do I get a list/array of all script tags (and their contents) on a webpage?

I am using HttpWebRequest to load a remote web page into a string, and I want to make a list of all its script tags (and their contents) for parsing.

What is the best method to do this?


The best method is to use an HTML parser such as the HTML Agility Pack.

From the site:

It is a .NET code library that allows you to parse "out of the web" HTML files. The parser is very tolerant with "real world" malformed HTML. The object model is very similar to what proposes System.Xml, but for HTML documents (or streams).

Sample applications:

  • Page fixing or generation. You can fix a page the way you want, modify the DOM, add nodes, copy nodes, well... you name it.

  • Web scanners. You can easily get to img/src or a/hrefs with a bunch XPATH queries.

  • Web scrapers. You can easily scrap any existing web page into an RSS feed for example, with just an XSLT file serving as the binding. An example of this is provided.
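
For the question as asked, a minimal sketch with the HTML Agility Pack might look like the following. It assumes the HtmlAgilityPack NuGet package is referenced and that the page has already been downloaded into a string; the class and method names (ScriptExtractor, GetScripts) are made up for illustration and are not part of the library.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class ScriptExtractor
{
    // Hypothetical helper: returns the src attribute (empty if absent)
    // and the inline body of every <script> element in the given HTML.
    public static List<(string Src, string Body)> GetScripts(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<(string Src, string Body)>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//script");
        if (nodes == null)
        {
            return result;
        }

        foreach (HtmlNode node in nodes)
        {
            string src = node.GetAttributeValue("src", "");
            result.Add((src, node.InnerHtml));
        }

        return result;
    }
}
```

Pass in the string you got from HttpWebRequest. External scripts come back with a non-empty Src and an empty Body, inline scripts the other way round, so you can decide which ones need a second download before parsing.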


Use an XML parser to get all the script tags along with their contents, for example this one: simple xml
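
The "simple xml" parser linked above is not identified in the text, so as a stand-in here is a rough sketch using System.Xml.Linq from .NET. It assumes the page is well-formed XML (for example strict XHTML); ordinary HTML with unclosed tags or entities such as &nbsp; will make XDocument.Parse throw. The names XmlScriptExtractor and GetScriptBodies are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlScriptExtractor
{
    // Hypothetical helper: returns the text content of every <script> element.
    // Throws System.Xml.XmlException if the input is not well-formed XML.
    public static List<string> GetScriptBodies(string xhtml)
    {
        XDocument doc = XDocument.Parse(xhtml);

        // Compare on LocalName so the match works whether or not
        // the document declares the XHTML namespace.
        return doc.Descendants()
                  .Where(e => e.Name.LocalName == "script")
                  .Select(e => e.Value)
                  .ToList();
    }
}
```

This only makes sense if you control the markup or know it is valid XHTML. For pages fetched from arbitrary sites, a tolerant HTML parser is the safer choice.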
