Fetching an external page and parsing meta-tags without Regex in C#?

Posted: 2023-01-29 07:20

Consider the following code:

public ActionResult Index(string URLQuery = "http://www.google.com")
    {
        // Build a GET request with a 10-second timeout.
        HttpWebRequest webRequest = (HttpWebRequest) WebRequest.Create(URLQuery);
        webRequest.Timeout = 10 * 1000;
        webRequest.KeepAlive = false;
        // A GET request has no body, so ask for HTML via Accept rather than Content-Type.
        webRequest.Accept = "text/html";

        string queryContent;

        // Dispose the response and the reader once the body has been read.
        using (HttpWebResponse webResponse = (HttpWebResponse) webRequest.GetResponse())
        using (StreamReader responseStream = new StreamReader(webResponse.GetResponseStream(), System.Text.Encoding.UTF8))
        {
            queryContent = responseStream.ReadToEnd();
        }

        ViewData["StreamResult"] = queryContent;
        return View();
    }

Essentially, this simply grabs a web page and spits it out as-is. What I'd like to do is take the resulting fetched data from the screen, and parse it much like PHP allows you to do using some sort of built-in DOM object/framework. I have seen many examples of Regex to accomplish this task but I feel like that is inefficient and results in too many weird edge-cases that might result in corrupt data on my end.

Is this even possible? Am I doomed to use Regex for this?

You should use a parser for this. It looks like HTML Agility Pack will do what you want.
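
As a rough sketch of what that could look like for meta tags specifically, assuming the third-party HtmlAgilityPack NuGet package is referenced: the class name MetaTagReader and the method ReadMetaTags are made up for illustration, and only tags that carry a name attribute are collected.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package "HtmlAgilityPack" (assumed to be installed)

// Hypothetical helper, not part of HTML Agility Pack itself.
public static class MetaTagReader
{
    // Returns a name -> content map for every <meta name="..."> tag in the markup.
    public static Dictionary<string, string> ReadMetaTags(string html)
    {
        var result = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection metaNodes = doc.DocumentNode.SelectNodes("//meta[@name]");
        if (metaNodes == null)
        {
            return result;
        }

        foreach (HtmlNode meta in metaNodes)
        {
            string name = meta.GetAttributeValue("name", "");
            string content = meta.GetAttributeValue("content", "");
            if (name.Length > 0)
            {
                // If a name appears twice, the later tag overwrites the earlier one.
                result[name] = content;
            }
        }

        return result;
    }
}
```

Passing the fetched page text (queryContent in the question) to MetaTagReader.ReadMetaTags would then give a dictionary that can be looked up by meta name, for example "description" or "keywords".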

Using HTML Agility Pack you can do this very easily. Below is a sample using XPath; the newer version also supports LINQ syntax, but I haven't tried that personally yet.

    // Requires: using HtmlAgilityPack;
    StreamReader responseStream = new StreamReader(webResponse.GetResponseStream(),
                                                   System.Text.Encoding.UTF8);

    queryContent = responseStream.ReadToEnd();

    // Parse the fetched markup into a DOM-like tree.
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(queryContent);

    // SelectSingleNode returns null when nothing matches.
    HtmlNode bodyNode = doc.DocumentNode.SelectSingleNode("//body | //BODY");
    /* do processing here */
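
For the LINQ style mentioned above, a sketch might look like the following. It assumes the HtmlAgilityPack NuGet package is referenced. The class MetaTagLinq and the method ReadOpenGraphTags are hypothetical names, and filtering on Open Graph tags (property values starting with "og:") is just one example of a meta-tag query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package "HtmlAgilityPack" (assumed to be installed)

// Hypothetical helper, shown only to illustrate the LINQ-style API.
public static class MetaTagLinq
{
    // Returns (property, content) pairs for <meta property="og:..."> tags, in document order.
    public static List<KeyValuePair<string, string>> ReadOpenGraphTags(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("meta") // every <meta> element anywhere in the tree
            .Select(m => new KeyValuePair<string, string>(
                m.GetAttributeValue("property", ""),
                m.GetAttributeValue("content", "")))
            .Where(pair => pair.Key.StartsWith("og:", StringComparison.OrdinalIgnoreCase))
            .ToList();
    }
}
```

Unlike SelectNodes, Descendants yields an empty sequence when nothing matches, so no null check is needed before chaining the LINQ operators.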
