Parse an HTML document with HtmlAgilityPack (XPath, RegExp)

I am trying to extract an image URL from HTML with HtmlAgilityPack. The HTML document contains this img tag:

<a class="css_foto" href="" title="Fotka: MyKe015">
   <span>
      <img src="http://213.215.107.125/fotky/1358/93/v_13589304.jpg?v=6" 
           width="176" height="216" alt="Fotka: MyKe015" />
   </span>
</a>

I need the value of this img tag's src attribute, that is: http://213.215.107.125/fotky/1358/93/v_13589304.jpg?v=6

What I know:

  1. The src attribute contains a URL that starts with http://213.215.107.125/fotky
  2. The URL has variable length, and the HTML document also contains other img tags whose URLs start with http://213.215.107.125/fotky
  3. I know the alt attribute of the img tag (Fotka: MyKe015)

Any advice? I have tried many approaches, but none of them works well.

My latest attempt:

    List<string> src;

    var req = (HttpWebRequest)WebRequest.Create("http://pokec.azet.sk/myke015");
    req.Method = "GET";

    using (WebResponse odpoved = req.GetResponse())
    {
        var htmlDoc = new HtmlAgilityPack.HtmlDocument();
        htmlDoc.Load(odpoved.GetResponseStream());

        var nodes = htmlDoc.DocumentNode.SelectNodes("//img[@src]");
        src = new List<string>(nodes.Count);

        if (nodes != null)
        {
            foreach (var node in nodes)
            {
                if (node.Id != null)
                    src.Add(node.Id);
            }
        }
    }


Your XPath selects the img nodes, not the src attributes belonging to them.

Instead of (selecting all image tags that have a src attribute):

var nodes = htmlDoc.DocumentNode.SelectNodes("//img[@src]");

Use this (selecting the src attributes of all img elements):

var nodes = htmlDoc.DocumentNode.SelectNodes("//img/@src");
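One caveat, hedged: as far as I know, HtmlAgilityPack's SelectNodes always returns element nodes (HtmlNode), even when the XPath ends in an attribute step, so the value still has to be read from the node. A minimal sketch of that, assuming the HTML is already available as a string (the names SrcExtractor and GetImageSources are made up for illustration):

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

class SrcExtractor
{
    // Hypothetical helper: returns the src value of every img element that has one.
    static List<string> GetImageSources(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();

        // SelectNodes returns null (not an empty collection) when nothing matches,
        // so check for null before touching Count or iterating.
        var nodes = doc.DocumentNode.SelectNodes("//img[@src]");
        if (nodes == null)
            return result;

        foreach (var node in nodes)
        {
            // Read the attribute value; node.Id would give the id attribute instead.
            result.Add(node.GetAttributeValue("src", string.Empty));
        }
        return result;
    }

    static void Main()
    {
        // The snippet from the question, inlined for the example.
        const string html =
            "<a class=\"css_foto\" href=\"\" title=\"Fotka: MyKe015\"><span>" +
            "<img src=\"http://213.215.107.125/fotky/1358/93/v_13589304.jpg?v=6\" " +
            "width=\"176\" height=\"216\" alt=\"Fotka: MyKe015\" /></span></a>";

        foreach (var src in GetImageSources(html))
            Console.WriteLine(src);
    }
}
```

The null check happens before the list is filled, and the loop reads src rather than Id, which are the two places where the attempt in the question goes wrong.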


This XPath 1.0 expression selects the src attribute of the img element whose alt attribute is 'Fotka: MyKe015':

//img[@alt='Fotka: MyKe015']/@src
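In the HTML from the question the alt attribute sits on the img element, and the src is known to start with a fixed prefix, so both conditions can go into one predicate. A rough sketch, assuming HtmlAgilityPack evaluates the standard XPath 1.0 function starts-with (FindSrcByAlt and AltLookup are hypothetical names, not part of the library):

```csharp
using System;
using HtmlAgilityPack;

class AltLookup
{
    // Hypothetical helper: returns the src of the first img that matches both
    // the alt text and the URL prefix, or null when there is no such img.
    static string FindSrcByAlt(string html, string alt, string urlPrefix)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumption: alt and urlPrefix contain no single quotes, because they
        // are spliced directly into XPath string literals.
        string xpath = "//img[@alt='" + alt + "' and starts-with(@src, '" + urlPrefix + "')]";

        // SelectSingleNode returns null when nothing matches.
        var img = doc.DocumentNode.SelectSingleNode(xpath);
        return img == null ? null : img.GetAttributeValue("src", string.Empty);
    }

    static void Main()
    {
        const string html =
            "<a class=\"css_foto\" href=\"\" title=\"Fotka: MyKe015\"><span>" +
            "<img src=\"http://213.215.107.125/fotky/1358/93/v_13589304.jpg?v=6\" " +
            "width=\"176\" height=\"216\" alt=\"Fotka: MyKe015\" /></span></a>";

        string src = FindSrcByAlt(html, "Fotka: MyKe015", "http://213.215.107.125/fotky");
        Console.WriteLine(src ?? "(not found)");
    }
}
```

XPath string comparison is case-sensitive, so the alt text has to match the page exactly ("MyKe015", not "Myke015").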