Parse an HTML document with HtmlAgilityPack (XPath or RegExp)
I am trying to extract an image URL from HTML with HtmlAgilityPack. The HTML document contains this img tag:
<a class="css_foto" href="" title="Fotka: MyKe015">
<span>
<img src="http://213.215.107.125/fotky/1358/93/v_13589304.jpg?v=6"
width="176" height="216" alt="Fotka: MyKe015" />
</span>
</a>
I need the value of the src attribute of this img tag, that is: http://213.215.107.125/fotky/1358/93/v_13589304.jpg?v=6.
What I know:
- The src attribute contains a URL that starts with http://213.215.107.125/fotky
- The URL has variable length, and the HTML document also contains other img tags whose URL starts with http://213.215.107.125/fotky
- I know the value of the alt attribute of the img tag (Fotka: MyKe015)
Any advice? I have tried many approaches, but none of them works well.
My latest attempt:
List<string> src;
var req = (HttpWebRequest)WebRequest.Create("http://pokec.azet.sk/myke015");
req.Method = "GET";
using (WebResponse odpoved = req.GetResponse())
{
    var htmlDoc = new HtmlAgilityPack.HtmlDocument();
    htmlDoc.Load(odpoved.GetResponseStream());
    var nodes = htmlDoc.DocumentNode.SelectNodes("//img[@src]");
    src = new List<string>(nodes.Count);
    if (nodes != null)
    {
        foreach (var node in nodes)
        {
            if (node.Id != null)
                src.Add(node.Id);
        }
    }
}
Your XPath selects the img nodes, not the src attributes belonging to them.
Instead of this (which selects all img elements that have a src attribute):
var nodes = htmlDoc.DocumentNode.SelectNodes("//img[@src]");
use this (which selects the src attributes of all img elements):
var nodes = htmlDoc.DocumentNode.SelectNodes("//img/@src");
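One caveat, offered as a hedged note rather than a definitive statement: HtmlAgilityPack's SelectNodes returns a collection of HtmlNode elements, so an attribute step such as /@src may still hand back the img elements instead of the attribute values. A minimal sketch that does not depend on that behavior is to select the elements and read src from each one. The class name ImageSources and the method name Extract are hypothetical, chosen only for illustration.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ImageSources
{
    // Returns the src value of every img element that has a non-empty src.
    public static List<string> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<string>();

        // By default SelectNodes returns null (not an empty collection)
        // when nothing matches, so check before touching Count or iterating.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//img[@src]");
        if (nodes == null)
            return result;

        foreach (HtmlNode node in nodes)
        {
            // node.Id reads the id attribute; the URL lives in src.
            string src = node.GetAttributeValue("src", string.Empty);
            if (src.Length > 0)
                result.Add(src);
        }
        return result;
    }
}
```

Compared with the code in the question, this moves the null check ahead of any use of the collection and reads the src attribute instead of node.Id.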
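This XPath 1.0 expression:
//img[@alt='Fotka: MyKe015']/@src
selects the src attribute of every img element whose alt attribute is 'Fotka: MyKe015'. Note that in the markup shown, alt and src belong to the img element, not to the enclosing a element, and that the comparison is case-sensitive.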
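A minimal sketch of applying this kind of filter with HtmlAgilityPack, assuming (as in the question's markup) that both alt and src sit on the img element. It matches the element on its alt value and on the known URL prefix, then reads src from the node. PhotoFinder and FindSrc are hypothetical names, and the sketch assumes that neither the alt text nor the prefix contains a single quote, because both are spliced into the XPath string unescaped.

```csharp
using HtmlAgilityPack;

public static class PhotoFinder
{
    // Returns the src of the first img whose alt equals `alt` exactly
    // (case-sensitive) and whose src starts with `prefix`,
    // or null when no such img exists.
    // Assumption: `alt` and `prefix` contain no single quote.
    public static string FindSrc(string html, string alt, string prefix)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        string xpath = "//img[@alt='" + alt + "' and starts-with(@src, '" + prefix + "')]";

        // SelectSingleNode returns null when nothing matches.
        HtmlNode img = doc.DocumentNode.SelectSingleNode(xpath);
        return img == null ? null : img.GetAttributeValue("src", string.Empty);
    }
}
```

For the page in the question, the call would look like PhotoFinder.FindSrc(html, "Fotka: MyKe015", "http://213.215.107.125/fotky"), where html is the downloaded page text.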