How to extract an HTML tag attribute?
I'm trying to develop my first RSS news aggregator. I can easily extract the links, titles, and publication dates from the RSSItem object. However, I'm having a hard time extracting the image from the feed item. Unfortunately, because of my low reputation on SO I can't upload images, so instead of helping me extract the value of the src attribute of an <img> tag, can you please show me how to get the value of the href attribute of an <a> tag? Highly appreciated!
Here's the string
<div style="text-align: center;"
<a href="http://www.engadget.com/2011/07/10/element5s-mini-l-solarbag-brings-eco-friendly-energy-protectio/"></a>
</div>
Edit:
Maybe the whole title is wrong. Is there a way I can find the value using XPath?
Use HtmlAgilityPack, as answered in this post:
How can I get values from Html Tags?
More information:
HTML is often not well formed, so we need a parser that is more fault tolerant than the XML parsers supplied with .NET. That's where HtmlAgilityPack comes in.
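To illustrate the point, here is a minimal sketch of what a strict XML parser does with markup that is not well formed. It uses XDocument.Parse from System.Xml.Linq; the class and method names (StrictParserDemo, TryParseAsXml) and the example.com URL are placeholders made up for this illustration.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class StrictParserDemo
{
    // Returns true if the markup parses as well-formed XML, false otherwise.
    public static bool TryParseAsXml(string markup)
    {
        try
        {
            XDocument.Parse(markup);
            return true;
        }
        catch (XmlException)
        {
            // A strict XML parser rejects the whole document on the first error.
            return false;
        }
    }

    public static void Main()
    {
        // The opening <div> tag is missing its closing '>'.
        const string malformed =
            "<div style=\"text-align: center;\"\n" +
            "<a href=\"http://example.com/\"></a>\n" +
            "</div>";

        // The same fragment with the tag closed properly.
        const string wellFormed =
            "<div style=\"text-align: center;\">" +
            "<a href=\"http://example.com/\"></a>" +
            "</div>";

        Console.WriteLine(TryParseAsXml(malformed));  // False
        Console.WriteLine(TryParseAsXml(wellFormed)); // True
    }
}
```

A single missing character is enough to make the strict parser give up, which is why a tolerant HTML parser is the safer choice for feed content you don't control.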
Getting started:
Create a new console application.
Right-click References and choose Manage NuGet Packages (install NuGet if you don't have it).
Add the HtmlAgilityPack package.
A working example:
using System;
using HtmlAgilityPack;

namespace ConsoleApplication4
{
    class Program
    {
        // Sample markup: a <div> carrying the attribute we want, followed by XML-style content.
        private const string html =
@"<?xml version=""1.0"" encoding=""ISO-8859-1""?>
<div class='linkProduct' id='link' anattribute='abc'/>
<bookstore>
<book>
<title lang=""eng"">Harry Potter</title>
<price>29.99</price>
</book>
<book>
<title lang=""eng"">Learning XML</title>
<price>39.95</price>
</book>
</bookstore>
";

        static void Main(string[] args)
        {
            // Parse the markup directly from the string; no intermediate stream is needed.
            var doc = new HtmlDocument();
            doc.LoadHtml(html);

            // Select the top-level <div> with an XPath expression.
            var root = doc.DocumentNode;
            var tag = root.SelectSingleNode("/div");
            if (tag == null)
            {
                Console.WriteLine("No <div> element found.");
                return;
            }

            // The indexer returns null when the attribute is absent.
            var attrib = tag.Attributes["anattribute"];
            Console.WriteLine(attrib != null ? attrib.Value : "Attribute not found.");
        }
    }
}
Taking it further:
Get comfortable with XPath. Here's a good place to start:
http://www.w3schools.com/xpath/xpath_syntax.asp
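As a starting point, here is a sketch of how the same approach might be applied to the string from the question. It assumes the HtmlAgilityPack NuGet package is installed and that the opening <div> tag is properly closed; the class and method names (AttributeDemo, FirstAttributeValue) are made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

public static class AttributeDemo
{
    // Returns the value of the named attribute on the first node matching the XPath,
    // or null if no node matches or the attribute is absent.
    public static string FirstAttributeValue(string html, string xpath, string attributeName)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNode node = doc.DocumentNode.SelectSingleNode(xpath);
        return node?.Attributes[attributeName]?.Value;
    }

    public static void Main()
    {
        const string html = @"<div style=""text-align: center;"">
<a href=""http://www.engadget.com/2011/07/10/element5s-mini-l-solarbag-brings-eco-friendly-energy-protectio/""></a>
</div>";

        // "//a[@href]" matches any <a> element, at any depth, that has an href attribute.
        string href = FirstAttributeValue(html, "//a[@href]", "href");
        Console.WriteLine(href ?? "(no link found)");
    }
}
```

The same helper should cover the image case from the question by passing "//img[@src]" as the XPath and "src" as the attribute name.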