C# - Convert HTML unordered list to JSON array
I'd like to convert an unordered list, which is stored as a string
into a JSON array.
I need this because I'm screen scraping a website (with permission), so all I have is the page source stored as a string (yes, it's horrible) until they finish their API (and yes, they've agreed not to change any of their HTML in the meantime). :-)
HTML:
<ul class="column">
<li><a href="/view.php?m=48902&g=313433">Item 1</a></li>
<li><a href="/view.php?m=09844&g=313433">Item 2</a></li>
<li><a href="/view.php?m=23473&g=313433">Item 3</a></li>
</ul>
JSON:
{"items":[
{
id: 1,
url: "/view.php?m=48902&g=313433",
name: "Item 1",
m: 48902,
g: 313433
},
{
id: 2,
url: "/view.php?m=09844&g=313433",
name: "Item 2",
m: 09844,
g: 313433
},
{
id: 3,
url: "/view.php?m=23473&g=313433",
name: "Item 3",
m: 23473,
g: 313433
}
]}
Proposed approach:
Since you will be parsing HTML extensively, I recommend that you download the Html Agility Pack and use it to parse your HTML. There is some sample code on its website. It also supports LINQ, so parsing the HTML should be relatively easy.
As for converting to JSON, my advice is to create a class with the structure you want; for example:
public class MyItem
{
public int id { get; set; }
public string url { get; set; }
public string name { get; set; }
public int g { get; set; }
// Note: as an int, a value such as 09844 loses its leading zero; use string if that matters.
public int m { get; set; }
}
Now that you have the structure ready as a class, you can build a List<MyItem>
with all the elements you parsed from your HTML.
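As a rough illustration of that step, here is a minimal sketch of building the list with the Html Agility Pack. It assumes the markup looks exactly like the sample in the question (a ul with class "column" containing li/a elements). ListScraper, Parse, ParseQuery and GetInt are hypothetical names made up for this example, and the id is assumed to be simply the 1-based position in the list.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

public class MyItem
{
    public int id { get; set; }
    public string url { get; set; }
    public string name { get; set; }
    public int g { get; set; }
    public int m { get; set; }
}

// Hypothetical helper class, not part of the Html Agility Pack.
public static class ListScraper
{
    public static List<MyItem> Parse(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var items = new List<MyItem>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var links = doc.DocumentNode.SelectNodes("//ul[@class='column']/li/a");
        if (links == null) return items;

        foreach (var link in links)
        {
            string url = link.GetAttributeValue("href", "");
            var query = ParseQuery(url);

            items.Add(new MyItem
            {
                id = items.Count + 1, // assumption: id is the position in the list
                url = url,
                name = HtmlEntity.DeEntitize(link.InnerText).Trim(),
                m = GetInt(query, "m"),
                g = GetInt(query, "g")
            });
        }
        return items;
    }

    // Splits the part after '?' into key/value pairs.
    private static Dictionary<string, string> ParseQuery(string url)
    {
        var result = new Dictionary<string, string>();
        int q = url.IndexOf('?');
        if (q < 0) return result;

        foreach (var pair in url.Substring(q + 1).Split('&'))
        {
            var parts = pair.Split(new[] { '=' }, 2);
            if (parts.Length == 2) result[parts[0]] = Uri.UnescapeDataString(parts[1]);
        }
        return result;
    }

    // Returns 0 when the key is missing or the value is not a number.
    private static int GetInt(Dictionary<string, string> query, string key)
    {
        string raw;
        int value;
        return query.TryGetValue(key, out raw) && int.TryParse(raw, out value) ? value : 0;
    }
}
```

Calling ListScraper.Parse(html) on the sample markup should give three items. Because m is an int here, "09844" comes back as 9844; if the leading zero is significant, the property would need to be a string instead.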
The final step to convert to JSON is a matter of doing:
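// JavaScriptSerializer is in System.Web.Script.Serialization (System.Web.Extensions assembly).
List<MyItem> list = ....; // the list constructed from the parsed HTML
JavaScriptSerializer js = new JavaScriptSerializer();
string jsonOutput = js.Serialize(list);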
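Serializing the list directly produces a bare JSON array. The question asks for the array to sit under an "items" key, and one way to get that shape is to wrap the list in an anonymous object before serializing. The sketch below assumes the .NET Framework (where JavaScriptSerializer is available); ItemJson and ToJson are hypothetical names used only for illustration.

```csharp
using System.Collections.Generic;
using System.Web.Script.Serialization; // .NET Framework, System.Web.Extensions assembly

// Hypothetical helper, shown only to illustrate the wrapping step.
public static class ItemJson
{
    public static string ToJson<T>(List<T> list)
    {
        var js = new JavaScriptSerializer();

        // The anonymous object's property name becomes the top-level JSON key.
        return js.Serialize(new { items = list });
    }
}
```

Note that JavaScriptSerializer emits quoted property names, so the result is valid JSON rather than the unquoted form shown in the question.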
Since you're screen scraping, I would recommend using the Html Agility Pack to read the HTML (using XPath), then either use a JSON library such as Json.NET or the JavaScriptSerializer class (System.Web.Script.Serialization.JavaScriptSerializer) to serialize the data you extract from the HAP nodes into JSON.
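For the Json.NET route, a minimal sketch might look like the following. It assumes the Newtonsoft.Json NuGet package is referenced; JsonNetExample and its ToJson method are hypothetical names, and the "items" wrapper mirrors the shape requested in the question.

```csharp
using System.Collections.Generic;
using Newtonsoft.Json; // NuGet package "Newtonsoft.Json"

// Hypothetical helper, shown only to illustrate the Json.NET call.
public static class JsonNetExample
{
    public static string ToJson<T>(IEnumerable<T> items, bool indented = false)
    {
        // "new { items }" creates an anonymous object with a single "items" property.
        var wrapper = new { items };
        return JsonConvert.SerializeObject(
            wrapper,
            indented ? Formatting.Indented : Formatting.None);
    }
}
```

Passing indented: true gives human-readable output, which can be handy while checking the scraped data by eye.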