Parsing HTML - How to get a number from a tag?
I am developing a Windows Forms application that interacts with a web site.
Using a WebBrowser control, I drive the site and can iterate through its tags using:
HtmlDocument webDoc1 = this.webBrowser1.Document;
HtmlElementCollection aTags = webDoc1.GetElementsByTagName("a");
Now I want to extract a particular piece of text from the following tag:
<a href="issue?status=-1,1,2,3,4,5,6,7&@sort=-activity&@search_text=&@dispname=Show Assigned&@filter=status,assignedto&@group=priority&@columns=id,activity,title,creator,status&assignedto=244&@pagesize=50&@startwith=0">Show Assigned</a><br>
Here I want to get the number 244, which is the value of assignedto in the tag above, and save it in a variable for further use.
How can I do this?
You can try splitting the href string on '&' (the separator between parameters in the tag above), and then each piece on '=', like this:
string aTag = ...; // the href text of the <a> tag
foreach (var pair in aTag.Split('&'))
{
    if (pair.Contains("="))
    {
        var parts = pair.Split('=');
        var leftSide = parts[0];
        var rightSide = parts[1];
        if (leftSide == "assignedto")
        {
            MessageBox.Show(rightSide); // It should be 244
            // Or...
            int num = int.Parse(rightSide);
        }
    }
}
Another option is to use regular expressions, which you can test here: www.regextester.com. Some more information on regexes: http://msdn.microsoft.com/en-us/library/system.text.regularexpressions.regex.aspx
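As a minimal sketch of the regex route, assuming the parameter always looks like assignedto= followed by digits: the class name AssignedToRegex and the method Extract are made up for illustration. The pattern only matches when assignedto starts a parameter, so the assignedto inside @filter=status,assignedto is skipped; ';' is accepted as a preceding character in case the text is still HTML-encoded (&amp;).

```csharp
using System;
using System.Text.RegularExpressions;

public static class AssignedToRegex
{
    // "assignedto=<digits>" directly after '?', '&' or ';' (or at the start of the string).
    private static readonly Regex Pattern =
        new Regex(@"(?:^|[?&;])assignedto=(\d+)");

    // Returns the number, or null when the parameter is absent or not a valid int.
    public static int? Extract(string href)
    {
        Match match = Pattern.Match(href);
        if (!match.Success)
        {
            return null;
        }

        int value;
        return int.TryParse(match.Groups[1].Value, out value) ? value : (int?)null;
    }

    public static void Main()
    {
        string href = "issue?status=-1,1,2&@filter=status,assignedto&assignedto=244&@pagesize=50";
        Console.WriteLine(Extract(href)); // prints 244
    }
}
```

Returning int? instead of throwing lets the caller decide what to do with links that carry no assignedto parameter.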
Hope it helps!
If all cases are similar to this one and you don't mind adding a reference to System.Web in your Windows Forms application, you can do something like this:
using System;

public class Program
{
    static void Main()
    {
        string href = @"issue?status=-1,1,2,3,4,5,6,7&
@sort=-activity&@search_text=&@dispname=Show Assigned&
@filter=status,assignedto&@group=priority&
@columns=id,activity,title,creator,status&assignedto=244&
@pagesize=50&@startwith=0";
        // Decode HTML entities such as &amp; before parsing.
        href = System.Web.HttpUtility.HtmlDecode(href);
        // Parse the text as a query string and look the value up by name.
        var querystring = System.Web.HttpUtility.ParseQueryString(href);
        Console.WriteLine(querystring["assignedto"]);
    }
}
This is a simplified example: you first need to extract the href attribute text, but that should not be complex. Once you have the href attribute text, you can take advantage of the fact that it is basically a query string and reuse the code in .NET that already parses query strings.
To complete the example, to obtain the href attribute text you could do:
HtmlElementCollection aTags = webBrowser.Document.GetElementsByTagName("a");
foreach (HtmlElement element in aTags)
{
    string href = element.GetAttribute("href");
}
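Putting the two pieces together, a small helper could take each href string from the loop and return the number. This is only a sketch: the class name HrefQuery, the method GetAssignedTo and the example.invalid URL are hypothetical. It assumes the href may come back as a full URL, so everything up to the first '?' is dropped before parsing, and it needs the reference to System.Web mentioned earlier.

```csharp
using System;
using System.Web;

public static class HrefQuery
{
    // Returns the assignedto value, or null if it is missing or not a number.
    public static int? GetAssignedTo(string href)
    {
        if (string.IsNullOrEmpty(href))
        {
            return null;
        }

        // Decode HTML entities such as &amp; first.
        string decoded = HttpUtility.HtmlDecode(href);

        // Keep only the part after the first '?', if there is one.
        int questionMark = decoded.IndexOf('?');
        string query = questionMark >= 0 ? decoded.Substring(questionMark + 1) : decoded;

        var parameters = HttpUtility.ParseQueryString(query);

        int value;
        return int.TryParse(parameters["assignedto"], out value) ? value : (int?)null;
    }

    public static void Main()
    {
        // Hypothetical absolute URL, used only to exercise the helper.
        string href = "http://example.invalid/issue?status=-1,1,2&assignedto=244&@pagesize=50";
        Console.WriteLine(GetAssignedTo(href)); // prints 244
    }
}
```

Inside the foreach loop you would then call HrefQuery.GetAssignedTo(element.GetAttribute("href")) and skip the links for which it returns null.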