How can I match HTML elements with regular expressions?

The following code does not work. I am trying to retrieve the TR strings from an HTML table. Is there an issue with this code, or is there another solution available?

public static List<string> GetTR(string Tr)
{
    List<string> trContents = new List<string>();

    string regexTR = @"<(tr|TR)[^<]+>((\s*?.*?)*?)<\/(tr|TR)>";

    MatchCollection tr_Matches = Regex.Matches(Tr, regexTR, RegexOptions.Singleline);
    foreach (Match match in tr_Matches)
    {
        trContents.Add(match.Value);
    }

    return trContents;
}

Sample input string is given below:

"<TR><TD noWrap align=left>abcd</TD><TD noWrap align=left>SPORT</TD><TD align=left>5AT</TD></TR>"


Parsing HTML with regular expressions is asking for trouble.

Do the job properly using something like HTML Agility Pack.
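
As a rough illustration of the parser route, here is a minimal sketch assuming the HtmlAgilityPack NuGet package is referenced. The class name TrParserSketch is made up for this example; GetTR mirrors the method name from the question. Note that "//tr" would also pick up rows of nested tables.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is installed

public static class TrParserSketch // hypothetical class name, not from the question
{
    // Returns the outer HTML of every <tr> element found in the input.
    public static List<string> GetTR(string html)
    {
        var rows = new List<string>();

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//tr");
        if (nodes == null)
        {
            return rows;
        }

        foreach (HtmlNode node in nodes)
        {
            rows.Add(node.OuterHtml);
        }

        return rows;
    }
}
```

The parser lowercases element names, so the same XPath should find both <TR> and <tr> without any case handling in your own code.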


I think this regular expression would be more appropriate:

<(tr|TR)[^>]*>.*<\/\1>
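
For context, the original pattern fails on the sample because [^<]+ requires at least one character between "<TR" and ">": it consumes the ">" itself and then finds "<" where another ">" is expected. The pattern above avoids that with [^>]*, but its greedy .* would run from the first opening tag to the last closing tag if the input held several rows. A possible variant is sketched below, under the assumptions that rows are not nested and that RegexOptions.IgnoreCase is acceptable in place of (tr|TR); the names TrRegexSketch and GetRows are made up for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TrRegexSketch // hypothetical class name
{
    // Returns each <tr>...</tr> block as a separate string.
    public static List<string> GetRows(string html)
    {
        var rows = new List<string>();

        // <tr\b   : the tag name, with \b so that e.g. <track> is not matched
        // [^>]*   : zero or more attribute characters, so a bare <TR> still matches
        // .*?     : lazy, so each match stops at the nearest closing tag
        const string pattern = @"<tr\b[^>]*>.*?</tr\s*>";
        RegexOptions options = RegexOptions.IgnoreCase | RegexOptions.Singleline;

        foreach (Match match in Regex.Matches(html, pattern, options))
        {
            rows.Add(match.Value);
        }

        return rows;
    }

    public static void Main()
    {
        string html = "<TR><TD>a</TD></TR>\n<tr class=\"x\"><td>b</td></tr>";
        foreach (string row in GetRows(html))
        {
            Console.WriteLine(row);
        }
    }
}
```

This still breaks on nested tables, because the lazy match ends at the first closing tag it meets, whether or not that tag belongs to the outer row.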


This regex matches your input string:

<(tr|TR)+>((\s*?.*?)*?)<\/(tr|TR)>

I removed "[^<]"; I'm not sure why you need that. Also, try adding a non-greedy match.

However, it is better to go with something like HTML Agility Pack (if you want to keep your sanity) :)


(<(tr|TR)[^<]*>)(.+)(<\/(tr|TR)[^<]*>)