Need some quick C# regex help

I have this html:

<a href="http://www.site.com/">This is the content.</a>

I just need to get rid of the anchor tag html around the content text, so that all I end up with is "This is the content".

Can I do this using Regex.Replace?


Try this regex: <a[^>]+?>(.*?)</a>

Run it through the Regex class and iterate over the resulting match collection to get the inner text.

String text = "<a href=\"link.php\">test</a>";

Regex rx = new Regex("<a[^>]+?>(.*?)</a>");
// Find matches.
MatchCollection matches = rx.Matches(text);

// Report the number of matches found.
Console.WriteLine("{0} matches found. \n", matches.Count);

// Report on each match.
foreach (Match match in matches)
{
    Console.WriteLine(match.Value);

    Console.WriteLine("Groups:");
    foreach (var g in match.Groups)
    {
        Console.WriteLine(g.ToString());
    }
}

Console.ReadLine();

Output:

  1 matches found. 

  <a href="link.php">test</a>
  Groups:
  <a href="link.php">test</a>
  test

Each parenthesized expression in the pattern is stored in the match's Groups collection. The first item (index 0) is the whole match, so the text captured by (.*?) is the second item, Groups[1]. See the MSDN documentation for further information.


If you have to use Replace, this works for simple text content inside the tag. Note that the pattern strips every tag it finds, not just the anchor:

Regex r = new Regex("<[^>]+>");
string result = r.Replace(@"<a href=""http://www.site.com/"">This is the content.</a>", "");
Console.WriteLine("Result = \"{0}\"", result);
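
If only the anchor tags should go and any other markup should stay, a capture group combined with the $1 substitution in Regex.Replace is one option. This is a minimal sketch that assumes simple, non-nested anchors; the names AnchorStripper and StripAnchors are made up for illustration and are not part of any library:

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorStripper
{
    // Hypothetical helper: replaces each <a ...>text</a> with just its inner text.
    // Assumes simple, non-nested anchors; this is not a general HTML parser.
    public static string StripAnchors(string html)
    {
        // \b keeps tags such as <abbr> from matching; (.*?) lazily captures the inner text.
        return Regex.Replace(
            html,
            @"<a\b[^>]*>(.*?)</a>",
            "$1",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);
    }

    public static void Main()
    {
        string html = @"<a href=""http://www.site.com/"">This is the content.</a>";
        Console.WriteLine(StripAnchors(html)); // prints: This is the content.
    }
}
```

The "$1" in the replacement string stands for whatever the first parenthesized group captured, so each anchor is swapped for its own inner text. For anything beyond simple markup, an HTML parser is probably the safer choice.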

Good luck


You could also use groups in Regex.

For example, the following gives you the content of an anchor tag.

Regex r = new Regex(@"<a.*>(.*)</a>");
// Regex r = new Regex(@"<.*>(.*)</.*>"); or any kind of tag

var m = r.Match(@"<a href=""http://www.site.com/"">This is the content.</a>");

string content = m.Groups[1].Value;

You define groups in a regex with parentheses. Group 0 is always the whole match, so the first parenthesized group is Groups[1].
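
One caveat with a pattern like <a.*>(.*)</a>: .* is greedy, so when the input holds more than one anchor the match runs from the first <a to the last </a>. The sketch below compares it with a lazy variant. The GroupDemo class, the FirstGroup helper and the sample input are invented for illustration:

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Hypothetical helper: returns group 1 of the first match, or null when nothing matches.
    public static string FirstGroup(string pattern, string input)
    {
        Match m = Regex.Match(input, pattern);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string html = @"<a href=""one.php"">first</a> and <a href=""two.php"">second</a>";

        // Greedy: the first .* swallows everything up to the last opening tag.
        Console.WriteLine(FirstGroup(@"<a.*>(.*)</a>", html));     // prints: second

        // Lazy: [^>]* stays inside the tag and (.*?) stops at the first </a>.
        Console.WriteLine(FirstGroup(@"<a[^>]*>(.*?)</a>", html)); // prints: first
    }
}
```

With a single anchor in the input both patterns capture the same text, so the difference only shows up once several tags share a line.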
