Regex: remove some text from a hyperlink

click <a href="javascript:validate('http://www.google.com');">here</a> to open google.com

I need to change the sentence above to the following:

click <a href="http://www.google.com">here</a> to open google.com

Could someone help me with a regular expression to do this in C#?


// Group 1 captures the single-quoted argument inside the href value.
Regex regex = new Regex(@"href=""[^""]*?'([^']+)'", RegexOptions.IgnoreCase);
MatchCollection matches = regex.Matches(text);

Then you'll need to extract group 1 from the match:

matches[0].Groups[1].Value

That is the new value to assign to the href attribute.
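The answer above stops after extracting group 1. Here is a minimal sketch of the full round trip, assuming the URL is always the first single-quoted value inside the href attribute. The class and method names (HrefUnwrapper, ExtractUrl, Unwrap) are hypothetical:

```csharp
using System.Text.RegularExpressions;

public static class HrefUnwrapper
{
    // Group 1 captures the first single-quoted argument inside href="...".
    // Assumption: the URL is always the first single-quoted value in the attribute.
    private static readonly Regex HrefPattern =
        new Regex(@"href=""[^""']*'([^']+)'[^""]*""", RegexOptions.IgnoreCase);

    // Returns group 1 of the first match, or an empty string when nothing matches.
    public static string ExtractUrl(string text)
    {
        Match m = HrefPattern.Match(text);
        return m.Success ? m.Groups[1].Value : string.Empty;
    }

    // Rewrites every matching href="..." attribute to href="<group 1>".
    public static string Unwrap(string text)
    {
        return HrefPattern.Replace(text, m => "href=\"" + m.Groups[1].Value + "\"");
    }
}
```

Unwrap passes a lambda as the MatchEvaluator, so every matching link in the string is rewritten in one call rather than only matches[0].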


Here you go:

The Regex:

(?<=href\=")(javascript:validate\('(?<URL>[^"']*)'\);)

The Code:

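using System.Text.RegularExpressions;

string url = "click <a href=\"javascript:validate('http://www.google.com');\">here</a> to open google.com";
// The lookbehind requires href=" before the match but keeps it out of the replaced text.
Regex regex = new Regex("(?<=href\\=\")javascript:validate\\('(?<URL>[^\"']*)'\\);");
string output = regex.Replace(url, "${URL}");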

The Output:

click <a href="http://www.google.com">here</a> to open google.com


No regex needed, as long as you know the exact URL in advance:

var s = 
    inputString.Replace(
        "javascript:validate('http://www.google.com');",
        "http://www.google.com" );

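The plain Replace call above only works when the exact URL is known in advance. A regex-free sketch that handles any URL, assuming every wrapper has exactly the shape javascript:validate('<url>'); shown in the question (ValidateCallStripper and Strip are hypothetical names):

```csharp
using System;
using System.Text;

public static class ValidateCallStripper
{
    // Assumption: every wrapper looks exactly like the one in the question,
    // i.e. javascript:validate('<url>'); with single quotes.
    private const string Prefix = "javascript:validate('";
    private const string Suffix = "');";

    public static string Strip(string input)
    {
        var sb = new StringBuilder(input.Length);
        int pos = 0;
        while (true)
        {
            int start = input.IndexOf(Prefix, pos, StringComparison.Ordinal);
            if (start < 0) break;
            int urlStart = start + Prefix.Length;
            int end = input.IndexOf(Suffix, urlStart, StringComparison.Ordinal);
            if (end < 0) break; // unterminated call: leave the rest untouched

            sb.Append(input, pos, start - pos);         // text before the wrapper
            sb.Append(input, urlStart, end - urlStart); // the bare URL
            pos = end + Suffix.Length;                  // continue after "');"
        }
        sb.Append(input, pos, input.Length - pos);      // remaining text
        return sb.ToString();
    }
}
```

Scanning with IndexOf keeps the logic literal and easy to audit, at the cost of failing silently on any variation (double quotes, extra whitespace) that a regex could tolerate.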

HtmlAgilityPack: http://htmlagilitypack.codeplex.com

Using an HTML parser like this is the preferred way to read and modify HTML.
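HtmlAgilityPack is a third-party package. As a dependency-free illustration of the same parse-then-edit idea, here is a sketch that swaps in LINQ to XML (System.Xml.Linq) instead. It assumes the fragment is well-formed XML once wrapped in a root element, which holds for the example but not for typical real-world HTML; XmlLinkFixer is a hypothetical name:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class XmlLinkFixer
{
    // Assumption: the fragment becomes well-formed XML once wrapped in a root
    // element. True for the example, NOT for arbitrary real-world HTML.
    public static string Unwrap(string fragment)
    {
        XElement root = XElement.Parse("<root>" + fragment + "</root>");

        // ToList() so enumeration finishes before any attribute value changes.
        foreach (XAttribute href in root.Descendants("a").Attributes("href").ToList())
        {
            string value = href.Value; // already unescaped by the parser
            int open = value.IndexOf('\'');
            int close = value.LastIndexOf('\'');
            if (value.StartsWith("javascript:", StringComparison.OrdinalIgnoreCase)
                && open >= 0 && close > open)
            {
                // Keep only the single-quoted argument, i.e. the URL.
                href.Value = value.Substring(open + 1, close - open - 1);
            }
        }

        // Serialize the children without the artificial <root> wrapper.
        return string.Concat(root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```

With a real HTML parser such as HtmlAgilityPack the structure is the same: select the anchor elements, rewrite each href value, and serialize the document again.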


Parsing the HTML, as Austin suggested, is a much more robust way of doing this, but if you absolutely must use a regex, try something like this (adapted from the MSDN documentation for the System.Text.RegularExpressions namespace):

using System;
using System.Text.RegularExpressions;

class MyClass
{
    static void Main(string[] args)
    {
        // Group 1 captures the single-quoted URL passed to the javascript: call.
        string pattern = @"<a href=""[^(""]*\('([^']+)'\);"">";
        Regex r = new Regex(pattern, RegexOptions.IgnoreCase);
        string sInput = "click <a href=\"javascript:validate('http://www.google.com');\">here</a> to open google.com";

        MyClass c = new MyClass();

        // Assign the replace method to the MatchEvaluator delegate.
        MatchEvaluator myEvaluator = new MatchEvaluator(c.ReplaceCC);

        // Write out the original string.
        Console.WriteLine(sInput);

        // Replace matched characters using the delegate method.
        sInput = r.Replace(sInput, myEvaluator);

        // Write out the modified string.
        Console.WriteLine(sInput);
    }

    // Rebuild the opening <a> tag around the captured URL (group 1).
    // The match starts at "<a", so the surrounding text ("click ...") is left untouched.
    public string ReplaceCC(Match m)
    {
        return "<a href=\"" + m.Groups[1].Value + "\">";
    }
}