
C# Regex question

I have a problem dealing with the @ symbol in Regex. I am trying to remove @sometext from a text string, but I can't find an example anywhere that uses @ as a literal. I have tried it myself, but it doesn't remove the word from the string. Any ideas?

public string removeAtSymbol(string input)
{
    Regex findWords = new Regex(______);//Find the words like "@text"
    Regex[] removeWords;

    string test = input; 
    MatchCollection all = findWords.Matches(test);
    removeWords = new Regex[all.Count];
    int index = 0;
    string[] values = new string[all.Count];

    YesOutputBox.Text = " you got here";

    foreach (Match m in all) //List all the words
    {
        values[index] = m.Value.Trim();
        index++;
        YesOutputBox.Text = YesOutputBox.Text + " " + m.Value;
    }

    for (int i = 0; i < removeWords.Length; i++)
    {
        removeWords[i] = new Regex(" " + values[i]);

        // If the words appears more than one time
        if (removeWords[i].Matches(test).Count > 1)
        {
            removeWords[i] = new Regex(" " + values[i] + " ");
            test = removeWords[i].Replace(test, " "); //Remove the first word.
        }
    }

    return test;
}


You can remove all occurrences of "@sometext" from the string test with

Regex.Replace(test, "@sometext", "")

or for any word starting with "@" you can use

Regex.Replace(test, "@\\w+", "")

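If you specifically need a separate word (i.e. nothing like @comp within tom@comp.com), you can precede the regex with a custom word-boundary check (\b does not work here, because there is no word boundary between a space and @). Note that the preceding non-word character is part of the match, so it is removed as well; use "$1" as the replacement if you want to keep it: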

Regex.Replace(test, "(^|\\W)@\\w+", "")
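
To make the difference between these calls concrete, here is a small sketch. The class name AtWordDemo, the method names, and the sample string are made up for illustration, and the second method uses "$1" as the replacement (an assumption on my part) so that the character before the @ is kept.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AtWordDemo
{
    // Removes every "@word", including the "@comp" inside an e-mail address.
    public static string RemoveAny(string input) =>
        Regex.Replace(input, @"@\w+", "");

    // Removes "@word" only at the start of the input or after a non-word
    // character; "$1" puts that preceding character back.
    public static string RemoveStandalone(string input) =>
        Regex.Replace(input, @"(^|\W)@\w+", "$1");

    public static void Main()
    {
        string test = "mail tom@comp.com, cc @bob.";
        Console.WriteLine(RemoveAny(test));        // mail tom.com, cc .
        Console.WriteLine(RemoveStandalone(test)); // mail tom@comp.com, cc .
    }
}
```

The verbatim string literal (@"...") avoids doubling the backslashes; the @ in front of the quote is C# syntax and is unrelated to the literal @ inside the pattern, which needs no escaping.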


You can use:

^\s@([A-Za-z0-9_]+)

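as the regex to recognize Twitter usernames. As written, it only matches at the very start of the input, and only when exactly one whitespace character precedes the @.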


Regex to remove @something from this string: I want to remove @something from this string.

var regex = new Regex("@\\w*");
string result = regex.Replace(stringWithAt, "");

Is that what you are looking for?


I've had good luck applying this pattern:

\B@\w+

This matches an @ character followed by one or more word characters (letters, digits, and some connecting punctuation such as the underscore), but only when the @ does not sit on a word boundary. In practice that means the @ must be at the start of the input or be preceded by a non-word character.

The result of executing this code:

string result = Regex.Replace(
    @"@This1 @That2_thing this2@3that @the5Others @alpha@beta@gamma",
    @"\B@\w+", 
    @"redacted");

is the following string:

redacted redacted this2@3that redacted redacted@beta@gamma

If this question is Twitter-specific, then Twitter provides an open source library that helps capture Twitter-specific entities like links, mentions and hashtags. This Java file contains the code defining the regular expressions that Twitter uses, and this YAML file contains test strings and expected outcomes of many unit tests that exercise the regular expressions in the Twitter library.

Twitter's mention-matching pattern (extracted from their library, modified to remove unnecessary capture groups, and edited to make sense in the context of a replacement) is shown below. The match should be performed in a case-insensitive manner.

(^|[^a-z0-9_])[@\uFF20][a-z0-9_]{1,20}

Here is an example which reproduces the results of the first replacement in my answer:

string result = Regex.Replace(
    @"@This1 @That2_thing this2@3that @the5Others @alpha@beta@gamma", 
    @"(^|[^a-z0-9_])[@\uFF20][a-z0-9_]{1,20}", 
    @"$1redacted",
    RegexOptions.IgnoreCase);

Note the need to include the substitution $1 since the first capture group can't be directly converted into an atomic zero-width assertion.
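
For .NET specifically, where lookbehind assertions are available, a variant that avoids the $1 substitution can be sketched as follows. This is only an illustration under that assumption: the class MentionRedactor and its Redact method are hypothetical names, not part of Twitter's library, and the pattern is my adaptation of the one above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class MentionRedactor
{
    // Hypothetical helper, not taken from Twitter's library.
    // The lookbehind checks the character before the @ without consuming it,
    // so the replacement text does not need "$1".
    private static readonly Regex Mention = new Regex(
        @"(?<=^|[^a-z0-9_])[@\uFF20][a-z0-9_]{1,20}",
        RegexOptions.IgnoreCase);

    public static string Redact(string input) =>
        Mention.Replace(input, "redacted");

    public static void Main()
    {
        Console.WriteLine(Redact(
            "@This1 @That2_thing this2@3that @the5Others @alpha@beta@gamma"));
        // redacted redacted this2@3that redacted redacted@beta@gamma
    }
}
```

The capture-group form above is the more portable choice, since not every regex engine supports lookbehind.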
