Help for a regular expression

I am new to regex and need something that does the following. The input is some word that starts with anything, is followed by something fixed like "_CHR", then immediately by some digits like 123, and then by anything else. I want to find those and replace the number with the character representation of that number. For example, the input "Hello Pi_CHR241to How are you" would be replaced with "Hello Piñto How are you".


Since you only want to replace the _CHRnnn bits, it's enough to search for:

_CHR(\d+)

After the match, backreference number 1 will contain the character code.

With this, you can:

string resultString = Regex.Replace(subjectString, @"_CHR(\d+)", new MatchEvaluator(ComputeReplacement));

public String ComputeReplacement(Match m) {
    // Group 1 holds the decimal character code: cast it to char, then convert to string
    return ((char)Int32.Parse(m.Groups[1].Value)).ToString();
}
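
For reference, here is a minimal self-contained sketch of the same idea, assuming the codes are decimal and fit in a single char. The class name ChrDecoder and the method name Decode are made up for illustration; they are not part of the original answer.

```csharp
using System.Text.RegularExpressions;

public static class ChrDecoder
{
    // Replaces every _CHR<decimal digits> token with the character that has that code.
    // Assumption: each code fits in a single UTF-16 char (0 to 65535).
    public static string Decode(string input)
    {
        return Regex.Replace(input, @"_CHR(\d+)",
            m => ((char)int.Parse(m.Groups[1].Value)).ToString());
    }
}
```

Calling ChrDecoder.Decode("Hello Pi_CHR241to How are you") returns "Hello Piñto How are you", since 241 is the code for ñ.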

EDIT: Thanks to Alan Moore for his C# expertise and regex improvement ideas!


Well, theoretically your string could start or end with one of your 'encoding' strings, so I don't think you need to worry about what comes before or after it. Just find every occurrence of that pattern.

Assuming you are looking for any Unicode character in the Basic Multilingual Plane (U+0000 to U+FFFF), you could look either for up to 4 hexadecimal digits or for up to 5 decimal digits. So your regex could look something like the following:

HEX: _CHR[0-9A-F]{1,4}
DEC: _CHR[0-9]{1,5}

If you want to match either format, try something like the following:

_CHR([0-9A-F]{1,4}|[0-9]{1,5})
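
One thing worth checking with the combined pattern: .NET tries alternatives from left to right, so the hex branch always gets the first attempt, and a digits-only code can never be told apart from a hex one. The sketch below just reports what group 1 captures. The names AlternationDemo and Capture are hypothetical, introduced only for this illustration.

```csharp
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    // Returns what group 1 captures when the combined pattern is applied,
    // or an empty string when there is no match.
    public static string Capture(string input)
    {
        Match m = Regex.Match(input, @"_CHR([0-9A-F]{1,4}|[0-9]{1,5})");
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

Capture("_CHR12345") returns "1234", not "12345": the hex branch matches four characters and the pattern is already satisfied, so the decimal branch is never tried. If both formats are really needed, a distinguishing marker for one of them would avoid the ambiguity.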

I don't think you can do the replacement you want with the regex alone. A regex replace generally just substitutes a fixed string, whereas you are looking for a computed replacement. But in whatever language you are coding in, it should be easy enough to get a collection of the matches and loop through it, parsing each one and replacing as needed.

EDIT: Regarding your matching question, I can't give specifics without knowing your language. But in pseudo-code, you'd do something along the lines of the following:

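Dim pMatches = Regex.Matches(myInput, myPattern)
Const pfx As String = "_CHR"
Dim ccode As String
For Each m As Match In pMatches
    ccode = m.Value.Replace(pfx, "")   ' strip the prefix, leaving only the digits
    myInput = myInput.Replace(m.Value, GetUniChar(ccode))   ' strings are immutable, so keep the result (GetUniChar is a placeholder)
Next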

That is roughly VB.NET syntax, but you'd need to translate it as appropriate to whatever language you're using. If you need explanation of any of it, comment back.
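
If the target language turns out to be C#, the loop might translate roughly as follows. This is only a sketch under a few assumptions: the codes are decimal, GetUniChar is a stand-in I filled in with char.ConvertFromUtf32, and the class name MatchLoopDecoder is made up. It rebuilds the string from match positions instead of calling Replace on the whole input, so a short token such as _CHR24 cannot clobber the start of a longer one such as _CHR241.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class MatchLoopDecoder
{
    // Hypothetical stand-in for GetUniChar: decimal code -> string.
    // char.ConvertFromUtf32 throws for surrogate code points (55296 to 57343).
    private static string GetUniChar(string code)
    {
        return char.ConvertFromUtf32(int.Parse(code));
    }

    public static string Decode(string input)
    {
        const string prefix = "_CHR";
        var result = new StringBuilder();
        int last = 0; // index just past the previous match

        foreach (Match m in Regex.Matches(input, @"_CHR[0-9]{1,5}"))
        {
            result.Append(input, last, m.Index - last);                   // text before the token
            result.Append(GetUniChar(m.Value.Substring(prefix.Length)));  // decoded character
            last = m.Index + m.Length;
        }

        result.Append(input, last, input.Length - last);                  // trailing text
        return result.ToString();
    }
}
```

For example, MatchLoopDecoder.Decode("_CHR65_CHR66") returns "AB".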


The problem with that method is that if the text following the symbol is also a digit, the regex could easily mistake it for part of the symbol. You'd be much better off using a delimited form like the HTML-style &#F1; instead (strict HTML writes a hex reference as &#xF1;). If you must use the method in the example, you'd have to use a fixed number of digits.

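using System.Globalization;
using System.Text.RegularExpressions;

string resultString = Regex.Replace(subjectString, @"&#([0-9A-Fa-f]+);", new MatchEvaluator(ComputeReplacement));
// alternative: @"_CHR(\d{3})" - fixed number of digits
// or: @"_CHR(\d+)" - only if you are ABSOLUTELY SURE a digit will never follow a special character
// (both alternatives capture decimal digits, so parse them without NumberStyles.HexNumber)

public string ComputeReplacement(Match m) {
    // The captured group holds hex digits, e.g. "F1" becomes 'ñ'
    return ((char)(int.Parse(m.Groups[1].Value, NumberStyles.HexNumber))).ToString();
}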
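
To make the ambiguity concrete, here is a small sketch comparing the greedy and fixed-width patterns on an input where a digit follows the symbol. It assumes decimal codes, and the names FixedWidthDemo, DecodeGreedy, and DecodeFixed are invented for the illustration.

```csharp
using System.Text.RegularExpressions;

public static class FixedWidthDemo
{
    // Converts the decimal code captured in group 1 to a one-character string.
    private static string ToChar(Match m)
    {
        return ((char)int.Parse(m.Groups[1].Value)).ToString();
    }

    // Greedy: every digit after _CHR is treated as part of the code.
    public static string DecodeGreedy(string input)
    {
        return Regex.Replace(input, @"_CHR(\d+)", ToChar);
    }

    // Fixed width: exactly three digits belong to the code.
    public static string DecodeFixed(string input)
    {
        return Regex.Replace(input, @"_CHR(\d{3})", ToChar);
    }
}
```

With the input "Pi_CHR2415 apples", DecodeFixed returns "Piñ5 apples", while DecodeGreedy reads the code as 2415 and produces a different character (U+096F) with the 5 swallowed.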