
C# Regex Replace Pattern (Replace String) Return $1

I'm currently parsing some data from SQL Server and need help with a regex.

I have a CLR assembly in SQL Server 2005 that lets me replace strings using the C# Regex.Replace() method.

I need to parse the following.

    Strings:

    CAD 90890
    (CAD 90892)
    CAD G67859
    CAD 34G56
    CAD 3S56.
    AX CAD 890990
    CAD 783783 MX

    Needed results:

    90890
    90892
    G67859
    34G56
    3S56
    890990
    783783

SELECT TOP 25 CADCODE, dbo.RegExReplace(CADCODE, '*pattern*', '$1')
FROM dbo.CADCODES
WHERE CADCODE LIKE '%CAD%'
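A function like dbo.RegExReplace is presumably a thin wrapper around Regex.Replace. The question does not show the assembly's code, so the RegExReplace local function below is a hypothetical stand-in (C# 9 or later, top-level statements). It illustrates why the pattern has to cover the whole string when the replacement is $1: Regex.Replace rewrites only the matched text and leaves the rest of the input as it was.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the CLR function behind dbo.RegExReplace;
// the question does not show the assembly's actual source.
static string RegExReplace(string input, string pattern, string replacement) =>
    Regex.Replace(input, pattern, replacement);

// Matches only "CAD 90892": the surrounding parentheses stay in the result.
Console.WriteLine(RegExReplace("(CAD 90892)", @"CAD (\w+)", "$1"));        // (90892)

// Anchored with ^ and $, the match covers the whole input, so only group 1 remains.
Console.WriteLine(RegExReplace("(CAD 90892)", @"^.*CAD (\w+).*$", "$1"));  // 90892
```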

I need to get the string that follows the word CAD, up to the first white space or any other character that is not a letter or digit. I managed to match the all-digit codes, but my pattern fails on the others, and I can't find a solution that works.

Thanks in advance.

Update: added new strings:

AX CAD 890990

CAD 783783 MX


Try this:

(\w+)\W*$

The pattern matches the last word: a run of alphanumeric characters (and underscores).
Example: http://www.rubular.com/r/1zWQQVLZy1
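A minimal C# sketch of this pattern used with Regex.Match, run over the sample strings from the question; LastWord is a hypothetical helper name. Because the pattern takes the last word, CAD 783783 MX yields MX rather than the wanted 783783.

```csharp
using System;
using System.Text.RegularExpressions;

// Returns the last run of word characters (letters, digits, underscore),
// skipping trailing non-word characters such as ")" or "."; "" if none.
static string LastWord(string input)
{
    Match m = Regex.Match(input, @"(\w+)\W*$");
    return m.Success ? m.Groups[1].Value : "";
}

string[] samples = { "CAD 90890", "(CAD 90892)", "CAD G67859", "CAD 34G56",
                     "CAD 3S56.", "AX CAD 890990", "CAD 783783 MX" };
foreach (string sample in samples)
    Console.WriteLine($"{sample} -> {LastWord(sample)}");
```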

Another option is to find a word that contains at least one digit. This pattern can match anywhere in the string, so you may need to handle multiple matches. If you use it with Replace, either add a capturing group around the whole pattern or replace using $& (the whole match).

[a-zA-Z_]*\d\w*

Example: http://www.rubular.com/r/XUrFNuPQUv
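A sketch of the digit-word pattern with Regex.Match, Regex.Matches and the $& substitution. The helper names are hypothetical, and CAD 1A2 X9 is a made-up input (not from the question) used only to show more than one match.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// First token that contains at least one digit ("" when there is none).
static string FirstDigitWord(string input) =>
    Regex.Match(input, @"[a-zA-Z_]*\d\w*").Value;

// Every such token, for strings that might contain more than one.
static string[] AllDigitWords(string input) =>
    Regex.Matches(input, @"[a-zA-Z_]*\d\w*").Cast<Match>().Select(m => m.Value).ToArray();

Console.WriteLine(FirstDigitWord("CAD 783783 MX"));                 // 783783
Console.WriteLine(string.Join(", ", AllDigitWords("CAD 1A2 X9")));  // 1A2, X9 (hypothetical input)

// With Replace and no capturing group, $& stands for the whole match.
Console.WriteLine(Regex.Replace("AX CAD 890990", @"[a-zA-Z_]*\d\w*", "<$&>")); // AX CAD <890990>
```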

If you can't use Regex.Match and must use Regex.Replace, match the entire string from start to end and replace it with the group you need:

RegExReplace(CADCODE, '^.*\b([a-zA-Z_]*\d\w*)\b.*$', '$1')
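A sketch that applies this whole-string pattern with Regex.Replace, as the SQL function would, to every sample from the question; ExtractCode is a hypothetical name. Two side effects of the design are worth knowing: the leading .* is greedy, so if a string contains several qualifying tokens the last one is kept, and a string with no qualifying token does not match at all and is returned unchanged.

```csharp
using System;
using System.Text.RegularExpressions;

// The whole-string pattern from the RegExReplace call above: when it matches,
// the entire input is replaced by capture group 1.
static string ExtractCode(string input) =>
    Regex.Replace(input, @"^.*\b([a-zA-Z_]*\d\w*)\b.*$", "$1");

string[] samples = { "CAD 90890", "(CAD 90892)", "CAD G67859", "CAD 34G56",
                     "CAD 3S56.", "AX CAD 890990", "CAD 783783 MX" };
foreach (string sample in samples)
    Console.WriteLine($"{sample} -> {ExtractCode(sample)}");
```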


I think this is what you're after:

^\W*\w*CAD\w*\W*(\w+)\W*$

The regex has to match the whole string so RegExReplace can replace it with $1, effectively stripping off the unwanted parts.
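A sketch running this pattern through Regex.Replace on the question's samples; StripCad is a hypothetical name. It produces the wanted result for the first five strings, but AX CAD 890990 and CAD 783783 MX come back unchanged: the \w* in front of CAD cannot cross the space after AX, and \W*$ does not allow the trailing MX. When the pattern does not match, Regex.Replace simply returns its input.

```csharp
using System;
using System.Text.RegularExpressions;

// This answer's first pattern; Regex.Replace returns the input unchanged
// whenever the pattern fails to match the whole string.
static string StripCad(string input) =>
    Regex.Replace(input, @"^\W*\w*CAD\w*\W*(\w+)\W*$", "$1");

string[] samples = { "CAD 90890", "(CAD 90892)", "CAD G67859", "CAD 34G56",
                     "CAD 3S56.", "AX CAD 890990", "CAD 783783 MX" };
foreach (string sample in samples)
    Console.WriteLine($"{sample} -> {StripCad(sample)}");
```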

EDIT: Let me back up and make sure I've got this right. Because of the

WHERE CADCODE LIKE '%CAD%'

in your query, you already know every string contains the sequence CAD. That being the case, there's no need to complicate the regex by matching that sequence again. This should be all you need:

^.*?(\w+)\W*$
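The same kind of sketch for the simplified pattern; LastWordOnly is a hypothetical name. Since it keeps the last word of the string, it handles AX CAD 890990 but returns MX for CAD 783783 MX.

```csharp
using System;
using System.Text.RegularExpressions;

// The lazy .*? stops at the first position where (\w+)\W*$ can match,
// which is the start of the last word, so group 1 is that whole word.
static string LastWordOnly(string input) =>
    Regex.Replace(input, @"^.*?(\w+)\W*$", "$1");

string[] samples = { "CAD 90890", "(CAD 90892)", "CAD G67859", "CAD 34G56",
                     "CAD 3S56.", "AX CAD 890990", "CAD 783783 MX" };
foreach (string sample in samples)
    Console.WriteLine($"{sample} -> {LastWordOnly(sample)}");
```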


Try this:

(?:\(CAD\)|CAD)\s+?([\dA-Z]+)

You can get the result from capture group 1.
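A sketch of this approach with Regex.Match, reading group 1 rather than using Replace; CodeAfterCad is a hypothetical name. On the seven sample strings from the question it returns the wanted codes.

```csharp
using System;
using System.Text.RegularExpressions;

// Match (not Replace) and read capture group 1. [\dA-Z] is case-sensitive,
// so a lowercase letter would end the captured code.
static string CodeAfterCad(string input)
{
    Match m = Regex.Match(input, @"(?:\(CAD\)|CAD)\s+?([\dA-Z]+)");
    return m.Success ? m.Groups[1].Value : "";
}

string[] samples = { "CAD 90890", "(CAD 90892)", "CAD G67859", "CAD 34G56",
                     "CAD 3S56.", "AX CAD 890990", "CAD 783783 MX" };
foreach (string sample in samples)
    Console.WriteLine($"{sample} -> {CodeAfterCad(sample)}");
```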


The problem with regex is that, with a limited sample set, it's always easy to find a pattern that fits.

In your case, you could use: \w{4}\w*

which just says: four word characters (letters, digits or underscore), followed by zero or more word characters. CAD has only three characters, so it never matches, and neither do spaces or parentheses.
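A sketch that checks \w{4}\w* (equivalent to \w{4,}) against the samples; FirstLongWord is a hypothetical name. It returns the wanted code for all seven, which is exactly the point above: CAD 123 below is a hypothetical input, not from the question, where a three-character code would be missed.

```csharp
using System;
using System.Text.RegularExpressions;

// First run of at least four word characters ("" when there is none).
static string FirstLongWord(string input) => Regex.Match(input, @"\w{4}\w*").Value;

string[] samples = { "CAD 90890", "(CAD 90892)", "CAD G67859", "CAD 34G56",
                     "CAD 3S56.", "AX CAD 890990", "CAD 783783 MX" };
foreach (string sample in samples)
    Console.WriteLine($"{sample} -> {FirstLongWord(sample)}");

// Hypothetical input, not from the question: a three-character code is too short.
string shortCode = "CAD 123";
Console.WriteLine(FirstLongWord(shortCode).Length == 0 ? "no match" : "match");  // no match
```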
