
Regex expression to back reference more than 9 values in a replace

I have a regex that traverses a string and pulls out 40 values. It looks sort of like the pattern below, but much larger and more complicated:

est(.*)/test>test>(.*)<test><test>(.*)test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test><test>(.*)/test>

My question is: how do I use these groups with the replace command when the group number exceeds 9? It seems that whenever I use \10, it returns the value for \1 and then appends a 0 to the end.

Any help would be much appreciated thanks :)

Also I am using UEStudio, but if a different program does it better then no biggie :)


As pointed out by psycho brm: use $10 instead of \10. I am using Notepad++ and it works beautifully.


Most of the simple regex engines used by editors aren't equipped to handle 10 or more matching groups; it doesn't seem like UltraEdit can. I just tried Notepad++ and it won't even match a regex with 10 groups.

Your best bet, I think, is to write something fast in a quick language with a decent regex parser, although that wouldn't answer the question as asked.

Here's something in Python:

import re

# Fifteen capture groups, one character each.
pattern = re.compile(r'(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)')
with open('input.txt', 'r') as f:
    for line in f:
        m = pattern.match(line)
        if m:  # skip lines shorter than 15 characters
            print(m.groups())

Note that Python allows two-digit backreferences such as \20. To write a backreference to group 2 followed by a literal 0, use \g<2>0, which is unambiguous.
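To make the difference concrete, here is a minimal sketch using re.sub. The sample string and the twelve-group pattern are made up for illustration and are not taken from the question.

```python
import re

# Hypothetical sample: twelve single-character groups over twelve letters.
pattern = re.compile(r'(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)')
text = 'abcdefghijkl'

# In a Python replacement string, \10 refers to group 10.
tenth = pattern.sub(r'\10', text)

# \g<1>0 is group 1 followed by a literal "0".
first_then_zero = pattern.sub(r'\g<1>0', text)

# \g<12> is the unambiguous way to reference group 12.
twelfth = pattern.sub(r'\g<12>', text)

print(tenth, first_then_zero, twelfth)
```

Other tools may resolve \10 differently, so this only shows Python's behaviour.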

Edit: Most regex flavors, and editors that include a regex engine, should follow this replace syntax:

abcdefghijklmnop
search: (.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(?<name>.)(.)
note:    1  2  3  4  5  6  7  8  9  10 11 12 13
value:   a  b  c  d  e  f  g  h  i  j  k  l  m
replace result:
    \11      a1      i.e.: match 1, then the character "1"
    ${12}    l       most should support this
    ${name}  l       few support named references, but use them where you can.

Named references are usually available only in specific regex flavors; test your tool to know for sure.


Put a $ in front of the two-digit group number, e.g. \1\2\3\4\5\6\7\8\9$10. It worked for me.


Try using named groups. Instead of writing the tenth group as:

(.*)

use:

(?<group10>.*)

and then use the following replace string:

${group10}

(That's of course in the absence of a better solution using looping, and remember that there might be different regex syntax flavours depending on your environment.)
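As one example of such a flavour difference, Python's re module spells named groups as (?P<name>...) and references them in the replacement as \g<name>. A minimal sketch follows; the input string is made up, and group10 is just the illustrative name used above.

```python
import re

# Nine numbered groups followed by a named tenth group.
pattern = re.compile(r'(.)(.)(.)(.)(.)(.)(.)(.)(.)(?P<group10>.*)')
text = 'abcdefghijklmnop'

# A named group also receives a number, so both references hit the same group.
by_name = pattern.sub(r'[\g<group10>]', text)
by_number = pattern.sub(r'[\g<10>]', text)

print(by_name, by_number)
```

Using the name avoids any question of how the tool parses a two-digit number.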


If you cannot handle more than 9 subgroups why not initially match groups of 9 and then loop and apply regexes to those matches?

i.e. first match (<test.*/test>)+ and then for each subgroup match on <test(.*)/test>.
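The two-step idea could look roughly like this in Python. The input and the simplified <test>...</test> pattern are assumptions for illustration, since the real pattern in the question is larger. Because re.findall returns all values as a list, no group number above 9 is ever needed.

```python
import re

# Hypothetical input: twelve <test>...</test> elements in a row.
text = ''.join('<test>v{}</test>'.format(i) for i in range(1, 13))

# Step 1: match the whole run of elements at once.
block = re.search(r'(?:<test>.*?</test>)+', text)

# Step 2: extract each value from the run; non-greedy so elements don't merge.
values = re.findall(r'<test>(.*?)</test>', block.group(0)) if block else []

# The tenth value is addressed by list index rather than by \10 or $10.
print(values[9])
```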
