Regex replace using a repetitive capture

I have a table like:

A | 1  
A | 2  
B | 1  
B | 2  
B | 3

I'm trying to transform it to look like this:

A { 1 | 2 }  
B { 1 | 2 | 3 }

I've come up with the following, which matches correctly, but I can't figure out how to get the repeated captures out.

(A|B)|(\d)(\r\n\1|(\d))*

UPDATE

I realize that this would be fairly trivial in a programming language, but I was hoping to learn something more about regular expressions.


Here is some Java code that may be helpful:

    String text =   "A | 1\n" +
                    "A | 2\n" +  
                    "B | 1\n" +
                    "B | 2\n" +
                    "B | 3\n" +
                    "A | x\n" +
                    "D | y\n" +
                    "D | z\n";
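    // Split at each newline whose next line starts with a different key
    // than the line before it (group 1 is captured inside the lookbehind).
    String[] sections = text.split("(?<=(.) . .)\n(?!\\1)");
    StringBuilder sb = new StringBuilder();
    for (String section : sections) {
        sb.append(section.substring(0, 1) + " {")              // the key, e.g. "A {"
          .append(section.substring(3).replaceAll("\n.", ""))  // drop the repeated keys
          .append(" }\n");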
    }
    System.out.println(sb.toString());

This prints:

A { 1 | 2 }
B { 1 | 2 | 3 }
A { x }
D { y | z }

The idea is to do this in two steps:

  • First, split into sections
  • Then transform each section
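
The same two steps can also be sketched with a `Matcher` loop instead of `split`. This is only an illustration, not part of the code above: the class name `GroupRuns` and the method `collapse` are made up for the example, and it assumes one-character keys and values separated by `\n`, as in the sample input. The values are taken from the whole match rather than from a group, because in Java a capture group placed inside a `*` repetition only keeps the text of its last iteration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupRuns {
    // One match = a run of consecutive lines sharing the same key (group 1).
    // The repeated part is non-capturing: a capturing group inside (...)*
    // would only retain its last iteration anyway.
    private static final Pattern RUN =
            Pattern.compile("(?m)^(.) \\| .(?:\\n\\1 \\| .)*$");

    static String collapse(String text) {
        StringBuilder sb = new StringBuilder();
        Matcher m = RUN.matcher(text);
        while (m.find()) {                       // step 1: find each section
            // step 2: skip "A |" on the first line, then drop "\nA" on the rest
            String values = m.group().substring(3).replaceAll("\\n.", "");
            sb.append(m.group(1)).append(" {").append(values).append(" }\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(collapse("A | 1\nA | 2\nB | 1\nB | 2\nB | 3\n"));
    }
}
```

For the five-line table from the question, `collapse` returns `A { 1 | 2 }` and `B { 1 | 2 | 3 }` on separate lines.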

A single replaceAll variant

If you intersperse { and } in the input so that they can be captured and rearranged in the output, this is possible with a single replaceAll (i.e. an entirely regex solution):

    String text =   "{ A | 1 }" +
                    "{ A | 2 }" +
                    "{ B | 1 }" +
                    "{ B | 2 }" +
                    "{ B | 3 }" +
                    "{ C | 4 }" +
                    "{ D | 5 }";
    System.out.println(
        text.replaceAll("(?=\\{ (.))(?<!(?=\\1).{7})(\\{)( )(.) .|(?=\\}. (.))(?:(?<=(?=\\5).{6}).{5}|(?<=(.))(.))", "$4$3$2$7$6")
    );

This prints (see output on ideone.org):

A { 1 | 2 } B { 1 | 2 | 3 } C { 4 } D { 5 }

Unfortunately, no, I don't think this is worth explaining. It's way too complicated for what's being accomplished. Essentially, though, it relies on lots of assertions, nested assertions, and capture groups (some of which will be empty strings depending on which assertion passes).

This is, without a doubt, the most complicated regex I've written.
