Replace a substring in a string except when the substring is inside quotes

Regex dialect: Java

The problem: given a string, replace all occurrences of a substring inside it, except when these occurrences are inside quotes.

Example1:

string: "test substr 'test substr' substr"
substring: "substr"
replacement: "YYYY"
output: "test YYYY 'test substr' YYYY"

Example2:

string: "test sstr 'test sstr' sstr"
substring: "substr"
replacement: "YYYY"
output: "test sstr 'test sstr' sstr"

Example3:

string: "test 'test substr'"
substring: "substr"
replacement: "YYYY"
output: "test 'test substr'"
This is my best try thus far:

Regex: ((?:[^']*'[^']+')*?[^']*?)substring
Replace: $1replacement

The problem is that it only works correctly when another occurrence of the substring appears outside the quotes, after the last quoted string. Without one, an occurrence inside the quotes gets replaced instead, so Example3 fails (output: "test 'test YYYY'").

Many thanks for your help.


Here's a way:

public class Main {
    public static void main(String [] args) {

        String[] tests = {
                "test substr 'test substr' substr",
                "test sstr 'test sstr' sstr",
                "test 'test substr'"
        };

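        // Match "substr" only when an even number of single quotes
        // follows it, counting all the way to the end of the input.
        String regex = "substr(?=([^']*'[^']*')*[^']*$)";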

        for(String t : tests) {
            System.out.println(t.replaceAll(regex, "YYYY"));
        }
    }
}

prints:

test YYYY 'test substr' YYYY
test sstr 'test sstr' sstr
test 'test substr'

Note that this does not work if a ' can be escaped, for example with a \.
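If escaped quotes do need to be supported, one possible workaround is to drop the lookahead and scan left to right instead, letting the regex consume escaped characters and whole quoted runs before it gets a chance to match the substring. The following is only a sketch under my own assumptions: a backslash escapes exactly the next character, quotes are single quotes only, and the class name EscapedQuotes and the helper replaceOutsideQuotes are hypothetical names made up for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapedQuotes {

    // Hypothetical helper: replaces `target` everywhere except inside
    // single-quoted runs. Assumes a backslash escapes the next character.
    static String replaceOutsideQuotes(String input, String target, String replacement) {
        Pattern p = Pattern.compile(
                "\\\\."                        // an escaped character outside quotes
                + "|'(?:\\\\.|[^'\\\\])*'"     // a complete quoted run, escapes included
                + "|(" + Pattern.quote(target) + ")"); // group 1: the literal target
        Matcher m = p.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Only group 1 is rewritten; escapes and quoted runs are copied unchanged.
            String piece = (m.group(1) != null) ? replacement : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(piece));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: test YYYY 'test substr' YYYY
        System.out.println(replaceOutsideQuotes("test substr 'test substr' substr", "substr", "YYYY"));
        // prints: test YYYY 'it\'s substr' YYYY
        System.out.println(replaceOutsideQuotes("test substr 'it\\'s substr' substr", "substr", "YYYY"));
    }
}
```

Pattern.quote and Matcher.quoteReplacement are used so that a substring or replacement containing regex metacharacters, $ or \ is treated literally.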

A quick explanation:

The following: ([^']*'[^']*')* matches zero or more pairs of single quotes (that is, 0 or an even number of quotes) with non-quote characters in between, and [^']*$ matches any remaining non-quote characters up to the end-of-string.

So, the complete regex substr(?=([^']*'[^']*')*[^']*$) matches any "substr" that has 0 or an even number of single quotes ahead of it, when looking all the way to the end-of-string!
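To make the pieces easier to see, here is the same pattern spelled out with Pattern.COMMENTS, which lets whitespace and # comments appear inside the regex. This is just an illustrative sketch: the class name CommentedRegex and the method replace are hypothetical, and the group is written as non-capturing (?:...) since its contents are never referenced, which does not change what matches.

```java
import java.util.regex.Pattern;

public class CommentedRegex {

    // Same regex as above, one component per line.
    static final Pattern OUTSIDE_QUOTES = Pattern.compile(
            "substr        # the literal text to replace\n"
          + "(?=           # lookahead: succeed only if what follows is\n"
          + "  (?:         #   a group made of\n"
          + "    [^']*'    #     non-quotes, then an opening quote\n"
          + "    [^']*'    #     non-quotes, then a closing quote\n"
          + "  )*          #   repeated zero or more times\n"
          + "  [^']*       #   followed by non-quote characters only\n"
          + "  $           #   all the way to the end of the input\n"
          + ")",
            Pattern.COMMENTS);

    // Hypothetical helper for the demo: replaces unquoted "substr" with "YYYY".
    static String replace(String input) {
        return OUTSIDE_QUOTES.matcher(input).replaceAll("YYYY");
    }

    public static void main(String[] args) {
        // prints: test YYYY 'test substr' YYYY
        System.out.println(replace("test substr 'test substr' substr"));
        // prints: test 'test substr'
        System.out.println(replace("test 'test substr'"));
    }
}
```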

Looking all the way to the end-of-string is the key here. If you didn't do that, the "substr" in the following string would also be replaced:

aaa 'substr' bbb 'ccc ddd' eee
           ^     ^       ^
           |     |       |
           i     ii     iii

because it "sees" an even number of single quotes ahead of it (i and ii) and is free to stop there, ignoring iii. You must force it to look at the entire string to the right of it (all the way to $)!
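The effect of the anchor can be checked by running both variants on that string. In this small sketch the class name AnchorDemo and the constant names are hypothetical; the only difference between the two patterns is the trailing $ inside the lookahead.

```java
public class AnchorDemo {

    // Lookahead without the end-of-string anchor: it may stop early.
    static final String UNANCHORED = "substr(?=([^']*'[^']*')*[^']*)";

    // Lookahead with the anchor: every quote up to the end must be paired.
    static final String ANCHORED = "substr(?=([^']*'[^']*')*[^']*$)";

    public static void main(String[] args) {
        String s = "aaa 'substr' bbb 'ccc ddd' eee";

        // prints: aaa 'YYYY' bbb 'ccc ddd' eee   (quoted text wrongly replaced)
        System.out.println(s.replaceAll(UNANCHORED, "YYYY"));

        // prints: aaa 'substr' bbb 'ccc ddd' eee (three quotes ahead, so no match)
        System.out.println(s.replaceAll(ANCHORED, "YYYY"));
    }
}
```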
