Backreferences Syntax in Replacement Strings (Why Dollar Sign?)

2023-01-01 05:56 Q&A author:

In Java, and it seems in a few other languages, backreferences in the pattern are preceded by a backslash (e.g. \1, \2, \3, etc.), but in a replacement string they are preceded by a dollar sign (e.g. $1, $2, $3, and also $0).

Here's a snippet to illustrate:

System.out.println(
    "left-right".replaceAll("(.*)-(.*)", "\\2-\\1") // WRONG!!!
); // prints "2-1"

System.out.println(
    "left-right".replaceAll("(.*)-(.*)", "$2-$1")   // CORRECT!
); // prints "right-left"

System.out.println(
    "You want million dollar?!?".replaceAll("(\\w*) dollar", "US\\$ $1")
); // prints "You want US$ million?!?"

System.out.println(
    "You want million dollar?!?".replaceAll("(\\w*) dollar", "US$ \\1")
); // throws IllegalArgumentException: Illegal group reference

Questions:

Is the use of $ for backreferences in replacement strings unique to Java? If not, what language started it? What flavors use it and what don't?
Why is this a good idea? Why not stick to the same pattern syntax? Wouldn't that lead to a more cohesive and an easier to learn language?
- Wouldn't the syntax be more streamlined if statements 1 and 4 in the above were the "correct" ones instead of 2 and 3?

Is the use of $ for backreferences in replacement strings unique to Java?

No. Perl uses it, and Perl certainly predates Java's Pattern class. Java's regex support is explicitly described in terms of Perl regexes.

For example: http://perldoc.perl.org/perlrequick.html#Search-and-replace

Why is this a good idea?

Well obviously you don't think it is a good idea! But one reason that it is a good idea is to make Java search/replace support (more) compatible with Perl's.

There is another possible reason why $ might have been viewed as a better choice than \. That is that \ has to be written as \\ in a Java String literal.
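To make the escaping cost concrete, here is a small sketch. The class and method names (EscapingCostDemo, hasDoubledLetter, swap) are made up for illustration; only Pattern, Matcher and String.replaceAll come from the JDK. It shows that a pattern backreference needs a doubled backslash in Java source, while a replacement backreference with $ needs no extra escaping.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class EscapingCostDemo {
    // Pattern backreference: the regex \1 has to be typed as "\\1" in Java source.
    static boolean hasDoubledLetter(String s) {
        return Pattern.compile("(\\w)\\1").matcher(s).find();
    }

    // Replacement backreference: $1 and $2 need no extra escaping in Java source.
    static String swap(String s) {
        return s.replaceAll("(.*)-(.*)", "$2-$1");
    }

    public static void main(String[] args) {
        System.out.println("\\1".length());           // 2: one backslash plus the digit
        System.out.println(hasDoubledLetter("book")); // true, "oo" repeats
        System.out.println(hasDoubledLetter("bird")); // false
        System.out.println(swap("left-right"));       // right-left
    }
}
```

Had \ been chosen for replacement strings too, every replacement backreference in Java source would carry the same doubled backslash as the pattern does.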

But all of this is pure speculation. None of us were in the room when the design decisions were made. And ultimately it doesn't really matter why they designed the replacement String syntax that way. The decisions have been made and set in concrete, and any further discussion is purely academic ... unless you just happen to be designing a new language or a new regex library for Java.

After doing some research, I understand the issues now: Perl had to use different symbols for pattern backreferences and replacement backreferences, and while java.util.regex.* doesn't have to follow suit, it chooses to, not for a technical reason but for a traditional one.

On the Perl side

(Please keep in mind that all I know about Perl at this point comes from reading Wikipedia articles, so feel free to correct any mistakes I may have made)

The reason why it had to be done this way in Perl is the following:

Perl uses $ as a sigil (i.e. a symbol attached to variable name).
Perl string literals are variable interpolated.
Perl regex actually captures groups as variables $1, $2, etc.

Thus, because of the way Perl is interpreted and how its regex engine works, a preceding backslash must be used for backreferences in the pattern (e.g. \1), because if the sigil $ were used instead (e.g. $1), it would cause unintended variable interpolation into the pattern.

The replacement string, due to how it works in Perl, is evaluated within the context of every match. It is most natural for Perl to use variable interpolation here, so the regex engine captures groups into variables $1, $2, etc, to make this work seamlessly with the rest of the language.

References

Wikipedia/String literal - variable interpolation
Wikipedia/Sigil (computer programming)

On the Java side

Java is a very different language from Perl; most importantly here, it has no variable interpolation. Moreover, replaceAll is a method call, and as with all method calls in Java, arguments are evaluated once, before the method is invoked.

Thus, a variable interpolation feature by itself would not be enough, since in essence the replacement string must be re-evaluated on every match, and that's just not the semantics of method calls in Java. A variable-interpolated replacement string that is evaluated before replaceAll is even invoked is practically useless; the interpolation needs to happen inside the method, on every match.
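For comparison, the closest thing Java offers to Perl's per-match evaluation is a callback. The sketch below assumes Java 9 or later, where Matcher.replaceAll accepts a Function<MatchResult, String>; the class and method names are invented for the example. The lambda runs once per match, which is exactly what a plain String argument cannot do.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; requires Java 9+ for Matcher.replaceAll(Function).
class PerMatchReplacementDemo {
    // Upper-cases the first letter of every word.
    static String capitalizeWords(String input) {
        Matcher m = Pattern.compile("\\b(\\w)(\\w*)").matcher(input);
        // The lambda is invoked for each match, after the groups are known.
        // quoteReplacement keeps any '$' or '\' in the result from being
        // interpreted as a group reference.
        return m.replaceAll(r -> Matcher.quoteReplacement(
                r.group(1).toUpperCase() + r.group(2)));
    }

    public static void main(String[] args) {
        System.out.println(capitalizeWords("left right")); // Left Right
    }
}
```

Note that the string returned by the callback is still scanned for $ and \ in the usual way, hence the quoteReplacement call.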

Since that is not the semantics of the Java language, replaceAll must do this "just-in-time" interpolation manually. As such, there is absolutely no technical reason why $ is the escape symbol for backreferences in replacement strings. It could very well have been \. Conversely, backreferences in the pattern could also have been escaped with $ instead of \, and it would have worked just as well technically.
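To illustrate that the choice of symbol is arbitrary, here is a minimal sketch of a replaceAll variant that does the "just-in-time" interpolation by hand and uses \ instead of $. This is not how java.util.regex is implemented; the class and both method names are hypothetical, and it only handles single-digit groups.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a replaceAll whose replacement template uses \1 instead of $1.
class ManualInterpolationSketch {
    // Expands \0..\9 against the current match; a backslash before any other
    // character yields that character literally. A group that did not
    // participate in the match is appended as "null" (simplification).
    static String expand(String template, Matcher m) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < template.length(); i++) {
            char c = template.charAt(i);
            if (c == '\\' && i + 1 < template.length()) {
                char next = template.charAt(++i);
                if (next >= '0' && next <= '9') {
                    out.append(m.group(next - '0'));
                } else {
                    out.append(next);
                }
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    // Re-evaluates the template on every match, inside the method.
    static String replaceAllBackslash(String input, String regex, String template) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuilder result = new StringBuilder();
        int last = 0;
        while (m.find()) {
            result.append(input, last, m.start()); // text between matches
            result.append(expand(template, m));    // interpolated replacement
            last = m.end();
        }
        result.append(input, last, input.length());
        return result.toString();
    }

    public static void main(String[] args) {
        // prints "right-left"
        System.out.println(replaceAllBackslash("left-right", "(.*)-(.*)", "\\2-\\1"));
        // prints "You want US$ million?!?"
        System.out.println(replaceAllBackslash(
                "You want million dollar?!?", "(\\w*) dollar", "US$ \\1"));
    }
}
```

With this variant, statements 1 and 4 from the question behave the way the asker wished, at the price of doubled backslashes in every Java literal.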

The reason Java does regex the way it does is purely traditional: it's simply following the precedent set by Perl.
