How to escape Unicode escapes in Groovy's /pattern/ syntax

2023-01-07 05:02

The following Groovy commands illustrate my problem.

First of all, this works (as seen on lotrepls.appspot.com) as expected (note that \u0061 is 'a').

>>> print "a".matches(/\u0061/)

true

Now let's say that we want to match \n, using the Unicode escape \u000A. The following, using "pattern" as a string, fails as expected:

>>> print "\n".matches("\u000A");

Interpreter exception: com.google.lotrepls.shared.InterpreterException:
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed,
Script1.groovy: 1: expecting anything but ''\n''; got it anyway
@ line 1, column 21. 1 error

This is expected because in Java at least, Unicode escapes are processed early (JLS 3.3), so:

print "\n".matches("\u000A")

really is the same as:

print "\n".matches("
")

The fix is to escape the Unicode escape, and let the regex engine process it, as follows:

>>> print "\n".matches("\\u000A")

true
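To make it visible what the regex engine actually receives here, a minimal sketch, assuming a plain Groovy script run in groovysh or the Groovy console (the variable name `pattern` is only illustrative):

```groovy
// The doubled backslash produces one literal backslash, so the compiler
// leaves the escape alone and the string holds six characters.
def pattern = "\\u000A"
assert pattern.length() == 6
assert pattern.startsWith('\\')

// java.util.regex resolves the escape itself and matches a line feed.
assert "\n".matches(pattern)
assert !"a".matches(pattern)
```

The point of the length check is that the conversion to a line feed happens inside the regex engine, not in the compiler.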

Now here's the question part: how can we get this to work with the Groovy /pattern/ syntax instead of a string literal?

Here are some failed attempts:

>>> print "\n".matches(/\u000A/)

Interpreter exception: com.google.lotrepls.shared.InterpreterException:
org.codehaus.groovy.control.MultipleCompilationErrorsException: startup failed,
Script1.groovy: 1: expecting EOF, found '(' @ line 1, column 19.
1 error

>>> print "\n".matches(/\\u000A/)

false

>>> print "\\u000A".matches(/\\u000A/);

true
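The last two results are consistent with the slashy string keeping both backslashes. A hedged sketch of that reading (variable names are illustrative, and the assumption is that a backslash in a slashy string only escapes the forward slash):

```groovy
// Both backslashes survive in the slashy string, so the pattern
// is seven characters long.
def slashy = /\\u000A/
assert slashy.length() == 7

// The regex engine reads this as: a literal backslash, then the text u000A.
def sixChars = '\\' + 'u000A'      // one backslash followed by u000A
assert sixChars.matches(slashy)    // matches the six-character text
assert !"\n".matches(slashy)       // but not a line feed
```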

~"[\u0000-\u0008\u000B\u000C\u000E-\u001F\u007F-\u009F]"

Appears to be working as it should. According to the docs I've seen, the double backslashes shouldn't be required with a slashy string, so I don't know why the compiler's not happy with them.

First, Groovy's behavior seems to have changed since the question was asked: on https://groovyconsole.appspot.com/ and in a local Groovy shell, "\n".matches(/\u000A/) works fine and evaluates to true.

If you run into a similar situation again, encode the backslash itself as a Unicode escape, as in "\n".matches(/\u005Cu000A/). The Unicode-escape conversion turns \u005C back into a backslash, so the regex parser still receives the sequence \u000A.

Another option is to separate the backslash from the u, for example with "\n".matches(/${'\\'}u000A/) or "\n".matches('\\' + /u000A/).
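A small sketch of the two "separate the backslash from the u" variants, assuming they are assigned to variables first (the names `viaConcat` and `viaInterpolation` are made up for illustration):

```groovy
// Both forms build the same six-character pattern: one backslash followed
// by u000A, with the backslash kept apart from the u in the source text.
def viaConcat = '\\' + /u000A/
String viaInterpolation = /${'\\'}u000A/   // the GString is coerced to String

assert viaConcat.length() == 6
assert viaConcat == viaInterpolation

// The regex engine then resolves the escape and matches a line feed.
assert "\n".matches(viaConcat)
assert "\n".matches(viaInterpolation)
```

Because the backslash and the u never sit next to each other in the source, this should not depend on whether or when the compiler processes Unicode escapes.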
