regex problem with Ctrl-M
I want to match the following:
boolean b = "\u000D".matches("\\cM");
but the compiler gives me:
unclosed string literal
illegal character: \92
illegal character: \92
unclosed string literal
not a statement
Why? Isn't that literal the valid Unicode code for Ctrl-M?
The "unclosed string literal" error occurs because \uXXXX escape sequences are translated before the source is lexed, and U+000D is a carriage return, which is a line terminator. So
boolean b = "\u000D".matches("\\cM");
becomes
boolean b = "
".matches("\\cM");
which is invalid Java code. (Yes, it also means you could write String foo = \u0022\u0021\u0022; and it would compile correctly, because \u0022 is a double quote and \u0021 is an exclamation mark.)
If you write instead
boolean b = "\r".matches("\\cM"); // \r == \u000D
then the code works and returns true.
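As a minimal sketch of the workarounds, the class below avoids putting the raw escape inside a string literal. The class name CtrlMDemo and the helper isCtrlM are made up for illustration. It assumes only the standard String.matches method and the \cM and backslash-u forms that java.util.regex.Pattern documents.

```java
public class CtrlMDemo {
    // Returns true when the whole input is a single Ctrl-M (carriage return, U+000D).
    static boolean isCtrlM(String s) {
        return s.matches("\\cM");
    }

    public static void main(String[] args) {
        // The string escape for carriage return is safe inside a literal.
        System.out.println(isCtrlM("\r"));
        // A char cast avoids any escape sequence in the source.
        System.out.println(isCtrlM(String.valueOf((char) 0x0D)));
        // The doubled backslash keeps the compiler from translating the escape;
        // the regex engine decodes it instead.
        System.out.println("\r".matches("\\u000D"));
    }
}
```

All three lines should print true. Note that the comments deliberately avoid spelling out a backslash-u escape, since the compiler translates those inside comments as well.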
Haha! This is a trap!
Java processes Unicode escapes before it tokenizes the source. So it converts your code into:
boolean b = "
".matches("\\cM");
... and so it is definitely an error: an unterminated string literal, and so on.
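The same early translation can be seen working in the other direction. In this small sketch (the class name UnicodeEscapeDemo and the method bang are hypothetical), the quotes of a string literal are themselves written as Unicode escapes, and the code still compiles:

```java
public class UnicodeEscapeDemo {
    static String bang() {
        // The compiler turns the three escapes below into "!" before tokenizing,
        // so this line is an ordinary string assignment.
        String foo = \u0022\u0021\u0022;
        return foo;
    }

    public static void main(String[] args) {
        System.out.println(bang());
    }
}
```

Running it should print a single exclamation mark.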
This might be unrelated, but I wanted to remove Ctrl-M from a field in a database (Vertica).
I used the function below, which strips every control character (not only Ctrl-M), and it worked for me.
REGEXP_REPLACE(<column_name>,'[[:cntrl:]]')
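For comparison, a rough Java counterpart might look like the sketch below. It is not the Vertica function, only an analogous approach using the POSIX-style \p{Cntrl} class that java.util.regex.Pattern supports. The class and method names are invented for illustration.

```java
public class ControlCharStripper {
    // Removes every ASCII control character (0x00-0x1F and 0x7F),
    // which includes tabs and newlines as well as Ctrl-M.
    static String stripControl(String s) {
        return s.replaceAll("\\p{Cntrl}", "");
    }

    // Removes only Ctrl-M (carriage return) and leaves other characters alone.
    static String stripCarriageReturns(String s) {
        return s.replace("\r", "");
    }

    public static void main(String[] args) {
        System.out.println(stripControl("abc\r\n").equals("abc"));
        System.out.println(stripCarriageReturns("a\r\nb").equals("a\nb"));
    }
}
```

If only carriage returns should go, the narrower method is the safer choice, since the control class also removes tabs and line feeds.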