Java regex for any symbol?

Is there a regex which accepts any symbol?

EDIT: To clarify what I'm looking for: I want to build a regex which will accept any number of whitespace characters, and then it must contain at least 1 symbol (e.g. , . " ' $ £ etc.) or (not exclusive or) at least 1 character.


Yes. The dot (.) will match any symbol, at least if you use it in conjunction with the Pattern.DOTALL flag (otherwise it won't match line terminators such as the new-line character). From the docs:

In dotall mode, the expression . matches any character, including a line terminator. By default this expression does not match line terminators.
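
As a minimal sketch of the difference, the class below compares a plain dot with a dot compiled under Pattern.DOTALL. The class name DotAllDemo and the helper dotMatches are hypothetical names chosen for illustration; only Pattern.compile, Pattern.DOTALL, Pattern.matches and the inline (?s) flag come from java.util.regex.

```java
import java.util.regex.Pattern;

class DotAllDemo {
    // Hypothetical helper: true if the whole input is exactly one
    // character that the dot accepts, with or without DOTALL.
    static boolean dotMatches(String input, boolean dotAll) {
        Pattern p = dotAll
                ? Pattern.compile(".", Pattern.DOTALL)
                : Pattern.compile(".");
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(dotMatches("$", false));   // true
        System.out.println(dotMatches("\n", false));  // false: line terminator
        System.out.println(dotMatches("\n", true));   // true: DOTALL enabled
        // The inline flag (?s) is equivalent to passing Pattern.DOTALL.
        System.out.println(Pattern.matches("(?s).", "\n")); // true
    }
}
```

The inline (?s) form can be handy when the pattern is passed as a plain string, for example to String.matches.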


Regarding your edit:

I want to build a regex which will accept any number of whitespace characters, and then it must contain at least 1 symbol (e.g. , . " ' $ £ etc.) or (not exclusive or) at least 1 character.

Here is a suggestion:

\s*\S+
  • \s* any number of whitespace characters
  • \S+ one or more ("at least one") non-whitespace characters.
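
A small sketch of how that suggestion behaves, assuming the whole input must match (Matcher.matches). The class name LeadingWhitespaceDemo and the method accepts are hypothetical, and the sample inputs are made up for illustration.

```java
import java.util.regex.Pattern;

class LeadingWhitespaceDemo {
    // Regex backslashes are doubled inside a Java string literal.
    private static final Pattern P = Pattern.compile("\\s*\\S+");

    // Hypothetical helper: true if the entire input is optional
    // whitespace followed by one or more non-whitespace characters.
    static boolean accepts(String input) {
        return P.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("   $"));   // true
        System.out.println(accepts("abc"));    // true: zero whitespace is fine
        System.out.println(accepts("   "));    // false: nothing after the whitespace
        System.out.println(accepts(""));       // false
        System.out.println(accepts("a b"));    // false: whitespace after the first run
    }
}
```

Because matches() anchors the pattern to the whole input, "a b" is rejected. If trailing content should be allowed, Matcher.lookingAt() or find() may be a better fit.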


In Java, a symbol is \pS, which is not the same as punctuation characters, which are \pP.

I talk about this issue, plus enumerate the types for all the ASCII punctuation and symbols, here in this answer.
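
To make the \pS versus \pP distinction concrete, here is a minimal sketch that classifies a few of the characters from the question. The class and method names are hypothetical. The categories themselves come from Unicode: the dollar and pound signs are currency symbols (Sc), while the comma, full stop and quotation marks are punctuation (Po).

```java
class SymbolVsPunctuationDemo {
    // Hypothetical helpers: test a one-character string against the
    // Unicode general categories S (symbol) and P (punctuation).
    static boolean isSymbol(String s) {
        return s.matches("\\pS");
    }

    static boolean isPunctuation(String s) {
        return s.matches("\\pP");
    }

    // A class that accepts either kind, if that is what "symbol" means to you.
    static boolean isSymbolOrPunctuation(String s) {
        return s.matches("[\\pS\\pP]");
    }

    public static void main(String[] args) {
        String pound = "\u00A3"; // the pound sign
        System.out.println(isSymbol("$"));          // true
        System.out.println(isSymbol(pound));        // true
        System.out.println(isSymbol("+"));          // true: math symbol
        System.out.println(isSymbol(","));          // false
        System.out.println(isPunctuation(","));     // true
        System.out.println(isPunctuation("\""));    // true
        System.out.println(isPunctuation("$"));     // false
        System.out.println(isSymbolOrPunctuation(".")); // true
    }
}
```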

Patterns like [\p{Alnum}\s] only work on legacy data sets from the 1960s, because by default those classes match ASCII only. To work with Java's native character set, you need something on the order of

// Regex backslashes are doubled because these are Java string literals.
String identifier_charclass = "[\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]]";
String whitespace_charclass = "[\\u000A\\u000B\\u000C\\u000D\\u0020\\u0085\\u00A0\\u1680\\u180E\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2007\\u2008\\u2009\\u200A\\u2028\\u2029\\u202F\\u205F\\u3000]";

// Nesting both classes inside one pair of brackets forms their union.
String ident_or_white = "[" + identifier_charclass + whitespace_charclass + "]";
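
As a rough illustration of why the longer classes matter, the sketch below compares the legacy pattern with ident_or_white on a few sample strings. The class name IdentOrWhiteDemo, the two helper methods and the sample inputs are assumptions made for this example; the character classes are copied from above.

```java
import java.util.regex.Pattern;

class IdentOrWhiteDemo {
    // Character classes copied from the answer.
    static final String identifier_charclass =
        "[\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]]";
    static final String whitespace_charclass =
        "[\\u000A\\u000B\\u000C\\u000D\\u0020\\u0085\\u00A0\\u1680\\u180E"
        + "\\u2000\\u2001\\u2002\\u2003\\u2004\\u2005\\u2006\\u2007\\u2008"
        + "\\u2009\\u200A\\u2028\\u2029\\u202F\\u205F\\u3000]";
    static final String ident_or_white =
        "[" + identifier_charclass + whitespace_charclass + "]";

    // Legacy pattern: ASCII-only by default in Java.
    static final Pattern LEGACY = Pattern.compile("[\\p{Alnum}\\s]+");
    // Unicode-aware pattern built from the classes above.
    static final Pattern UNICODE = Pattern.compile(ident_or_white + "+");

    static boolean legacyAccepts(String s) {
        return LEGACY.matcher(s).matches();
    }

    static boolean unicodeAccepts(String s) {
        return UNICODE.matcher(s).matches();
    }

    public static void main(String[] args) {
        String ascii = "plain text 123";
        String accented = "caf\u00E9 na\u00EFve";   // accented Latin letters
        String nbsp = "foo\u00A0bar";               // no-break space between words

        System.out.println(legacyAccepts(ascii));      // true
        System.out.println(unicodeAccepts(ascii));     // true
        System.out.println(legacyAccepts(accented));   // false
        System.out.println(unicodeAccepts(accented));  // true
        System.out.println(legacyAccepts(nbsp));       // false
        System.out.println(unicodeAccepts(nbsp));      // true
    }
}
```

On Java 7 and later, compiling with Pattern.UNICODE_CHARACTER_CLASS is another way to make the predefined classes Unicode-aware, though its definitions are not identical to the hand-written classes here.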

I’m sorry that Java makes it so difficult to work with modern data sets, but at least it is possible.

Just don’t ask about boundaries or grapheme clusters. For that, see my other postings.
