Java regex mix two patterns
How can i get this pattern to work:
Pattern pattern = Pattern.compile("[\\p{P}\\p{Z}]");
Basically, this will split my String[] sentence by any kind of punctuation character (p{P} or any kind of whitespace (p{Z}). But i want 开发者_如何学Cto exclude the following case:
(?<![A-Za-z-])[A-Za-z]+(?:-[A-Za-z]+){1,}(?![A-Za-z-])
pattern explained here: Java regex patterns
which are the hyphened words like this: "aaa-bb", "aaa-bb-cc", "aaa-bb-c-dd". SO, i can i do that?
Unfortunately it seems like you can't merge both expressions, at least as far as I know.
However, maybe you can reformulate your problem.
If, for example, you want to split between words (which can contain hyphens), try this expression:
(?>[^\p{L}-]+|-[^\p{L}]+|^-|-$)
This should match any sequence of non-letter characters that are not a minus or any minus that is followed my a non-letter character or that is the first or last character in the input.
Using this expression for a split should result in this:
input="aaa-bb, aaa-bb-cc, aaa-bb-c-dd,no--match,--foo"
ouput={"aaa-bb","aaa-bb-cc","aaa-bb-c-dd","no","match","","foo"}
The regex might need some additional optimization but it is a start.
Edit: This expression should get rid of the empty string in the split:
(?>[^\p{L}-][^\p{L}]*|-[^\p{L}]+|^-|-$)
The first part would now read as "any non-character which is not a minus followed by any number of non-character characters" and should match .--
as well.
Edit: in case you want to match words that could potentially contain hyphens, try this expression:
(?>(?<=[^-\p{L}])|^)\p{L}+(?:-\p{L}+)*(?>(?=[^-\p{L}])|$)
This means "any sequence of letters (\p{L}+
) followed by any number of sequences consisting of one minus and at least one more letters ((?:-\p{L}+)*+
). That sequence must be preceeded by either the start or anything not a letter or minus ((?>(?<=[^-\p{L}])|^)
) and be followed by anything that is not a letter or minus or the end of the input ((?>(?=[^-\p{L}])|$)
)".
精彩评论