Java literate text word parsing regexp
At first I was happy with [A-Za-z]+
Now I need to parse words that end with the letter "s", but I should skip words whose first 2 or more letters are upper case. I tried
[\n\\ ][A-Za-z]{0,1}[a-z]*s[ \\.\\,\\?\\!\\:]+
but the first part of it, [\n\\ ], for some reason doesn't see the beginning of the line.
Here is an example: the text is
Denis goeS to school every day!
but the only parsed word is goeS.
Any ideas?
What about
\b[A-Z]?[a-z]*s\b
The \b is a word boundary, which I assume is what you wanted. The ? is the shorter form of {0,1}.
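To see why the word boundary helps, here is a small sketch comparing the question's separator class with \b. The class name, helper method and sample sentence are made up for illustration, and the pattern is assumed to end in s, as the question asks. A character class such as [\n ] has to consume a real character, so it cannot match at the very start of the input, and a separator consumed at the end of one match is no longer available to start the next one. \b is zero-width and has neither problem.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the sample text are illustrative only.
final class BoundaryDemo {

    // Collects every match of regex in text, trimmed of surrounding whitespace.
    static List<String> findAll(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group().trim());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Denis likes cats.";

        // Separator classes consume characters: "Denis" has nothing in front of it,
        // and the space after "likes" is used up before "cats" can start.
        System.out.println(findAll("[\\n ][A-Za-z]?[a-z]*s[ .,?!:]+", text)); // prints [likes]

        // Word boundaries are zero-width, so every word gets a chance to match.
        System.out.println(findAll("\\b[A-Z]?[a-z]*s\\b", text)); // prints [Denis, likes, cats]
    }
}
```

If the separators really must be part of the pattern, an alternative under the same assumptions would be an anchor such as (?:^|[\n ]) at the front, but \b is shorter and does not swallow the neighbouring characters.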
Try this:
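import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Capture each word that ends in s or S and has at most one leading upper-case letter.
Pattern p = Pattern.compile("\\b([A-Z]?[a-z]*[sS])\\b");
Matcher m = p.matcher("Denis goeS to school every day!");
while (m.find()) {
    System.out.println(m.group(1)); // prints Denis, then goeS
}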
The regex matches every word that starts with at most one upper-case letter, contains only lower-case letters after that, and ends in either s or S. Words with two or more leading upper-case letters are skipped.
In your example this would match Denis and goeS. If you want to match only an upper-case S, change the expression to "\\b([A-Z]?[a-z]*[S])\\b", which would match goeS and GoeS but not GOeS, gOeS or goES.
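The cases listed above can be checked with a short sketch. The class and method names are hypothetical; the two patterns are the ones from this answer, applied with Matcher.matches() to one candidate word at a time rather than with find() over a whole sentence.

```java
import java.util.regex.Pattern;

// Hypothetical demo class for checking single words against both patterns.
final class EndsWithSDemo {

    // Accepts a trailing s or S.
    static final Pattern EITHER_S = Pattern.compile("\\b([A-Z]?[a-z]*[sS])\\b");

    // Accepts only a trailing upper-case S.
    static final Pattern UPPER_S = Pattern.compile("\\b([A-Z]?[a-z]*[S])\\b");

    // True when the whole word matches the pattern.
    static boolean accepts(Pattern p, String word) {
        return p.matcher(word).matches();
    }

    public static void main(String[] args) {
        String[] words = {"Denis", "goeS", "GoeS", "GOeS", "gOeS", "goES"};
        for (String w : words) {
            System.out.println(w
                    + " either=" + accepts(EITHER_S, w)
                    + " upperOnly=" + accepts(UPPER_S, w));
        }
    }
}
```

Under these assumptions, only goeS and GoeS pass the upper-case-only pattern, Denis passes only the [sS] version, and GOeS, gOeS and goES are rejected by both.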