Java literate text word parsing regexp

2023-03-01 07:21

At first I was happy with [A-Za-z]+. Now I need to parse words that end with the letter "s", but I should skip words whose first 2 or more letters are upper-case.

I tried something like [\n\\ ][A-Za-z]{0,1}[a-z]*s[ \\.\\,\\?\\!\\:]+ but the first part of it, [\n\\ ], for some reason doesn't see the beginning of the line.

Here is an example.

The text is: Denis goeS to school every day!

But the only parsed word is goeS.

Any ideas?

What about

\b[A-Z]?[a-z]*s\b

The \b is a word boundary, which I assume is what you wanted. It also matches at the very beginning and end of the text, so the first word is not missed. The ? is the shorter form of {0,1}.
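To see why the leading character class misses the first word while \b does not, here is a small sketch. The class name BoundaryDemo, the helper matches, and the sample sentence are made up for illustration, and the first pattern is my transcription of the one in the question as a Java string (with the redundant escapes inside the character classes dropped).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoundaryDemo {
    // Hypothetical helper: returns every substring of text matched by regex.
    public static List<String> matches(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Denis goes home.";

        // [\n ] must consume a real newline or space, so a word at index 0
        // has nothing in front of it to match and is skipped.
        String leadingClass = "[\\n ][A-Za-z]{0,1}[a-z]*s[ .,?!:]+";
        System.out.println(matches(leadingClass, text)); // prints [ goes ]

        // \b is zero-width and also matches at the start of the input.
        String wordBoundary = "\\b[A-Z]?[a-z]*s\\b";
        System.out.println(matches(wordBoundary, text)); // prints [Denis, goes]
    }
}
```

Note that the first pattern also pulls the surrounding space into the match, which is another reason to prefer the zero-width \b.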

Try this:

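import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Capture words with at most one leading upper-case letter that end in s or S.
Pattern p = Pattern.compile("\\b([A-Z]?[a-z]*[sS])\\b");
Matcher m = p.matcher("Denis goeS to school every day!");
while (m.find()) {
  System.out.println(m.group(1)); // prints Denis, then goeS
}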

The regex matches every word that starts with at most one upper-case letter, contains only lower-case letters in the middle, and ends in either s or S. Because \b anchors both ends of the match to word boundaries, a word with two or more leading upper-case letters is skipped entirely.

In your example this would match Denis and goeS. If you want to match only an upper-case S, change the expression to "\\b([A-Z]?[a-z]*[S])\\b", which would match goeS and GoeS but not GOeS, gOeS or goES.
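If you want to check those cases yourself, something like the following should do. It is only a sketch: the class name EndsWithS, the helper words, and the test sentence are my own, not part of the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EndsWithS {
    // Hypothetical helper: collects capture group 1 of every match.
    public static List<String> words(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Sample words taken from the cases discussed above.
        String text = "Denis goeS GoeS GOeS gOeS goES";

        // Ends in either s or S.
        System.out.println(words("\\b([A-Z]?[a-z]*[sS])\\b", text));
        // prints [Denis, goeS, GoeS]

        // Ends in upper-case S only.
        System.out.println(words("\\b([A-Z]?[a-z]*S)\\b", text));
        // prints [goeS, GoeS]
    }
}
```

GOeS, gOeS and goES are rejected in both runs because an upper-case letter appears where the pattern only allows lower-case letters, and \b prevents a match from starting in the middle of a word.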
