Java - Search for words having more than 1 capital letter
Just need your help regarding a task to search in Java. I need to read a line from a file and make a list of all the words that have more than 1 capital letter in them.
For example if the line is : There are SeVen Planets In this UniverSe
The result should be : SeVen and UniverSe
I am able to read the line and split it into words, but somehow I am not able to come up with the correct regular expression to search for these words.
The following is a small example I used but it returns false although I think it should return true.
System.out.println("ThiS".matches("[A-Z]{2,}"));
Can anyone please have a look at this and suggest ways to achieve my result? Appreciate any help.
Thanks,
AJ
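[A-Z]{2,} means 2 or more consecutive upper case letters. You could use [A-Z].*[A-Z] which would allow for any other characters to appear between the two uppercase letters. Keep in mind that String.matches tests the whole string, so when testing a complete word with it you would need .*[A-Z].*[A-Z].* instead.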
Alternatively, you don't really need to use regex for this. If you prefer, you could just iterate over each character in the string, test it with Character.isUpperCase, and count the number of matching characters.
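A minimal sketch of the non-regex approach described above, assuming words are separated by whitespace. The class and method names (UpperCaseCounter, countUpperCase, wordsWithMultipleCapitals) are made up for illustration and are not from the question.

```java
import java.util.ArrayList;
import java.util.List;

class UpperCaseCounter {

    // Counts the upper-case characters in a single word.
    static int countUpperCase(String word) {
        int count = 0;
        for (int i = 0; i < word.length(); i++) {
            if (Character.isUpperCase(word.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    // Returns the words of the line that contain more than one upper-case letter.
    // Assumption: words are separated by runs of whitespace.
    static List<String> wordsWithMultipleCapitals(String line) {
        List<String> result = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (countUpperCase(word) > 1) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [SeVen, UniverSe]
        System.out.println(wordsWithMultipleCapitals("There are SeVen Planets In this UniverSe"));
    }
}
```

Since Character.isUpperCase is not limited to ASCII, this version should also count capitals such as É, which [A-Z] would miss.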
Maybe [a-z]*[A-Z][a-z]*[A-Z][a-z]* can work. The fact is that counting with {..} only repeats the element right before it, so it doesn't allow other chars between the two letters.
\b(?:[a-z]*[A-Z]){2,}[a-z]*\b
will match words that contain at least two uppercase letters. (With {2} instead of {2,}, only words with exactly two uppercase letters would match.)
If you want to allow words that contain letters other than ASCII, use
\b(?:\p{Ll}*\p{Lu}){2,}\p{Ll}*\b
Of course, in a Java string, you need to escape (double) the backslashes.
So you get:
// needs java.util.regex.Pattern and java.util.regex.Matcher
Pattern regex = Pattern.compile("\\b(?:\\p{Ll}*\\p{Lu}){2,}\\p{Ll}*\\b");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    // matched text: regexMatcher.group()
    // match start: regexMatcher.start()
    // match end: regexMatcher.end()
}
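For completeness, a self-contained sketch that collects the matches into a list. It uses {2,} so that words with three or more capitals also qualify. The Pattern.UNICODE_CHARACTER_CLASS flag is my own addition, on the assumption that the handling of \b next to non-ASCII letters has differed between JDK releases. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeCapitalWords {

    // Two or more groups of "optional lower-case letters + one upper-case letter",
    // followed by optional lower-case letters, bounded by word boundaries.
    // UNICODE_CHARACTER_CLASS is an assumption added here so that \b treats
    // non-ASCII letters as word characters.
    static final Pattern WORD = Pattern.compile(
            "\\b(?:\\p{Ll}*\\p{Lu}){2,}\\p{Ll}*\\b",
            Pattern.UNICODE_CHARACTER_CLASS);

    // Returns every word in the line that has at least two upper-case letters.
    static List<String> find(String line) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(find("There are SeVen Planets In this UniverSe"));
        // \u00C9 is the letter E with an acute accent
        System.out.println(find("\u00C9coLe PariS"));
    }
}
```

Note that this pattern only accepts words made up entirely of letters, so a token containing digits or underscores is skipped.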
The regular expression you listed is not going to work because it will search for a contiguous sequence of 2 or more upper case letters.
I think what you need to do is to write an expression that lets you allow lowercase letters on both sides.
I don't remember the exact syntax (I'm going to check) but something like .*[A-Z].*[A-Z].* will ensure that the word has at least two upper case letters.
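Here is a rough sketch of how that expression could be used with the split-into-words approach from the question. The class name SplitAndMatch and the method filter are invented for this example, and splitting on whitespace is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class SplitAndMatch {

    // Whole-word test: anything, a capital, anything, another capital, anything.
    static final String TWO_CAPITALS = ".*[A-Z].*[A-Z].*";

    // Keeps only the words that match the expression in full.
    static List<String> filter(String line) {
        List<String> hits = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (word.matches(TWO_CAPITALS)) {
                hits.add(word);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println("ThiS".matches(TWO_CAPITALS)); // true
        // prints [SeVen, UniverSe]
        System.out.println(filter("There are SeVen Planets In this UniverSe"));
    }
}
```

The leading and trailing .* matter because String.matches only succeeds when the expression covers the entire word.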
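Your current regular expression matches only a sequence of two or more consecutive upper case letters, not capitals spread throughout the word. So, a search with find() would match inside THis and tHIS but not ThiS, as you have discovered. String.matches is stricter still: it requires the whole string to match, so it returns false for all three.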
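To make the difference concrete, here is a small sketch that runs the original [A-Z]{2,} expression through both Matcher.find() and Matcher.matches(). The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class MatchesVersusFind {

    // The expression from the question: two or more consecutive capitals.
    static final Pattern CONSECUTIVE_CAPITALS = Pattern.compile("[A-Z]{2,}");

    // True if the expression matches some part of the string.
    static boolean findsSomewhere(String s) {
        return CONSECUTIVE_CAPITALS.matcher(s).find();
    }

    // True only if the expression matches the string from start to end.
    static boolean matchesEntirely(String s) {
        return CONSECUTIVE_CAPITALS.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String word : new String[] {"THis", "tHIS", "ThiS", "THIS"}) {
            System.out.println(word + " find=" + findsSomewhere(word)
                    + " matches=" + matchesEntirely(word));
        }
    }
}
```

Only THIS passes matches(). THis and tHIS pass find() alone, and ThiS passes neither, which is why the test in the question prints false.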
You need to look for an upper case letter, maybe some lower case, and then another upper. Or in regex: [A-Z]\w*?[A-Z]
If you want to search the whole string without needing to split it first, then include the possibility of other word characters on either end and let the expression capture: (\w*?[A-Z]\w*?[A-Z]\w*)
Also note that we are using reluctant quantifiers so that they stop matching at the earliest opportunity in the first two instances, and the normal (greedy) quantifier at the end to pick up the rest of the word. Read more about the various quantifiers here.
Pattern pat = Pattern.compile("\\w*[A-Z]\\w*[A-Z]\\w*");
Matcher matcher = pat.matcher("There are SeVen Planets In this UniverSe");
while ( matcher.find() ) {
System.out.println(matcher.group());
}
Prints
SeVen
UniverSe
I'm horrible with regex though so there's probably a simpler way. This way's really easy to understand though: start at the beginning of a word, match 0 or more characters, then an upper-case character, then 0 or more characters, then another upper-case character, then 0 or more characters.
I use this regex: /[A-Z].*[A-Z]+/
You can use this regex:
"SeVen".matches("[A-Z].*[A-Z][a-zA-Z]*") //true
"SeveNEight".matches("[A-Z].*[A-Z][a-zA-Z]*") //true
"seVeneight".matches("[A-Z].*[A-Z][a-zA-Z]*") //false