Counting words in Java
I want to count the words in a sentence. I read the input with the methods hasNextChar and getChar. The sentence may contain all kinds of characters. Here's my code:
    boolean isWord = false;
    while(hasNextChar()){
        char current = getChar();
        switch(current){
            case ' ' : case '.' : case ',' : case '-' :
                isWord = false;
            default:
                if(!isWord) wordCount++;
                isWord = true;
        }
    }
It works so far, but when, for example, the sentence ends with a " . ", it gives me 8 words instead of 7. Here are some example sentences:
„Schreiben Sie ein Praktikanten-Vermittlungs-Programm“ – words: 6
„Du magst ja recht haben – aber ich sehe das ganz anders.“ – words: 11
„Hallo Welt !!!!“ – words: 2
„Zwei Wörter !!!!“ – words: 2
„Eins,Zwei oder Drei“ – words: 4
A sentence does not have to end with a " . ".
Any ideas how to solve that?
You forgot the break statement in the first case (after isWord = false).
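To make the effect of the missing break concrete, here is a minimal sketch of the same switch with the break added. It assumes the sentence is available as a String (the hasNextChar/getChar helpers from the question are not shown, so a plain index loop stands in for them), and the class name SwitchWordCount is made up for illustration.

```java
class SwitchWordCount {
    // Same switch as in the question, with the missing break added.
    // Assumption: the input is a String; charAt() replaces getChar().
    static int countWords(String sentence) {
        int wordCount = 0;
        boolean isWord = false;
        for (int i = 0; i < sentence.length(); i++) {
            char current = sentence.charAt(i);
            switch (current) {
                case ' ': case '.': case ',': case '-':
                    isWord = false;
                    break; // without this, control falls through into default
                default:
                    if (!isWord) wordCount++;
                    isWord = true;
            }
        }
        return wordCount;
    }

    public static void main(String[] args) {
        System.out.println(countWords("I am."));               // 2
        System.out.println(countWords("Eins,Zwei oder Drei")); // 4
        System.out.println(countWords("Hallo Welt !!!!"));     // 3: '!' is not a listed separator
    }
}
```

The break alone fixes the trailing-period case, but any character that is not one of the four listed separators (such as '!' or '–') still starts a new word, so the list of cases would have to grow to cover every example sentence.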
Since it's homework I won't solve it for you but point you in the right direction instead.
Take a look at the Character class and the helper methods it defines. (Hint: they are all called isXyz())
Reference:
- Sun Apidocs: Character
- Sun Java Tutorial: Characters
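For readers who want to see where the hint might lead, here is one possible sketch built on Character.isLetter. It is not the only reasonable reading of the exercise: it assumes that a word is a run of letters, that digits do not count, and that the input arrives as a String. The class name LetterWordCount is hypothetical.

```java
class LetterWordCount {
    // Counts runs of letters; every non-letter character ends the current word.
    // Assumption: digits and symbols are treated as separators.
    static int countWords(String sentence) {
        int wordCount = 0;
        boolean inWord = false;
        for (int i = 0; i < sentence.length(); i++) {
            if (Character.isLetter(sentence.charAt(i))) {
                if (!inWord) wordCount++; // first letter of a new word
                inWord = true;
            } else {
                inWord = false;
            }
        }
        return wordCount;
    }

    public static void main(String[] args) {
        System.out.println(countWords("Schreiben Sie ein Praktikanten-Vermittlungs-Programm")); // 6
        System.out.println(countWords("Hallo Welt !!!!"));                                      // 2
        System.out.println(countWords("Zwei W\u00f6rter !!!!"));                                // 2
        System.out.println(countWords("Eins,Zwei oder Drei"));                                  // 4
    }
}
```

Because the test is "is this a letter?" rather than "is this one of a few separators?", consecutive punctuation and spaces need no special handling.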
For the heck of it: here's a one-liner method that counts the words using a regex. Don't use this solution; come up with your own. This is probably not what your teachers want to see anyway.
Method:
public static int countwords(final String phrase) {
    return phrase.replaceAll("[^\\p{Alpha}]+", " ").trim().split(" ").length;
}
Test code:
System.out.println(countwords(
"Schreiben Sie ein Praktikanten-Vermittlungs-Programm"));
System.out.println(countwords(
"Du magst ja recht haben – aber ich sehe das ganz anders."));
System.out.println(countwords("Hallo Welt !!!!"));
System.out.println(countwords("Zwei Wörter !!!!"));
System.out.println(countwords("Eins,Zwei oder Drei"));
Output:
6
11
2
3
4
Explanation: To use a phrase coined by Henry Rollins: Let's milk it, shall we?
// replace any occurrences of non-alphabetic characters with a single space
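// note: in Java, \p{Alpha} matches only ASCII letters unless Pattern.UNICODE_CHARACTER_CLASS is set,
// so e.g. German umlauts count as separators (hence the 3 for "Zwei Wörter !!!!" above);
// \p{L} matches any Unicode letter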
phrase.replaceAll("[^\\p{Alpha}]+", " ")
// trim space off beginning and end
.trim()
// split the string, using the spaces as delimiter
.split(" ")
// the length of the resulting array is the number of words
.length;
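The output of 3 for "Zwei Wörter !!!!" comes from \p{Alpha} matching only ASCII letters by default in Java, so the "ö" is replaced like punctuation. A hedged variant that swaps in \p{L} (any Unicode letter) and also guards against an input without any letters might look like this; the class name UnicodeWordCount is made up for illustration.

```java
class UnicodeWordCount {
    // Variant of the one-liner above using \p{L} (any Unicode letter)
    // instead of \p{Alpha}, so that umlauts stay part of their word.
    static int countWords(final String phrase) {
        // collapse every run of non-letters into a single space, then trim
        String cleaned = phrase.replaceAll("[^\\p{L}]+", " ").trim();
        // "".split(" ") yields one empty element, so handle that case explicitly
        return cleaned.isEmpty() ? 0 : cleaned.split(" ").length;
    }

    public static void main(String[] args) {
        System.out.println(countWords("Zwei W\u00f6rter !!!!")); // 2
        System.out.println(countWords("Eins,Zwei oder Drei"));   // 4
        System.out.println(countWords("!!!!"));                  // 0
    }
}
```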
Going off of Michael McGowan's comment:
The logic seems backwards to me. Shouldn't the detection of a space or punctuation signify you found a word?
And are there any constraints on how your sentence is formed? If you had a sentence like "One,_Two,Three,Four,____Five", the algorithm would need additional logic to handle consecutive spaces and punctuation marks.
You can use the class StringTokenizer from java.util, which makes this really easy. As constructor parameters, pass the string you have and all the delimiters you want.
StringTokenizer s = new StringTokenizer(yourString, ",. :;/");
int cantWords = s.countTokens();
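As a runnable sketch of the StringTokenizer approach, the snippet below wraps the two lines above in a small helper. The class name TokenizerWordCount and the helper signature are assumptions for illustration; the delimiter string is the one from the answer, plus a second call that adds '!'.

```java
import java.util.StringTokenizer;

class TokenizerWordCount {
    // Every character in 'delimiters' separates tokens; runs of consecutive
    // delimiters are skipped, so they do not produce empty tokens.
    static int countWords(String sentence, String delimiters) {
        return new StringTokenizer(sentence, delimiters).countTokens();
    }

    public static void main(String[] args) {
        String delims = ",. :;/";
        System.out.println(countWords("Eins,Zwei oder Drei", delims));          // 4
        System.out.println(countWords("One, Two,Three,Four,    Five", delims)); // 5
        System.out.println(countWords("Hallo Welt !!!!", delims));              // 3: "!!!!" is a token
        System.out.println(countWords("Hallo Welt !!!!", delims + "!"));        // 2
    }
}
```

The result depends entirely on the delimiter list: any punctuation mark that is not listed is counted as (part of) a word.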
Let's walk through a little example: "I am."
Iteration 1: current = 'I'; wordCount = 1; isWord = true;
Iteration 2: current = ' '; isWord = false; wordCount = 2; isWord = true;
Iteration 3: current = 'a'; isWord = true;
Iteration 4: current = 'm'; isWord = true;
Iteration 5: current = '.'; isWord = false; wordCount = 3; isWord = true;
Did you intentionally leave out the break in your switch? The logic you used seems a bit strange to me.
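The walk-through above can be reproduced with a small sketch that keeps the fall-through from the question and prints the state after each character. It assumes the sentence is passed in as a String; the class name FallThroughTrace is hypothetical.

```java
class FallThroughTrace {
    // Deliberately keeps the bug from the question: there is no break after
    // isWord = false, so separators fall through into the default branch.
    static int countWordsBuggy(String sentence) {
        int wordCount = 0;
        boolean isWord = false;
        for (int i = 0; i < sentence.length(); i++) {
            char current = sentence.charAt(i);
            switch (current) {
                case ' ': case '.': case ',': case '-':
                    isWord = false;
                    // intentional fall-through, as in the original code
                default:
                    if (!isWord) wordCount++;
                    isWord = true;
            }
            System.out.println("Iteration " + (i + 1) + ": current = '" + current
                    + "'; wordCount = " + wordCount);
        }
        return wordCount;
    }

    public static void main(String[] args) {
        System.out.println("Result: " + countWordsBuggy("I am.")); // Result: 3
    }
}
```

Every separator resets isWord and then immediately counts itself as a new word, which is why each space or period inflates the total.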