Email content needs to be filtered out with words in a dictionary
I was asked a question in an interview. The question was that the input string (which might come from email or large text file) needs to be filtered with list of bad words and replaced with something else.
for example, 开发者_如何学JAVAif the input string contains bad words which exist in a list of bad words, the bad word needs to be replaced with something else, something like empty string or wild card character.
The solution that I came up with was to put all the bad words in a hashmap and put input string in a StringBuffer and retrieve word by word delimited by space and check if word exists in hashmap, if it exists, replace the word with empty string. But the interviewer said that manipulating StringBuffer might be expensive, because stringBuffer maintains an array of characters. Replacing means it needs to copy to a new array.
Does anybody have a better algorithm instead of this solution?
Thanks.
you can first parse the String
into String[]
(using split()
) and, iterate over the words and check if a word is needed to be replaced by looking for it in your HashMap<String,String>
of blacklisted words. (if it is, you just replace the reference with the 'replace to' [the value of the String in the map], and do not create a new string). then, using StringBuilder
, rebuild your new string.
should look something like that:
public static String replaceString(Map<String,String> map,String input) {
String[] arr = input.split("\\s");
for (int i = 0;i<arr.length;i++) {
String val = map.get(arr[i]);
if (val != null) arr[i] = val;
}
StringBuilder sb = new StringBuilder();
for (String s : arr) {
if (s == null || s.length() == 0 ) continue;
sb.append(s).append(' ');
}
return sb.toString().trim();
}
Let the framework take care of the details and use regular expressions.
if you break the input into words and then filter them by the blacklist and then outputing them, there is no string concatenation needed. i think the interviewer got confused by your usage of replace and thought that you were changing the input buffer to remove the blacklisted words.
the method here has an iterator over the input returning words and a loop over those words and if a word is not in the blacklist it is outputted.
Unless the typical message is a profanity-laced tirade where every 2nd or 3rd word needed to be replaced, you're probably not going to be doing enough replace operations for this type of micro-optimizing to ever yield 1 second of CPU time savings over a year.
With that said, it would be more efficient to add the resulting string, word by word, to a new StringBuffer instead of replacing the contents of the original StringBuffer, because then you're never having to shift the contents that come after the word you're replacing.
精彩评论