Regex to remove blacklisted words from a sentence
How can I use a regex to filter out a list of blacklisted (obscene) words? For example, if a blacklisted phrase is 'Bill Joseph', then
'I am Bill Josephine' is valid
but 'I am Bill Joseph.' is invalid
'I am Bill Joseph,' is invalid
'I am Bill Joseph ' is invalid
'I am Bill Joseph<any non-alphanumeric character>' is invalid.
Similarly, 'I am .Bill Joseph' is invalid
'I am <any non-alphanumeric character>Bill Joseph' is invalid.
Simple, and this works:
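import java.util.regex.Matcher;
import java.util.regex.Pattern;

// "\\W" needs a non-word character before the name; "(?!\\w)" rejects a word character (but allows end of string) after it
String badStrRegex = "\\WBill Joseph(?!\\w)";
Pattern pattern = Pattern.compile(badStrRegex);
Matcher m = pattern.matcher(testStr); // testStr is your string under test
boolean isBad = m.find();
It works! Tested against all the concrete examples in your question.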
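A self-contained sketch that runs the question's examples through a pattern in the same style: a non-word character before the name, and a negative lookahead "(?!\w)" after it, so that a following letter ('Bill Josephine') is not flagged while the end of the string ('I am .Bill Joseph') is. The class name BlacklistCheck and the method isBad are hypothetical.

```java
import java.util.regex.Pattern;

class BlacklistCheck {
    // A non-word char before the name, and no word char (or end of input) after it.
    private static final Pattern BAD = Pattern.compile("\\WBill Joseph(?!\\w)");

    static boolean isBad(String testStr) {
        return BAD.matcher(testStr).find();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "I am Bill Josephine",
            "I am Bill Joseph.",
            "I am Bill Joseph,",
            "I am Bill Joseph ",
            "I am .Bill Joseph"
        };
        for (String s : inputs) {
            // Only the first input should print "valid".
            System.out.println("'" + s + "' -> " + (isBad(s) ? "invalid" : "valid"));
        }
    }
}
```

Because the leading "\W" has to consume a character, a name at the very start of the string (e.g. 'Bill Joseph.') is not caught by this pattern; a lookbehind such as "(?<!\w)" would cover that case.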
Use the negation of the alphanumeric character class:
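"(^|[^A-Za-z0-9])Bill Joseph([^A-Za-z0-9]|$)"
Use it with Matcher.find(); the "^" and "$" alternatives also catch the name at the very start or end of the string, as in 'I am .Bill Joseph'.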
Using "\W" in place of "[^A-Za-z0-9]" works in most cases, except when there is an underscore right before or after the name: "\W" treats "_" as a word character, so "Bill Joseph_" would still be seen as valid.
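Since the question is about a list of blacklisted phrases, one way to apply the negated alphanumeric class to the whole list is to quote each phrase and join them into a single alternation. This sketch uses the lookarounds "(?<![A-Za-z0-9])" and "(?![A-Za-z0-9])" rather than consuming the neighbouring characters, so a phrase at the start or end of the string is still caught. The class NegatedClassBlacklist, its methods, and the sample list are hypothetical; List.of needs Java 9 or later.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class NegatedClassBlacklist {
    // Builds (?<![A-Za-z0-9])(?:phrase1|phrase2|...)(?![A-Za-z0-9]):
    // a phrase must not touch an ASCII letter or digit on either side.
    static Pattern compile(List<String> phrases) {
        String alternation = phrases.stream()
                .map(Pattern::quote) // match each phrase literally, even if it contains '.', '+', etc.
                .collect(Collectors.joining("|"));
        return Pattern.compile("(?<![A-Za-z0-9])(?:" + alternation + ")(?![A-Za-z0-9])");
    }

    static boolean containsBlacklisted(Pattern blacklist, String text) {
        return blacklist.matcher(text).find();
    }

    public static void main(String[] args) {
        Pattern p = compile(List.of("Bill Joseph"));
        System.out.println(containsBlacklisted(p, "I am Bill Josephine")); // false
        System.out.println(containsBlacklisted(p, "Bill Joseph_"));        // true: '_' is not in [A-Za-z0-9]
        System.out.println(containsBlacklisted(p, "I am .Bill Joseph"));   // true
    }
}
```

The class is ASCII-only, so an accented letter right after the name (e.g. 'Bill Josephé') counts as a separator here; "\p{Alnum}" with Pattern.UNICODE_CHARACTER_CLASS is one option if that matters.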
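Make sure the word is surrounded by word boundaries: ".*\\b" + badWord + "\\b.*" (the leading and trailing ".*" are there because String.matches() must match the whole string). If badWord can contain regex metacharacters, wrap it in Pattern.quote(badWord).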
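The title asks about removing the words, not only detecting them. A minimal sketch under that reading, assuming every blacklisted phrase starts and ends with a word character (otherwise "\b" does not fall where you would expect): quote each phrase, wrap the alternation in "\b", and use Matcher.find() or replaceAll() instead of ".*" plus matches(). The class BoundaryBlacklist and its methods are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class BoundaryBlacklist {
    // \b(?:phrase1|phrase2|...)\b, with each phrase quoted so it is matched literally.
    static Pattern compile(List<String> badWords) {
        String alternation = badWords.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile("\\b(?:" + alternation + ")\\b");
    }

    // True if any blacklisted phrase occurs as a whole word or phrase.
    static boolean isBad(Pattern blacklist, String text) {
        return blacklist.matcher(text).find();
    }

    // Deletes every whole-word occurrence; surrounding punctuation and spaces are kept.
    static String remove(Pattern blacklist, String text) {
        return blacklist.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        Pattern p = compile(List.of("Bill Joseph"));
        System.out.println(isBad(p, "I am Bill Josephine")); // false
        System.out.println(remove(p, "I am Bill Joseph."));  // I am .
    }
}
```

Like "\W", "\b" treats "_" as a word character, so "Bill Joseph_" is neither flagged nor removed by this version.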