Regex to remove blacklisted words from a sentence

How can I use a regex to filter out a list of blacklisted (obscene) words, such that if a blacklisted word is, for example, 'Bill Joseph':

 Then 'I am Bill Josephine' is valid
    but 'I am Bill Joseph.' is invalid
        'I am Bill Joseph,' is invalid
        'I am Bill Joseph ' is invalid
        'I am Bill Joseph<any non alphanumeric>' is invalid.

    Similarly 'I am .Bill Joseph' is invalid
              'I am <any non alphanumeric>Bill Joseph' is invalid.


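A simple approach is to require a non-word character, or the start or end of the string, on each side of the name:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
// (?:^|\\W) and (?:\\W|$) also accept the start and end of the string as delimiters
String badStrRegex = "(?:^|\\W)Bill Joseph(?:\\W|$)";
Pattern pattern = Pattern.compile(badStrRegex);
Matcher m = pattern.matcher(testStr);  // testStr is the string under test
boolean isBad = m.find();              // true if the name occurs as a whole word

Tested against the concrete examples in the question.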
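
To check a candidate pattern against the examples from the question, a small harness like the following can help. This is only a sketch: the class name `BillJosephCheck`, the helper `isBad`, and the exact pattern (which treats the start and end of the string as delimiters in addition to `\W`) are assumptions made for illustration, not part of the answer above.

```java
import java.util.regex.Pattern;

class BillJosephCheck {
    // Assumed pattern: the name must have a non-word character, or the
    // start/end of the string, directly before and after it.
    static final Pattern BAD = Pattern.compile("(?:^|\\W)Bill Joseph(?:\\W|$)");

    // Returns true when the blacklisted name occurs as a whole word.
    static boolean isBad(String s) {
        return BAD.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "I am Bill Josephine",
            "I am Bill Joseph.",
            "I am Bill Joseph,",
            "I am Bill Joseph ",
            "I am .Bill Joseph"
        };
        for (String s : samples) {
            System.out.println("'" + s + "' -> " + (isBad(s) ? "invalid" : "valid"));
        }
    }
}
```

Under these assumptions, only the first sample is reported as valid.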


Use the negation of the alphanumeric character class, and also allow the start or end of the string as a delimiter:

"(?:^|[^A-Za-z0-9])Bill Joseph(?:[^A-Za-z0-9]|$)"

Using "\W" in place of "[^A-Za-z0-9]" works in most cases, but "\W" does not match an underscore, so it fails when an underscore comes directly before or after the name: "Bill Joseph_" would still be seen as valid.
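
The underscore difference can be seen side by side in a short sketch. The class and method names here are hypothetical, and both patterns are assumed to accept the start or end of the string as a delimiter so that only the character class differs.

```java
import java.util.regex.Pattern;

class UnderscoreDemo {
    // Delimiter is \W: an underscore is a word character, so it does not delimit.
    static final Pattern WITH_W =
            Pattern.compile("(?:^|\\W)Bill Joseph(?:\\W|$)");

    // Delimiter is any non-alphanumeric character: an underscore delimits.
    static final Pattern WITH_CLASS =
            Pattern.compile("(?:^|[^A-Za-z0-9])Bill Joseph(?:[^A-Za-z0-9]|$)");

    static boolean badWithW(String s) {
        return WITH_W.matcher(s).find();
    }

    static boolean badWithClass(String s) {
        return WITH_CLASS.matcher(s).find();
    }

    public static void main(String[] args) {
        String s = "I am Bill Joseph_";
        System.out.println("\\W version flags it: " + badWithW(s));
        System.out.println("[^A-Za-z0-9] version flags it: " + badWithClass(s));
    }
}
```

For "I am Bill Joseph_", the `\W` version does not flag the string, while the explicit character class does.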


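Make sure the word is surrounded by word boundaries: ".*\\b" + badWord + "\\b.*". This form is meant for String.matches(), which must match the entire input; with Matcher.find() the leading and trailing ".*" can be dropped.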
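
Since the question asks about a list of words, the word-boundary idea can be extended to several entries at once. The following is a sketch under stated assumptions: the class `Blacklist` and its methods are hypothetical names, each entry is escaped with `Pattern.quote` so that regex metacharacters in a word are taken literally, and every entry is assumed to start and end with a word character (otherwise `\b` will not behave as a delimiter there). Like `\W`, `\b` treats an underscore as part of a word.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class Blacklist {
    // Builds one pattern of the form \b(?:word1|word2|...)\b from the list.
    static Pattern compile(List<String> badWords) {
        String alternation = badWords.stream()
                .map(Pattern::quote)              // escape each entry literally
                .collect(Collectors.joining("|"));
        return Pattern.compile("\\b(?:" + alternation + ")\\b");
    }

    // find() scans for a match anywhere, so no surrounding ".*" is needed.
    static boolean containsBadWord(Pattern blacklist, String text) {
        return blacklist.matcher(text).find();
    }
}
```

A pattern built once with `Blacklist.compile(...)` can be reused for every sentence that needs checking, which avoids recompiling the regex per call.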
