how to build a regular expression (regex) for slang and emoticons

i need to build a regex to match slang terms (e.g. lol, lmao, imo) and emoticons such as :), :P, and ;).

i followed the example at http://www.coderanch.com/t/497238/java/java/Regular-Expression-Detecting-Emoticons. however, this method/approach is failing for me.

for example, let's say i need to match the slang "od". i create a Pattern as follows. Pattern pattern = Pattern.compile(Pattern.quote("od"));

let's say i need to match the slang "od" in the following test sentence, "some methods are bad." empirically, there is one match on the word "methods" in the string, which is not what i want.

i did read some of the javadoc and some of the tutorial regarding java and regex, but i still can't figure this out.

by the way, i am using Java 6 (though i've looked at and referenced the Java 5 API doc).

if regex is not the best way to go, i am open to other solutions as well. thanks in advance for any help/pointers.

the following code gets me 3 matches (one inside "methods" plus the two standalone "od") and is based on the link above.

String regex = "od";
Pattern pattern = Pattern.compile(Pattern.quote(regex));
String str = "some methods are bad od od more text";
Matcher matcher = pattern.matcher(str);
while(matcher.find()) {
    System.out.println(matcher.group());
}

the following code returns no matches and is based on the responses so far.

String regex = "\bod\b";
Pattern pattern = Pattern.compile(regex);
//Pattern pattern = Pattern.compile(Pattern.quote(regex)); //this fails
String str = "some methods are bad od od more text";
Matcher matcher = pattern.matcher(str);
while(matcher.find()) {
    System.out.println(matcher.group());
}
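
The snippet above finds nothing because, inside a Java string literal, "\b" is the escape for the backspace character (U+0008), not a regex word boundary. The regex engine only sees a word boundary if the backslash itself reaches it, which means writing "\\b" in the source. Wrapping the pattern in Pattern.quote also fails, since quoting makes the \b literal text. A small sketch to show the difference (the class and method names here are made up for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackslashDemo {
    // Counts how many times the given regex matches in the text.
    static int countMatches(String regex, String text) {
        Matcher matcher = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String str = "some methods are bad od od more text";
        // "\b" is a backspace character here, so the pattern looks for
        // backspace + "od" + backspace, which never occurs in str.
        System.out.println(countMatches("\bod\b", str));   // prints 0
        // "\\b" is a backslash followed by 'b', which the regex engine
        // interprets as a word boundary.
        System.out.println(countMatches("\\bod\\b", str)); // prints 2
    }
}
```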

after the two helpful responses below, i will post the correct/desired code snippet here.

String regex = "(\\bod\\b)|(\\blmao\\b)";
Pattern pattern = Pattern.compile(regex);
String str = "some methods are bad od od more text lmao more text";
Matcher matcher = pattern.matcher(str);
while(matcher.find()) {
    System.out.println(matcher.group());
}

this code is correct (or as desired) because, empirically, it gives me 3 matches (2 od and 1 lmao). sorry, i wish i were stronger with regex in java (and with regex in general). thanks for your help.
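
With more than a couple of slang terms, writing the alternation by hand gets tedious. One possible way to generalize the snippet above is to build the pattern from a list, quoting each term with Pattern.quote and wrapping the whole group in a single pair of word boundaries. This is only a sketch: the class name, method names and the case-insensitive flag are assumptions, not part of the original code. It avoids generics shortcuts so it should also compile on Java 6.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SlangRegex {
    // Builds \b(?:term1|term2|...)\b, quoting each term so that any regex
    // metacharacters inside it are matched literally.
    static Pattern build(List<String> terms) {
        StringBuilder sb = new StringBuilder("\\b(?:");
        for (int i = 0; i < terms.size(); i++) {
            if (i > 0) {
                sb.append('|');
            }
            sb.append(Pattern.quote(terms.get(i)));
        }
        sb.append(")\\b");
        // CASE_INSENSITIVE is an assumption: it lets "LMAO" match "lmao".
        return Pattern.compile(sb.toString(), Pattern.CASE_INSENSITIVE);
    }

    // Collects every match of the pattern in the text, in order.
    static List<String> findAll(Pattern pattern, String text) {
        List<String> found = new ArrayList<String>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        Pattern pattern = build(Arrays.asList("od", "lmao", "imo"));
        String str = "some methods are bad od od more text LMAO more text";
        System.out.println(findAll(pattern, str)); // prints [od, od, LMAO]
    }
}
```

Note that this only works for terms that start and end with a letter, digit or underscore, because \b needs a word character on one side. It is not suitable for emoticons.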


[:;]-?[DP()]

handles the combinations of ":" or ";", followed by an optional "-", followed by "D", "P", ")" or "("
e.g. :P :-( ;D

just add more combinations...

have fun..
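
One caveat with the pattern above: on its own it also matches inside ordinary text such as "note:Done". Word boundaries (\b) do not help here, because ":" and ")" are not word characters. A possible workaround, assuming emoticons are always separated from other text by whitespace, is to add lookarounds that require whitespace or the string edge on both sides. The class and constant names below are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmoticonDemo {
    // The pattern from the answer, unchanged.
    static final Pattern PLAIN = Pattern.compile("[:;]-?[DP()]");

    // Same pattern, but only when not preceded and not followed by a
    // non-whitespace character (assumption: emoticons stand alone).
    static final Pattern DELIMITED =
            Pattern.compile("(?<!\\S)[:;]-?[DP()](?!\\S)");

    // Collects every match of the pattern in the text, in order.
    static List<String> find(Pattern pattern, String text) {
        List<String> found = new ArrayList<String>();
        Matcher matcher = pattern.matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String str = "note:Done :P that was fun ;-) but :-( later";
        System.out.println(find(PLAIN, str));     // prints [:D, :P, ;-), :-(]
        System.out.println(find(DELIMITED, str)); // prints [:P, ;-), :-(]
    }
}
```

The trade-off is that an emoticon directly followed by punctuation (for example ":D.") is no longer matched, so the lookarounds may need loosening depending on the input.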


You can use word boundaries (\b) in order to match a word that's just the slang you want.

So for example, the pattern "\bod\b" will match "od", but won't match "method". In a Java string literal the backslashes have to be doubled, i.e. "\\bod\\b".


Do you need to use a regex? I would do

String str = "some methods are bad od od more text lmao more text";
String[] words = str.split(" ");
for (String s : words) {
  if (s.equals("od") || s.equals("lmao"))
    System.out.println(s);
}
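
If the list of terms grows, the chain of equals calls above could be replaced by a set lookup. The following is a rough sketch of that idea, not the answerer's code: the class and method names are hypothetical, and it assumes tokens are separated by whitespace and must match exactly (case-sensitive, no trailing punctuation). One side effect of splitting on whitespace is that emoticons can go in the same set as the slang terms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TokenLookup {
    // Returns every whitespace-separated token of text that is in known.
    static List<String> findKnown(String text, Set<String> known) {
        List<String> found = new ArrayList<String>();
        // "\\s+" splits on runs of whitespace, not just single spaces.
        for (String token : text.trim().split("\\s+")) {
            if (known.contains(token)) {
                found.add(token);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        Set<String> known =
                new HashSet<String>(Arrays.asList("od", "lmao", ":)", ":P"));
        String str = "some methods are bad od od more text lmao :P";
        System.out.println(findKnown(str, known)); // prints [od, od, lmao, :P]
    }
}
```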


It's better to use more metacharacters and character classes in the regular expression. You can head to the Oracle regular expressions documentation if you want to know about metacharacters and their different combinations.
