Regex: Matching words with special characters

I'm trying to find a regular expression that matches an exact word in a string. The problem arises when the word contains special characters like '#'. The special characters could be any UTF-8 characters, such as "áéíóúñ#@", and the match has to ignore punctuation marks.

Here are some examples of what I'm looking for:

Searching: #myword

 Sentence: "I like the elephants when they say #myword" <- MATCH
 Sentence: "I like the elephants when they say #mywords" <- NO MATCH
 Sentence: "I like the elephants when they say myword" <- NO MATCH
 Sentence: "I don't like #mywords. its silly" <- NO MATCH
 Sentence: "I like #myword!! It's awesome" <- MATCH
 Sentence: "I like #myword It's awesome" <- MATCH

PHP Example code:

 $regexp= "#myword";
    if (preg_match("/(\w$regexp)/", "I like #myword!! It's awesome")) {
        echo "YES YES YES";
    } else {
        echo "NO NO NO ";
    }

Thank you!

Update: If I search for "myword", the matched word has to start with the "m" itself and not with another character such as "#".

Sentence: "I like myword!! It's awesome" <- MATCH
Sentence: "I like #myword It's awesome" <- NO MATCH


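The solution below treats the characters of the word and its boundaries separately: the word must be preceded by whitespace or the start of the string, and followed by a non-word character or the end of the string. An approach that uses word boundaries (\b) directly might also be viable.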

Code:

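function search($strings,$search) {
        // preg_quote() escapes regex metacharacters in the search term (and the "/" delimiter).
        // Before the word: whitespace or start of string. After it: a non-word character or end of string.
        $regexp = "/(?:[[:space:]]|^)".preg_quote($search, "/")."(?:[^\w]|$)/i";
        foreach ($strings as $string) {
                echo "Sentence: \"$string\" <- " . 
                     (preg_match($regexp,$string) ? "MATCH" : "NO MATCH") ."\n";
        }
}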

$strings = array(
"I like the elephants when they say #myword",
"I like the elephants when they say #mywords",
"I like the elephants when they say myword",
"I don't like #mywords. its silly",
"I like #myword!! It's awesome",
"I like #mywOrd It's awesome",
);
echo "Example 1:\n";
search($strings,"#myword");

$strings = array(
"I like myword!! It's awesome",
"I like #myword It's awesome",
);
echo "Example 2:\n";
search($strings,"myword");

Output:

Example 1:
Sentence: "I like the elephants when they say #myword" <- MATCH
Sentence: "I like the elephants when they say #mywords" <- NO MATCH
Sentence: "I like the elephants when they say myword" <- NO MATCH
Sentence: "I don't like #mywords. its silly" <- NO MATCH
Sentence: "I like #myword!! It's awesome" <- MATCH
Sentence: "I like #mywOrd It's awesome" <- MATCH
Example 2:
Sentence: "I like myword!! It's awesome" <- MATCH
Sentence: "I like #myword It's awesome" <- NO MATCH
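
Since the question mentions UTF-8 characters such as "áéíóúñ", here is a minimal sketch of the same idea with lookarounds and the u modifier. The helper name matchesExactWord is hypothetical, and the choice of "letter, digit, or underscore" as the characters that may not follow the word is an assumption about what should count as part of a word.

```php
<?php
// Sketch only: matchesExactWord() is a hypothetical helper, not part of the answer above.
function matchesExactWord(string $sentence, string $search): bool
{
    // preg_quote() neutralises regex metacharacters in the search term;
    // the second argument also escapes the "/" delimiter.
    $quoted = preg_quote($search, '/');

    // (?<!\S)            : preceded by whitespace or the start of the string
    // (?![\p{L}\p{N}_])  : not followed by a letter, digit, or underscore
    // i = case-insensitive, u = treat pattern and subject as UTF-8
    $pattern = '/(?<!\S)' . $quoted . '(?![\p{L}\p{N}_])/iu';

    return preg_match($pattern, $sentence) === 1;
}

var_dump(matchesExactWord("I like #myword!! It's awesome", '#myword'));  // bool(true)
```

Because the lookarounds consume no characters, the match itself is exactly the search term, which may be convenient if the position or text of the match is needed later.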


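You should search for myword with word boundaries, like this: /\bmyword\b/.
"#" is not a word character, so \b matches before it only when a word character precedes it; /\b#myword\b/ therefore doesn't work after a space or at the start of the string.
One idea was to match Unicode characters with \X, but this would create other problems.

/ #myword\b/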
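
The behaviour of \b around "#" can be checked directly. This is a small sketch; the variable names are only for illustration.

```php
<?php
// Each call returns 1 when the pattern matches and 0 when it does not.
$afterSpace = preg_match('/\b#myword\b/', 'I like #myword');   // 0: space and "#" are both non-word characters
$afterWord  = preg_match('/\b#myword\b/', 'I like x#myword');  // 1: "x" followed by "#" is a boundary
$plainWord  = preg_match('/\bmyword\b/',  'I like #myword');   // 1: "#" followed by "m" is a boundary
```

The last line also shows that /\bmyword\b/ finds "myword" inside "#myword", which the update in the question wants to exclude.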


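This should do the trick (replace "#myword" with whatever you want to find):

^.*#myword(?:[^\w].*)?$

If the pattern matches, your word was found; otherwise it wasn't. The trailing group is optional so that the word is also found when it is the last thing in the string.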
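
A pattern that demands a non-word character after the word cannot match when the word is the last thing in the string. A negative lookahead avoids that, and since preg_match searches anywhere in the subject, the surrounding ^.* and .*$ are not needed. A short sketch, with illustrative variable names:

```php
<?php
$sentence = 'I like the elephants when they say #myword';

// Requires one more non-word character after the word, so it fails at the end of the string.
$needsTrailingChar = preg_match('/#myword[^\w]/', $sentence);   // 0

// A negative lookahead only asserts that no word character follows.
$lookahead = preg_match('/#myword(?!\w)/', $sentence);          // 1
$plural    = preg_match('/#myword(?!\w)/', 'I like #mywords');  // 0
```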
