开发者

Regex Valid Twitter Mention

I'm trying to find a regex that matches if a Tweet it's a true mention. To be a mention, the string can't start with "@" and can't contain "RT" (case insensitive) and "@" must start the word.

In the examples I commented the desired output

Some examples:

function search($strings, $regexp) {
    $regexp;
    foreach ($strings as $string) 开发者_StackOverflow社区{
        echo "Sentence: \"$string\" <- " .
        (preg_match($regexp, $string) ? "MATCH" : "NO MATCH") . "\n";
    }
}

$strings = array(
"Hi @peter, I like your car ", // <- MATCH
"@peter I don't think so!", //<- NO MATCH: the string it's starting with @ it's a reply
"Helo!! :@ how are you!", // NO MATCH <- it's not a word, we need @(word) 
"Yes @peter i'll eat them this evening! RT @peter: hey @you, do you want your pancakes?", // <- NO MATCH "RT/rt" on the string , it's a RT
"Helo!! ineed@aser.com how are you!", //<- NO MATCH, it doesn't start with @
"@peter is the best friend you could imagine. RT @juliet: @you do you know if @peter it's awesome?" // <- NO MATCH starting with @ it's a reply and RT
);
echo "Example 1:\n";
search($strings,  "/(?:[[:space:]]|^)@/i");

Current output:

Example 1:
Sentence: "Hi @peter, I like your car " <- MATCH
Sentence: "@peter I don't think so!" <- MATCH
Sentence: "Helo!! :@ how are you!" <- NO MATCH
Sentence: "Yes @peter i'll eat them this evening! RT @peter: hey @you, do you want your pancakes?" <- MATCH
Sentence: "Helo!! ineed@aser.com how are you!" <- MATCH
Sentence: "@peter is the best friend you could imagine. RT @juliet: @you do you know if @peter it's awesome?" <- MATCH

EDIT:

I need it in regex beacause it can be used on MySQL and anothers languages too. Im am not looking for any username. I only want to know if the string it's a mention or not.


This regexp might work a bit better: /\B\@([\w\-]+)/gim

Here's a jsFiddle example of it in action: http://jsfiddle.net/2TQsx/96/


Here's a regex that should work:

/^(?!.*\bRT\b)(?:.+\s)?@\w+/i

Explanation:

/^             //start of the string
(?!.*\bRT\b)   //Verify that rt is not in the string.
(?:.*\s)?      //Find optional chars and whitespace the
                  //Note: (?: ) makes the group non-capturing.
@\w+           //Find @ followed by one or more word chars.
/i             //Make it case insensitive.


I have found that this is the best way to find mentions inside of a string in javascript. I don't know exactly how i would do the RT's but I think this might help with part of the problem.

var str = "@jpotts18 what is up man? Are you hanging out with @kyle_clegg";
var pattern = /@[A-Za-z0-9_-]*/g;
str.match(pattern);
["@jpotts18", "@kyle_clegg"]


I guess something like this will do it:

^(?!.*?RT\s).+\s@\w+

Roughly translated to:

At the beginning of string, look ahead to see that RT\s is not present, then find one or more of characters followed by a @ and at least one letter, digit or underscore.


Twitter has published the regex they use in their twitter-text library. They have other language versions posted as well on GitHub.


A simple but works correctly even if the scraping tool has appended some special characters sometimes: (?<![\w])@[\S]*\b. This worked for me

0

上一篇:

下一篇:

精彩评论

暂无评论...
验证码 换一张
取 消

最新问答

问答排行榜