Retrieve all hashtags from a tweet in a PHP function

I want to retrieve all hashtags from a tweet using a PHP function.

I know someone asked a similar question here, but there is no hint on how exactly to implement this in PHP. Since I'm not very familiar with regular expressions, I don't know how to write a function that returns an array of all hashtags in a tweet.

So how do I do this, using the following regular expression:

#\S*\w


I created my own solution. It:

  • Finds all hashtags in a string
  • Removes duplicates
  • Sorts hashtags by how often they occur in the text
  • Supports Unicode characters

    function getHashtags($string) {
        $hashtags = FALSE;
        // The /u modifier lets \w match Unicode letters and digits.
        preg_match_all("/(#\w+)/u", $string, $matches);
        if (!empty($matches[0])) {
            // Counting occurrences also collapses duplicates into one key each.
            $hashtagsArray = array_count_values($matches[0]);
            // Put the most frequent hashtags first.
            arsort($hashtagsArray);
            $hashtags = array_keys($hashtagsArray);
        }
        return $hashtags;
    }
    

Output is like this:

(
    [0] => #_ƒOllOw_
    [1] => #FF
    [2] => #neslitükendi
    [3] => #F_0_L_L_O_W_
    [4] => #takipedeğerdost
    [5] => #GönüldenTakipleşiyorum
)


$tweet = "this has a #hashtag a  #badhash-tag and a #goodhash_tag";

preg_match_all("/(#\w+)/", $tweet, $matches);

var_dump( $matches );

*Dashes are illegal chars for hashtags, underscores are allowed.
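
To illustrate the note about dashes, here is a minimal sketch of what that pattern captures for the same sample tweet. The variable name `$found` is only for illustration.

```php
<?php
$tweet = "this has a #hashtag a  #badhash-tag and a #goodhash_tag";

// $matches[0] holds the full matches; \w+ stops at the dash but accepts the underscore.
preg_match_all('/(#\w+)/', $tweet, $matches);
$found = $matches[0];

print_r($found);
// $found is ['#hashtag', '#badhash', '#goodhash_tag']
```

So `#badhash-tag` is cut down to `#badhash`, while `#goodhash_tag` comes through whole.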


Don't forget about hashtags that contain unicode, numeric values and underscores:

$tweet = "Valid hashtags include: #hashtag #NYC2016 #NYC_2016 #gøypålandet!";

preg_match_all('/#([\p{Pc}\p{N}\p{L}\p{Mn}]+)/u', $tweet, $matches);

print_r( $matches );

\p{Pc} - connector punctuation, such as the underscore

\p{N} - a numeric character in any script

\p{L} - a letter from any language

\p{Mn} - a non-spacing mark (combining accents, umlauts, etc.)
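
If you want this as a reusable function, something along these lines should work. It is only a sketch: the name `extractUnicodeHashtags` is made up here, and it assumes the input string is UTF-8 (the `/u` modifier requires that).

```php
<?php
// Hypothetical helper name; the pattern is the Unicode-aware one from this answer.
function extractUnicodeHashtags(string $tweet): array
{
    // Group 1 captures the tag text without the leading '#'.
    $count = preg_match_all('/#([\p{Pc}\p{N}\p{L}\p{Mn}]+)/u', $tweet, $matches);

    // preg_match_all() returns false on failure, e.g. invalid UTF-8 input with /u.
    if ($count === false) {
        return [];
    }
    return $matches[1];
}

$tags = extractUnicodeHashtags("Valid: #hashtag #NYC2016 #NYC_2016 #gøypålandet!");
print_r($tags);
// $tags is ['hashtag', 'NYC2016', 'NYC_2016', 'gøypålandet']
```

Returning `$matches[1]` drops the `#`; use `$matches[0]` instead if you want to keep it.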


Try this regular expression:

/#[^\s]*/i

Or use this if there are multiple hash tags joined together (eg. #foo#bar).

/#[^\s#]*/i

Running it in PHP would look like:

preg_match_all('/#[^\s#]*/i', $tweet_string, $result);

The result is an array containing all the hashtags in the Tweet (saved as "$result" - the third argument).
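
As a quick check of how that pattern behaves, here is a small sketch with a made-up sample string. The variable `$hashtags` is just an illustrative name.

```php
<?php
$tweet_string = "Joined #foo#bar and trailing #baz!";

// [^\s#]* stops at whitespace or at the next '#', so joined tags are split.
preg_match_all('/#[^\s#]*/i', $tweet_string, $result);
$hashtags = $result[0];

print_r($hashtags);
// $hashtags is ['#foo', '#bar', '#baz!']
```

Note that trailing punctuation such as the `!` stays attached, and because `*` allows zero characters, a lone `#` would match as well. Use `+` and a stricter character class if that matters for your data.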

Lastly, check out this site. I've found it really handy for testing regular expressions. http://regex.larsolavtorvik.com/

EDIT: I tried your regular expression and it worked great too!

EDIT 2: Added another regex to extract hash tags, even if they're consecutive.


Use the preg_match_all() function:

function get_hashtags($tweet)
{
    $matches = array();
    preg_match_all('/#\S*\w/i', $tweet, $matches);
    return $matches[0];
}
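
For comparison, here is a small sketch of what the `#\S*\w` pattern from the question matches on a made-up sample string. The pattern is applied directly so the snippet runs on its own.

```php
<?php
$tweet = "#hashtag #badhash-tag #foo#bar #baz!";

// \S* is greedy, then backtracks until the match ends on a word character.
preg_match_all('/#\S*\w/i', $tweet, $matches);
$hashtags = $matches[0];

print_r($hashtags);
// $hashtags is ['#hashtag', '#badhash-tag', '#foo#bar', '#baz']
```

Trailing punctuation is dropped, but dashes are kept and joined tags like `#foo#bar` come back as a single match, so this pattern is more permissive than the `\w+` variants.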