Python regex for matching Twitter usernames in the beginning of a tweet

Posted 2023-03-28 04:36

I have a tweet text like this:

"@user1 @user2 blablabla @user3"

I want to use a regex to extract the users at the beginning of a tweet. That would mean @user1 and @user2. The number of users varies: there might be one, two, three...

I'm trying this with re.IGNORECASE:

re.compile(ur'^(@[a-z0-9_]*\s)*')

But it doesn't match what I want. I've tried everything I could come up with, without success. I'm not very familiar with Python regex, but this is how I would do it with egrep:

echo "@user1 @user2 blablabla @user3" | egrep '^(@[[:alnum:]_]*[ ]*)*'

Thanks

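Edit

The regex was right; I was just checking the result the wrong way. I was calling groups(), which returns only the last repetition of the capturing group:

tweet = "@user1 @user2 blablabla @user3"
re.compile(ur'^(@[a-z0-9_]*\s)*').match(tweet).groups()

instead of group(0), which returns the whole match:

re.compile(ur'^(@[a-z0-9_]*\s)*').match(tweet).group(0)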

Clearer version of the regex:

re.compile(ur'^(@\w+\s)+').match(tweet).group(0)
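
To make the difference between group(0) and groups() concrete, here is a minimal sketch, assuming Python 3 (where the ur'' prefix is no longer valid, so a plain r'' raw string is used instead). The variable names are only illustrative.

```python
import re

tweet = "@user1 @user2 blablabla @user3"
pattern = re.compile(r'^(@\w+\s)+')

m = pattern.match(tweet)

# group(0) is the entire matched text: every leading mention.
whole = m.group(0)

# groups() holds one value per capturing group; a repeated group
# keeps only its LAST repetition, so only the final mention shows up.
last_only = m.groups()

# Splitting the whole match on whitespace gives the individual users.
leading_users = whole.split()

print(repr(whole))       # '@user1 @user2 '
print(last_only)         # ('@user2 ',)
print(leading_users)     # ['@user1', '@user2']
```

Note that match() returns None when the tweet does not start with a mention, so real code would probably check the result before calling group().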

Without re, but with itertools:

>>> tw = "@user1 @user2 blablabla @user3"
>>> import itertools
>>> list(itertools.takewhile(lambda x: x.startswith('@'), tw.split()))
['@user1', '@user2']

Try this regular expression: ^(@\w+\s)+.

In @user1 @user2 blablabla @user3 it will match the leading "@user1 @user2 " (including the trailing space).

Your egrep version applies a * to the space between words but your Python version doesn't. Also, \s matches all whitespace, not just spaces; and [a-zA-Z0-9_] (i.e. [a-z0-9_] with re.IGNORECASE, since that flag doesn't really affect anything else) is more easily spelled \w.
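
A small sketch of those points, assuming Python 3. One caveat that goes beyond the original answer: in Python 3, \w on a str pattern also matches non-ASCII letters unless re.ASCII is given, so the two spellings are only equivalent with that flag. The sample strings are made up for illustration.

```python
import re

# Explicit character class plus IGNORECASE, with \s+ so that any run
# of whitespace (spaces, tabs, ...) may separate the mentions.
explicit = re.compile(r'^(@[a-z0-9_]+\s+)+', re.IGNORECASE)

# The same thing spelled with \w; re.ASCII restricts \w to [a-zA-Z0-9_].
shorthand = re.compile(r'^(@\w+\s+)+', re.ASCII)

tweet = "@User_1\t@user2  blablabla @user3"
explicit_match = explicit.match(tweet).group(0)
shorthand_match = shorthand.match(tweet).group(0)

# Without re.ASCII, \w also accepts accented letters.
unicode_name = re.match(r'@\w+', '@josé').group(0)
ascii_name = re.match(r'@\w+', '@josé', re.ASCII).group(0)

print(repr(explicit_match))   # '@User_1\t@user2  '
print(unicode_name, ascii_name)
```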

If regex isn't necessary:

>>> tweet = "@user1 @user2 blablabla @user3"
>>> s = tweet.split()
>>> s[:next((pos for pos, i in enumerate(s) if not i.startswith("@")), len(s))]
['@user1', '@user2']

Or a simpler, more traditional version using a loop:

>>> tweet = "@user1 @user2 blablabla @user3"
>>> users = []
>>> for i in tweet.split():
...     if i.startswith("@"):
...         users.append(i)
...     else:
...         break
... 
>>> users
['@user1', '@user2']
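
The split-based approaches above accept any token that starts with "@", including a bare "@" or something like "@!!!". A possible variation, sketched here under the assumption that a username is one or more \w characters, validates each token with a small regex. The function name leading_mentions is hypothetical, not from the answers above.

```python
import itertools
import re

# Assumption: a mention is "@" followed by one or more word characters.
MENTION = re.compile(r'@\w+')

def leading_mentions(tweet):
    """Return the run of well-formed @mention tokens at the start of tweet."""
    def is_mention(token):
        # fullmatch requires the whole token to be a mention.
        return MENTION.fullmatch(token) is not None
    return list(itertools.takewhile(is_mention, tweet.split()))

print(leading_mentions("@user1 @user2 blablabla @user3"))  # ['@user1', '@user2']
print(leading_mentions("@user1 @user2"))                   # ['@user1', '@user2']
print(leading_mentions("@ hello"))                         # []
```

Because takewhile simply stops at the first non-matching token, a tweet made up entirely of mentions, or an empty tweet, needs no special handling.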

This should work (if you want to remove them):

>>> import re
>>> t = "@user1 @user2 blablabla @user3"
>>> re.compile(r"^(?:@\w+\s+)*(.*)$").match(t).group(1)
'blablabla @user3'

or this (if you want to only get the users):

>>> re.compile(r"^((?:@\w+\s+)*)").match(t).group(1).split()
['@user1', '@user2']
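
Both results can also be obtained from a single match. The following is a rough sketch, assuming Python 3; the name split_leading_mentions is hypothetical. It allows a mention to be followed by either whitespace or the end of the string, so a tweet consisting only of mentions is handled as well.

```python
import re

# Group 1: the leading run of mentions, each followed by whitespace
#          or by the end of the string.
# Group 2: whatever remains of the tweet.
LEADING = re.compile(r'^((?:@\w+(?:\s+|$))*)(.*)$', re.DOTALL)

def split_leading_mentions(tweet):
    """Return (list of leading mentions, rest of the tweet)."""
    m = LEADING.match(tweet)  # always matches: both groups may be empty
    return m.group(1).split(), m.group(2)

users, rest = split_leading_mentions("@user1 @user2 blablabla @user3")
print(users)  # ['@user1', '@user2']
print(rest)   # blablabla @user3
```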
