Regex pattern for tweets

2022-12-07 22:16 Author:

I am building a tweet classification model, and I am having trouble finding a regex pattern that fits what I am looking for. What I want the regex pattern to pick up:

Any hashtags used in the tweet but without the hash mark (example - #omg to just omg)
Any mentions used in the tweet but without the @ symbol (example - @username to just username)
I don't want any numbers, or any words containing numbers, returned (this is the most difficult part for me)
Other than that, I just want all words returned

Thank you in advance if you can help

Currently I am using this pattern: r"(?u)\b\w\w+\b", but it fails to remove numbers.
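To see where the current pattern goes wrong, here is a small sketch that runs it on a sample tweet (the test string is an assumption for illustration). Because `\w` does not match `#` or `@`, the pattern already drops those prefixes; the problem is that `\w` also matches digits, so `123`, `786` and `one1` come through. The `no_digits` pattern is a hypothetical adjustment, not part of the original post: a negative lookahead `(?!\w*\d)` right after the opening `\b` rejects any word that contains a digit. Like the original, it skips one-letter words such as `I` and the single letters after an apostrophe (`t` in `can't`). Since `(?u)\b\w\w+\b` is also scikit-learn's default `token_pattern`, the adjusted string should be usable there in the same way.

```python
import re

# Sample tweet, assumed for illustration
tweet = "#omg @username I can't believe it's not butter! #butter #123 786 #one1"

# The asker's current pattern: runs of two or more word characters
current = r"(?u)\b\w\w+\b"

# Hypothetical variant: (?!\w*\d) looks ahead over the whole word and
# rejects the match if a digit appears anywhere in it
no_digits = r"(?u)\b(?!\w*\d)\w\w+\b"

print(re.findall(current, tweet))
# ['omg', 'username', 'can', 'believe', 'it', 'not', 'butter', 'butter', '123', '786', 'one1']
print(re.findall(no_digits, tweet))
# ['omg', 'username', 'can', 'believe', 'it', 'not', 'butter', 'butter']
```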

import re

tweet = "#omg @username I can't believe it's not butter! #butter #123 786 #one1"

# Match a word that directly follows # or @, unless that word contains a digit
regex = r"(?<=[#@])(?!\w*\d)\w+"

# Extract the hashtags and mentions
hashtags_and_mentions = re.findall(regex, tweet)

# Print the results
print(hashtags_and_mentions)  # Output: ['omg', 'username', 'butter']

This regex should work.

(#|@)?(?![^ ]*\d[^ ]*)([^ ]+)

Explanation:

(#|@)?: A 'hash' or 'at' character. Match 0 or 1 times.

(?!...): Negative lookahead. It checks whether the pattern inside the brackets matches at the current position; if it does, the match fails at that position.

[^ ]*\d[^ ]*: Any number of non-space characters, followed by a digit, followed by any number of non-space characters. Because this is nested in the negative lookahead, the match is rejected if a digit appears anywhere in the word, username or hashtag.

([^ ]+): One or more non-space characters. The negative lookahead is a zero-length match, so if it passes, this group fetches the rest of the username/hashtag (grouped with brackets so you can replace with $2, written \2 in Python's re.sub).
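A short Python sketch of how this pattern might be used, with an assumed sample tweet. Because the pattern has two groups, `re.findall` returns `(prefix, token)` tuples, and the token without its `#` or `@` is the second element. Two things are worth knowing: punctuation stays attached (`butter!`, `can't`), and since nothing ties the match to the start of a token, a word that begins with a digit can leak its tail (`2nd` gives `nd`). The `anchored` variant, which adds the lookbehind `(?<![^ ])`, and the helper name `words` are additions for illustration, not part of the original answer.

```python
import re

tweet = "#omg @username I can't believe it's not butter! #butter #123 786 #one1"

# The pattern from this answer: group 1 is the optional # or @, group 2 is the token
pattern = r"(#|@)?(?![^ ]*\d[^ ]*)([^ ]+)"

# Hypothetical tweak: (?<![^ ]) lets a match start only at the beginning of the
# string or right after a space, so the lookahead always inspects a whole token
anchored = r"(?<![^ ])(#|@)?(?![^ ]*\d[^ ]*)([^ ]+)"

def words(regex, text):
    # findall returns (group1, group2) tuples; keep the token without its prefix
    return [token for _prefix, token in re.findall(regex, text)]

print(words(pattern, tweet))
# ['omg', 'username', 'I', "can't", 'believe', "it's", 'not', 'butter!', 'butter']
print(words(pattern, "on the 2nd"))   # ['on', 'the', 'nd']
print(words(anchored, "on the 2nd"))  # ['on', 'the']
```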
