How can I turn a list of words in a text file into a regex to filter them out?
I'm trying to filter some text for certain keywords that are stored in a text file. I was thinking of parsing the keyword file line by line, taking each word, joining them with a pipe "|", and then using that string inside re.sub.
Any better or more efficient ideas are welcome.
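For concreteness, here is a minimal sketch of the approach I had in mind, assuming keywords.txt has one keyword per line (the file names are just placeholders, and re.escape is added in case a keyword contains regex metacharacters):

import re

# Read one keyword per line and escape any regex metacharacters.
with open('keywords.txt') as f:
    keywords = [re.escape(line.strip()) for line in f if line.strip()]

# Join the keywords into a single alternation and remove whole-word matches.
pattern = re.compile(r'\b(?:' + '|'.join(keywords) + r')\b')
with open('textfile.txt') as f:
    for line in f:
        print pattern.sub('', line),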
Something like this without regexp?
import string

# Load keywords into a set for fast membership tests (no regex needed).
keyset = set(open('keywords.txt').read().splitlines())

for lineno, line in enumerate(open('textfile.txt')):
    # Collect the keywords that appear in this line as whole words,
    # ignoring surrounding punctuation.
    result = [kw
              for kw in keyset
              for w in line.split()
              if kw in w and w.strip(string.punctuation) == kw]
    if result:
        print "%5s (%s): %s" % (lineno, ', '.join(result), line),
Something like the following?
import re

with open('keywords.txt', 'r') as k:
    # Escape each keyword so regex metacharacters are treated literally.
    kwords = sorted((re.escape(w) for w in k.read().split()),
                    key=lambda x: (len(x), x))
# Optional leading whitespace plus the keyword as a whole word.
searchstring = r'\s?\b(' + '|'.join(kwords) + r')\b'
with open('textfile.txt', 'r') as t:
    text = t.read()
# re.subn returns (new_text, substitution_count); strip leading blanks afterwards.
newtext, count = re.subn(searchstring, '', text)
newtext = newtext.lstrip()
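As a quick sanity check of the assembled pattern on an in-memory string (the keyword list and the sentence below are made up for illustration):

import re
# Hypothetical keywords; re.escape keeps metacharacters literal.
pattern = r'\s?\b(' + '|'.join(map(re.escape, ['foo', 'bar'])) + r')\b'
print re.sub(pattern, '', 'foo went to the bar and back')
# prints " went to the and back"

The leading \s? in the pattern swallows one space before each removed keyword so the result does not end up with doubled spaces.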