Removing strings in brackets for unicode lines - python

2023-03-16 00:48

I've got some problems with my regex: I'm trying to remove the strings enclosed in brackets.

Here's my code:

import sys, re
import codecs

reload(sys)
sys.setdefaultencoding('utf-8')

reader = codecs.open("input",'r','utf-8')
p = re.compile('s/[\[\(].+?[\]\)]//g', re.DOTALL)
# I've also tried several other regexes, but they didn't work:
# p = re.compile('\{\{*?.*?\}\}', re.DOTALL)
# p = re.compile('\{\{*.*?\}\}', re.DOTALL)

for row in reader:
    if ("(" in row) and (")" not in row):
        continue
    if row.count("(") != row.count(")"):
        continue
    else:
        row2 = p.sub('', row)
        print row2
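The compiled pattern above is written in sed's `s/pattern/replacement/g` form, which Python's `re` module does not interpret: the `s/` prefix and the `//g` suffix are matched as literal characters, so the pattern does not match the input lines. `re.sub` (or a compiled pattern's `.sub`) takes the pattern and the replacement separately, and it already replaces every match because its `count` argument defaults to 0. A minimal sketch of the same expression in Python form, assuming Python 3 `str` lines; `BRACKETED` and `strip_brackets` are illustrative names, not from the thread:

```python
import re

# The sed expression s/[\[\(].+?[\]\)]//g, split into pattern and replacement.
# The lazy .+? stops at the first closing bracket, so two groups on one line
# are removed separately instead of as one long match.
BRACKETED = re.compile(r"[\[(].+?[\])]")


def strip_brackets(line):
    """Remove every (...) or [...] group from a line (hypothetical helper)."""
    # Pattern.sub replaces all non-overlapping matches because count defaults to 0.
    return BRACKETED.sub("", line)


if __name__ == "__main__":
    # prints: 표준강도 이하의 underproof
    print(strip_brackets("(알코올이)표준강도(50%) 이하의 underproof"))
```

This only fixes the syntax; skipping lines with unbalanced brackets still needs the `count` check from the loop above.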

The input text file looks something like this:

가시 돋친(신랄한)평 spinosity
가장 완전한 (같은 종류의 것 중에서)   unabridged
(알코올이)표준강도(50%) 이하의 underproof
(암초 awash
치명적인(fatal) capital
열을) 전도하다    transmit

The required output should look like this:

가시 돋친평  spinosity
가장 완전한  unabridged
표준강도 이하의    underproof
치명적인    capital

Would this work for you?

# -*- coding: utf-8 -*-
import re
import codecs

# reload(sys) and sys.setdefaultencoding('utf-8') are not needed:
# codecs.open() already decodes every line to unicode

# preparing the examples to work on (the same lines as the question's input)
writer = codecs.open("input.txt", 'w', 'utf-8')
examples = [u'가시 돋친(신랄한)평 spinosity',
            u'가장 완전한 (같은 종류의 것 중에서)   unabridged',
            u'(알코올이)표준강도(50%) 이하의 underproof',
            u'(암초 awash',
            u'치명적인(fatal) capital']
for example in examples:
    writer.write(example + u"\n")
writer.close()

reader = codecs.open("input.txt", 'r', 'utf-8')

# order of patterns is important:
# if you remove the lone brackets first, the first pattern won't find anything.
# [^()]* (instead of the greedy .{1,}) stops each match at the first ")",
# so "(알코올이)표준강도(50%)" loses both groups but keeps 표준강도
patterns_to_remove = [r"\([^()]*\)", r"[()]"]

# one pattern would work just fine; the loop is a bit clearer
#pat = r"\([^()]*\)|[()]"
#for row in reader:
#    print(re.sub(pat, '', row).rstrip("\n"))

reader.seek(0)
for row in reader:
    for pat in patterns_to_remove:
        # to pass flags, use flags=re.U: the 4th positional
        # argument of re.sub is count, not flags
        row = re.sub(pat, '', row)
    print(row.rstrip("\n"))
reader.close()
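The loop above removes a lone bracket, so `(암초 awash` comes out as `암초 awash`, while the required output drops such lines entirely (the question's own code skips them by comparing `row.count("(")` with `row.count(")")`). A minimal sketch that combines both steps, assuming Python 3 and UTF-8 input; `is_balanced` and `clean_lines` are hypothetical names chosen here. It uses a depth counter rather than equal counts, because a line like `)x(` has equal counts but no valid pair, and it removes innermost groups repeatedly so nested groups also disappear:

```python
import re

# An innermost group: "(" followed by non-bracket characters, then ")".
INNER_GROUP = re.compile(r"\([^()]*\)")


def is_balanced(line):
    """True if every ')' closes an earlier '(' and none are left open."""
    depth = 0
    for ch in line:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with no matching '('
                return False
    return depth == 0


def clean_lines(lines):
    """Yield lines with (...) groups removed, skipping unbalanced lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not is_balanced(line):
            continue  # e.g. "(암초 awash" or "열을) 전도하다 transmit"
        # Repeat until nothing changes so nested groups such as "(a(b)c)"
        # are removed from the inside out.
        previous = None
        while previous != line:
            previous, line = line, INNER_GROUP.sub("", line)
        yield line


if __name__ == "__main__":
    sample = [
        "가시 돋친(신랄한)평 spinosity\n",
        "(알코올이)표준강도(50%) 이하의 underproof\n",
        "(암초 awash\n",
        "치명적인(fatal) capital\n",
        "열을) 전도하다 transmit\n",
    ]
    for cleaned in clean_lines(sample):
        print(cleaned)
```

To process the question's file, `clean_lines` can be given an open file object, e.g. `open("input", encoding="utf-8")` in Python 3. Whitespace left where a group was removed (such as the spaces before `unabridged`) is kept as-is; strip or collapse it separately if the columns should be tidied.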
