
How can I compress repetitive characters to a single character using RE in Python?

I want to be able to replace any consecutive occurrences of punctuation characters in a string with a single occurrence. For example:

The first thing that came to mind was to:

import re
import string
for char in string.punctuation:
    text = re.sub(re.escape(char) + "+", lambda m: char, text)  # callable: a lone "\" is not a valid replacement string

However, since this is going to run in a repetitive process, I was wondering if there is a way to achieve this in a single RE, in order to make it run faster. What do you think?


You could try:

text = re.sub(r"([" + re.escape(string.punctuation) + r"])\1+", r"\1", text)

This uses re.escape() so that punctuation characters with a special meaning inside the character class (such as ], \, ^ and -) are escaped. The parentheses () capture the first punctuation character matched, and the \1 backreference refers to that captured character, so \1+ matches one or more further copies of it. The call therefore replaces each run of two or more identical punctuation characters with a single occurrence of that character.
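Since the question mentions that this runs in a repetitive process, here is a minimal sketch of the same pattern compiled once and reused. The names _REPEATED_PUNCT and squeeze_punctuation are made up for illustration and are not part of the answer above.

```python
import re
import string

# Build and compile the pattern once, so it is not reassembled on every call.
# re.escape() keeps characters such as ']', '\', '^' and '-' literal
# inside the character class.
_REPEATED_PUNCT = re.compile(r"([" + re.escape(string.punctuation) + r"])\1+")


def squeeze_punctuation(text):
    """Collapse each run of one repeated punctuation character to a single copy.

    Hypothetical helper name, used here only for illustration.
    """
    return _REPEATED_PUNCT.sub(r"\1", text)


print(squeeze_punctuation("Wait... what??!! --- ok"))  # Wait. what?! - ok
```

Letters and digits are left alone because they are not in string.punctuation, so a word like "aaa" is returned unchanged.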


re.sub(r'([!?.])\1+', r'\1', text)
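A short sketch of how this narrower pattern behaves, assuming only !, ? and . should be collapsed. The function name squeeze_same is hypothetical. Because \1 must match the same character that was captured, a mixed run such as ?!?! is left as it is, and punctuation outside the class is not touched.

```python
import re


def squeeze_same(text):
    """Collapse runs of the same character, limited to '!', '?' and '.'.

    Hypothetical wrapper around the one-line re.sub() call, for illustration.
    """
    return re.sub(r"([!?.])\1+", r"\1", text)


# "..." and "!!" collapse; "?!?!" has no identical neighbours, and ",,"
# is not in the character class, so both stay unchanged.
print(squeeze_same("Really?!?! No...!! a,,b"))  # Really?!?! No.! a,,b
```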
