How can I compress repetitive characters to a single character using RE in Python?
I want to be able to replace any consecutive occurrences of punctuation characters in a string with a single occurrence. For example:
- "I went to the park...." => "I went to the park."
- "Are you serious??!!???!" => "Are you serious?!?!"
The first thing that came to mind was to:
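for char in string.punctuation:
    # re.escape handles regex metacharacters; the lambda keeps a backslash replacement literal
    text = re.sub(re.escape(char) + "+", lambda m: char, text)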
However, since this is going to run in a repetitive process, I was wondering if there is a way to achieve this in a single RE, in order to make it run faster. What do you think?
You could try:
text = re.sub(r"([" + re.escape(string.punctuation) + r"])\1+", r"\1", text)
This uses re.escape() to ensure that the punctuation characters are properly escaped as necessary. The \1 backreferences refer to the part within the parentheses (), which is the first punctuation character matched. So this replaces instances of two or more repeated punctuation characters with the same single character.
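Since the question mentions running this repeatedly, here is a minimal sketch of the same pattern compiled once and wrapped in a function. The names _REPEATED_PUNCT and squeeze_punctuation are made up for illustration, and whether precompiling gives a measurable speedup depends on your workload, so it is worth timing on real data.

```python
import re
import string

# Same pattern as above, compiled once at import time.
# _REPEATED_PUNCT and squeeze_punctuation are illustrative names, not from any library.
_REPEATED_PUNCT = re.compile(r"([" + re.escape(string.punctuation) + r"])\1+")


def squeeze_punctuation(text):
    """Collapse each run of one repeated punctuation character to a single character."""
    return _REPEATED_PUNCT.sub(r"\1", text)


if __name__ == "__main__":
    print(squeeze_punctuation("I went to the park...."))
    print(squeeze_punctuation("Are you serious??!!???!"))
```

Note that only runs of the same character are collapsed, so a mixed sequence such as "?!" is left alone, which matches the second example in the question.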
re.sub(r'([!?.])\1+', r'\1', text)