Python: use regular expression to remove the white space from all lines

2023-01-21 05:00

^(\s+) only removes the whitespace from the first line. How do I remove the leading whitespace from all the lines?

Python's re module does not make ^ match at the start of every line by default, so you need to set the multi-line flag explicitly.

import re

r = re.compile(r"^\s+", re.MULTILINE)
r.sub("", "a\n b\n c")  # "a\nb\nc"

# or without compiling (only possible for Python 2.7+ because the flags option
# didn't exist in earlier versions of re.sub)

re.sub(r"^\s+", "", "a\n b\n c", flags=re.MULTILINE)

# but mind that \s includes newlines, so blank lines are swallowed too:
r.sub("", "a\n\n\n\n b\n c")  # "a\nb\nc"

It's also possible to set the flag inline, inside the pattern itself:

re.sub(r"(?m)^\s+", "", "a\n b\n c")
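
To make the effect of the flag concrete, here is a minimal sketch that runs the same pattern with and without multi-line mode. The sample string and variable names are made up for illustration.

```python
import re

text = "  a\n b\n\tc"  # hypothetical sample input

# Without re.MULTILINE, ^ matches only at the very start of the string.
default_result = re.sub(r"^\s+", "", text)

# With re.MULTILINE (or the inline (?m) flag), ^ also matches after each newline.
multiline_result = re.sub(r"^\s+", "", text, flags=re.MULTILINE)
inline_result = re.sub(r"(?m)^\s+", "", text)

print(repr(default_result))    # 'a\n b\n\tc'
print(repr(multiline_result))  # 'a\nb\nc'
```

Only the first line loses its indentation in the default case, which is the behaviour described in the question.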

An easier solution is to avoid regular expressions because the original problem is very simple:

content = 'a\n b\n\n c'
stripped_content = ''.join(line.lstrip(' \t') for line in content.splitlines(True))
# stripped_content == 'a\nb\n\nc'
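
If this is needed in more than one place, the same idea could be wrapped in a small helper. This is only a sketch: the name strip_leading and its chars parameter are hypothetical, not part of any library.

```python
def strip_leading(text, chars=" \t"):
    """Remove leading `chars` from every line, keeping line endings and blank lines."""
    # splitlines(keepends=True) keeps each line's terminator, so "\n" and "\r\n"
    # both survive the round trip through join().
    return "".join(line.lstrip(chars) for line in text.splitlines(keepends=True))


print(repr(strip_leading("a\n b\n\n c")))    # 'a\nb\n\nc'
print(repr(strip_leading("  a\r\n\tb\r\n")))  # 'a\r\nb\r\n'
```

Passing only spaces and tabs to lstrip() is what keeps blank lines intact; a bare lstrip() would strip the newline from an empty line as well.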

You can use strip() if you want to remove whitespace from both the front and the back, or lstrip() for the front only:

>>> s="  string with front spaces and back   "
>>> s.strip()
'string with front spaces and back'
>>> s.lstrip()
'string with front spaces and back   '

with open("file") as f:
    for line in f:
        # each line keeps its own "\n", so stop print() from adding another
        print(line.lstrip(" \t"), end="")

If you really want to use a regex:

>>> import re
>>> re.sub(r"^\s+", "", s)  # remove the front
'string with front spaces and back   '
>>> re.sub(r"\s+\Z", "", s)  # remove the back
'  string with front spaces and back'

@AndiDog acknowledges in his (currently accepted) answer that it munches consecutive newlines.

Here's how to fix that deficiency, which is caused by the fact that \n is BOTH whitespace and a line separator. What we need is a character class that includes only whitespace characters other than newline.

We want whitespace and not newline, which can't be expressed directly in a character class. Let's rewrite that as not not (whitespace and not newline), i.e. not (not whitespace or not not newline) (thanks, Augustus De Morgan), i.e. not (not whitespace or newline), i.e. [^\S\n] in re notation.

So:

>>> re.sub(r"(?m)^[^\S\n]+", "", "  a\n\n   \n\n b\n c\nd  e")
'a\n\n\n\nb\nc\nd  e'
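
As a quick check, the sketch below runs both patterns on the same input and compares how many newlines survive. The variable names are illustrative only.

```python
import re

sample = "  a\n\n   \n\n b\n c\nd  e"

# \s also matches "\n", so a run of blank lines is consumed as "leading whitespace".
greedy = re.sub(r"(?m)^\s+", "", sample)

# [^\S\n] means "whitespace, but not newline", so blank lines are preserved.
careful = re.sub(r"(?m)^[^\S\n]+", "", sample)

print(repr(greedy))   # 'a\nb\nc\nd  e'
print(repr(careful))  # 'a\n\n\n\nb\nc\nd  e'
print(sample.count("\n"), greedy.count("\n"), careful.count("\n"))  # 6 3 6
```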

nowhite = ''.join(mytext.split())

No whitespace at all will remain, as you asked (everything is joined into one word). It is usually more useful to join with ' ' or '\n' so that the words stay separate.
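
A short sketch of the three variants, using a made-up value for mytext:

```python
mytext = "  a  b\n\tc  \n d"  # hypothetical sample input

# split() with no arguments splits on any run of whitespace and drops empty pieces.
words = mytext.split()

no_whitespace = "".join(words)   # everything fused into one word
single_spaced = " ".join(words)  # words separated by a single space
one_per_line = "\n".join(words)  # one word per line

print(repr(no_whitespace))  # 'abcd'
print(repr(single_spaced))  # 'a b c d'
print(repr(one_per_line))   # 'a\nb\nc\nd'
```

Note that all three discard the original line structure, so this fits only when the goal is to normalise whitespace rather than to strip indentation line by line.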

You'll have to use the re.MULTILINE option:

re.sub(r"(?m)^\s+", "", text)

The (?m) prefix enables multi-line mode; it is equivalent to passing re.MULTILINE as a flag.

You don't actually need regular expressions for this most of the time. If you are only looking to remove common indentation across multiple lines, try the textwrap module:

>>> import textwrap
>>> messy_text = " grrr\n whitespace\n everywhere"
>>> print(textwrap.dedent(messy_text))
grrr
whitespace
everywhere

Note that if the indentation is irregular, the extra indentation will be maintained, because dedent only removes the whitespace common to every line:

>>> very_messy_text = " grrr\n \twhitespace\n everywhere"
>>> print(textwrap.dedent(very_messy_text))
grrr
        whitespace
everywhere
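
For comparison, here is a sketch that puts textwrap.dedent next to a per-line lstrip() on the same irregular input. The variable names dedented and fully_stripped are illustrative.

```python
import textwrap

very_messy_text = " grrr\n \twhitespace\n everywhere"

# dedent() removes only the margin shared by all lines (a single space here),
# so the extra tab on the second line is kept.
dedented = textwrap.dedent(very_messy_text)

# Stripping each line individually removes all leading whitespace.
fully_stripped = "\n".join(line.lstrip() for line in very_messy_text.split("\n"))

print(repr(dedented))        # 'grrr\n\twhitespace\neverywhere'
print(repr(fully_stripped))  # 'grrr\nwhitespace\neverywhere'
```

Which one is appropriate depends on whether relative indentation carries meaning, as it does in code or nested lists.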
