Remove characters from beginning and end or only end of line
I want to remove some symbols from a string using a regular expression, for example:
==
(that occur both at the beginning and at the end of a line),
*
(at the beginning of a line ONLY).
def some_func():
    clean = re.sub(r'= {2,}', '', clean)  # Removes 2 or more occurrences of = at the beginning and at the end of a line.
    clean = re.sub(r'^\* {1,}', '', clean)  # Removes 1 or more occurrences of * at the beginning of a line.
What's wrong with my code? It seems like the expressions are wrong. How do I remove a character/symbol if it's at the beginning or at the end of the line (with one or more occurrences)?
If you only want to remove characters from the beginning and the end, you could use the str.strip() method. This would give code like this:
>>> s1 = '== foo bar =='
>>> s1.strip('=')
' foo bar '
>>> s2 = '* foo bar'
>>> s2.lstrip('*')
' foo bar'
The strip method removes the characters given in the argument from the beginning and the end of the string, lstrip removes them only from the beginning, and rstrip removes them only from the end.
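For completeness, here is a small sketch showing rstrip next to the other two on one string; the sample string is made up for illustration:

```python
s = '** foo bar =='

both = s.strip('*= ')   # removes '*', '=' and ' ' from both ends
left = s.lstrip('*')    # removes '*' from the beginning only
right = s.rstrip('=')   # removes '=' from the end only

print(repr(both))   # 'foo bar'
print(repr(left))   # ' foo bar =='
print(repr(right))  # '** foo bar '
```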
If you really want to use a regular expression, they would look something like this:
clean = re.sub(r'(^={2,})|(={2,}$)', '', clean)
clean = re.sub(r'^\*+', '', clean)
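A runnable sketch that wraps the two regular expressions above in a function; the name clean_line and the sample strings are hypothetical, chosen only to exercise each case:

```python
import re

def clean_line(line):
    # Remove a run of two or more '=' anchored at the start or at the end.
    line = re.sub(r'(^={2,})|(={2,}$)', '', line)
    # Remove a run of one or more '*' anchored at the start only.
    line = re.sub(r'^\*+', '', line)
    return line

for sample in ['==foo==', '***bar', 'baz*', '=qux===']:
    print(repr(sample), '->', repr(clean_line(sample)))
```

Note that '=qux===' keeps its leading '=' because a single '=' does not satisfy {2,}, and 'baz*' is untouched because the '*' pattern is anchored to the start.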
But IMHO, using strip/lstrip/rstrip would be the most appropriate for what you want to do.
Edit: On Nick's suggestion, here is a solution that would do all this in one line:
clean = clean.lstrip('*').strip('= ')
(A common mistake is to think that these methods remove characters in the order they're given in the argument. In fact, the argument is just a set of characters to remove, in any order. That's why .strip('= ') removes every '=' and ' ' from the beginning and the end, not just the string '= '.)
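A short sketch of that "set of characters" behaviour, followed by the one-liner applied to a few made-up sample lines:

```python
# The argument to strip() is a set of characters, not a prefix/suffix string.
spaced = '= =foo= ='.strip('= ')     # every '=' and ' ' at either end is removed
unordered = 'abcfooabc'.strip('cba')  # the order of 'cba' does not matter

# The one-liner from the answer on a few sample lines.
samples = ['*** == foo == ', '* bar', '== baz ==']
cleaned = [line.lstrip('*').strip('= ') for line in samples]

print(repr(spaced), repr(unordered))  # 'foo' 'foo'
print(cleaned)                        # ['foo', 'bar', 'baz']
```

One difference from the regex versions: strip('= ') also removes a single '=' (for example '=foo='.strip('= ') returns 'foo'), whereas ={2,} requires at least two.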
You have extra spaces in your regexes. Even a space counts as a character.
r'^(?:\*|==)|==$'
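A quick check of this pattern with re.sub on some made-up strings. As written, the alternation removes a single '*' or exactly one '==' at the start and one '==' at the end, so longer runs are shortened rather than removed:

```python
import re

PATTERN = r'^(?:\*|==)|==$'

samples = ['==foo==', '*bar', '***baz', '====qux====']
results = {s: re.sub(PATTERN, '', s) for s in samples}

for s, out in results.items():
    print(repr(s), '->', repr(out))
```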
First of all, pay attention to the spaces before "{": they are meaningful, so the quantifier in your example applies to the space, not to the symbol before it.
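To make the effect of that space concrete, here is a small sketch comparing the two patterns with re.findall on a made-up line:

```python
import re

line = '==  foo =='

# With the space, {2,} applies to the space: one '=' followed by 2+ spaces.
with_space = re.findall(r'= {2,}', line)
# Without the space, {2,} applies to '=': two or more '=' in a row.
without_space = re.findall(r'={2,}', line)

print(with_space)     # ['=  ']
print(without_space)  # ['==', '==']
```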
To remove "=" (two or more) only at the beginning or the end, you also need a different regexp, for example:
clean = re.sub(r'^(==+)?(.*?)(==+)?$', r'\2', s)
If you don't put either "^" or "$" the expression can match anywhere (i.e. even in the middle of the string).
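A sketch contrasting the anchored pattern above with an unanchored one, on a made-up string that also has '==' in the middle:

```python
import re

s = '==foo==bar=='

# Anchored: only the runs of '=' at the very start and end are dropped;
# the lazy (.*?) lets the optional trailing (==+)? claim the final run.
anchored = re.sub(r'^(==+)?(.*?)(==+)?$', r'\2', s)
# Unanchored: every run of two or more '=' is removed, including the middle one.
unanchored = re.sub(r'={2,}', '', s)

print(anchored)    # foo==bar
print(unanchored)  # foobar
```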
Another approach: instead of substituting, match the string and keep only the part you want:
import re

tu = ('======constellation==' , '==constant=====' ,
      '=flower===' , '===bingo=' ,
      '***seashore***' , '*winter*' ,
      '====***conditions=**' , '=***trees====***' ,
      '***=information***=' , '*=informative***==' )

RE = r'((===*)|\**)?(([^=]|=(?!=+\Z))+)'
pat = re.compile(RE)
for ch in tu:
    print(ch, ' ', pat.match(ch).group(3))
Result:
======constellation== constellation
==constant===== constant
=flower=== =flower
===bingo= bingo=
***seashore*** seashore***
*winter* winter*
====***conditions=** ***conditions=**
=***trees====*** =***trees====***
***=information***= =information***=
*=informative***== =informative***
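The pattern in RE is dense, so here is a sketch of the same expression written with re.VERBOSE so each part can carry a comment. The group numbering is unchanged, so group(3) is still the kept text; the name PAT is only for this example:

```python
import re

PAT = re.compile(r"""
    ( (===*) | \** )?   # group 1: a run of two or more '=' (group 2) or of '*', possibly empty
    (                   # group 3: the text that is kept
      ( [^=]            # group 4: any character other than '=' ...
      | =(?!=+\Z)       #   ... or an '=' NOT followed by more '=' running to the end
      )+
    )
""", re.VERBOSE)

for ch in ['======constellation==', '=flower===', '***seashore***', '=***trees====***']:
    print(ch, ' ', PAT.match(ch).group(3))
```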
Do you in fact want
====***conditions=** to give conditions=** and
***====hundred====*** to give hundred====***,
i.e. both kinds of prefix removed at the beginning?
I think that the following code will do the job:
import re

tu = ('======constellation==' , '==constant=====' ,
      '=flower===' , '===bingo=' ,
      '***seashore***' , '*winter*' ,
      '====***conditions=**' , '=***trees====***' ,
      '***=information***=' , '*=informative***==' )

with open('testu.txt', 'w', encoding='utf-8') as f:
    # Optional '==...' or '*...' prefix, lazily captured text, optional '==...' suffix.
    pat = re.compile(r'(?:==+|\*+)?(.*?)(?:==+)?\Z')
    xam = max(map(len, tu)) + 3
    res = '\n'.join(ch.ljust(xam) + pat.match(ch).group(1)
                    for ch in tu)
    f.write(res)
print(res)
Where was my brain when I wrote the RE in my earlier post?! The non-greedy quantifier .*? before ==+\Z is the real solution.
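A minimal sketch of why the non-greedy quantifier matters, using one of the sample strings: with a greedy .* the capture swallows the trailing '=' run, leaving nothing for the optional (?:==+)? suffix.

```python
import re

line = '==constant====='

greedy = re.match(r'(?:==+|\*+)?(.*)(?:==+)?\Z', line).group(1)
lazy = re.match(r'(?:==+|\*+)?(.*?)(?:==+)?\Z', line).group(1)

print(greedy)  # constant=====
print(lazy)    # constant
```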