Python conditional list joins
I have a list that looks like this:
[
'A',
'must',
'see',
'is',
'the',
'Willaurie',
',',
'which',
'sank',
'after',
'genoegfuuu',
'damaged',
'in',
'a',
'storm',
'in',
'1989',
'.'
]
As you can see, the list contains punctuation tokens. I want to call .join
with a single space as the separator, except where the string is punctuation, in which case I don't want a separator.
What's the best way to do this?
I've been trying for a while and my solutions are getting way too complicated for what seems like a simple task. Thanks.
The string module has a constant, string.punctuation, containing all ASCII punctuation characters.
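import string
# Prefix each token with a space unless it is punctuation, then trim the leading space.
# The result is named `sentence` so it does not shadow the `string` module.
sentence = ''.join([('' if c in string.punctuation else ' ') + c for c in wordlist]).strip()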
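The same idea as a small function, as a minimal sketch assuming the tokens are already split as in the question. The name join_tokens is made up for illustration. A set is used for the membership test because `c in string.punctuation` is a substring check on a str, so it would also match an empty string or a run such as '()'.

```python
import string

# Set membership avoids the substring semantics of `tok in string.punctuation`.
PUNCTUATION = set(string.punctuation)

def join_tokens(tokens):
    """Join tokens with spaces, attaching single punctuation characters to the left."""
    parts = []
    for tok in tokens:
        # Prepend a space unless the token is a single punctuation character.
        parts.append(tok if tok in PUNCTUATION else ' ' + tok)
    # The first word picked up a leading space; trim it.
    return ''.join(parts).strip()

words = ['A', 'must', 'see', 'is', 'the', 'Willaurie', ',', 'which', 'sank',
         'after', 'genoegfuuu', 'damaged', 'in', 'a', 'storm', 'in', '1989', '.']
print(join_tokens(words))
```

Like the one-liner, this attaches every punctuation character to the left, so it suits input like the question's but not brackets or apostrophes.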
You have your answer already, but I would just like to add that not all punctuation should be attached to the left-hand side. If you want to deal with more general sentences, you may have, for example, parentheses or apostrophes, and you don't want to end up with something like:
It' s a great movie( best I' ve seen)
I'd say it's pointless to write some nasty one-liner just to do this in the most Pythonic way. If you don't need a super fast solution, you could consider solving it step by step, for example:
import re
s = ['It', "'", 's', 'a', 'great', 'movie',
     '(', 'best', 'I', "'", 've', 'seen', ')']
text = " ".join(s)                        # join normally
text = re.sub(r" ([,.;)])", r"\1", text)  # stick to left
text = re.sub(r"([(]) ", r"\1", text)     # stick to right
text = re.sub(r" ([']) ", r"\1", text)    # join both sides
print(text)  # It's a great movie (best I've seen)
It's pretty flexible, and you can specify which punctuation is handled by each rule. It takes four lines, though, so you might dislike it. Whichever method you choose, there will probably be some sentences that don't work correctly and need a special case, so a one-liner may just be a bad choice anyway.
EDIT: Actually, you can contract the above solution to one line, but as I said before, I'm pretty sure there are more cases to consider:
# s must be the original list of tokens here, not the already joined string
print(re.sub(r"( [,.;)]|[(] | ['] )", lambda m: m.group(1).strip(), " ".join(s)))
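If you would rather not post-process the joined string, the same three rules can be applied token by token. This is only a sketch: the set names and join_with_rules are hypothetical, and the characters in each set are an assumption to adjust for your own text.

```python
# Hypothetical rule sets; adjust them for the text you expect.
STICK_LEFT = set(",.;:!?)")   # no space before these
STICK_RIGHT = set("(")        # no space after these
STICK_BOTH = set("'")         # no space on either side

def join_with_rules(tokens):
    """Join tokens with spaces, except where a punctuation rule glues them."""
    out = []
    glue_next = True  # no separator before the very first token
    for tok in tokens:
        # Glue to the previous token if it asked for it, or if this token sticks left.
        glue = glue_next or tok in STICK_LEFT or tok in STICK_BOTH
        out.append(tok if glue else ' ' + tok)
        # Remember whether the following token should be glued to this one.
        glue_next = tok in STICK_RIGHT or tok in STICK_BOTH
    return ''.join(out)

tokens = ['It', "'", 's', 'a', 'great', 'movie',
          '(', 'best', 'I', "'", 've', 'seen', ')']
print(join_with_rules(tokens))  # It's a great movie (best I've seen)
```

It still treats every apostrophe as part of a contraction, so single-quoted phrases would need a rule of their own.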
>>> ''.join([('' if i in set(",.!?") else ' ') + i for i in words]).strip()
'A must see is the Willaurie, which sank after genoegfuuu damaged in a storm in 1989.'
Like so:
import re
re.sub(r'\s+(?=\W)', '', ' '.join(['A', 'must', 'see', 'is', 'the', 'Willaurie', ',', 'which', 'sank', 'after', 'genoegfuuu', 'damaged', 'in', 'a', 'storm', 'in', '1989', '.']))
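For what it's worth, the lookahead removes whitespace before any non-word character, so it handles the question's input but also pulls opening parentheses and apostrophes to the left. A quick check, using the token list from the step-by-step answer in this thread:

```python
import re

tokens = ['It', "'", 's', 'a', 'great', 'movie',
          '(', 'best', 'I', "'", 've', 'seen', ')']
# Strip any whitespace that is immediately followed by a non-word character.
result = re.sub(r'\s+(?=\W)', '', ' '.join(tokens))
print(result)  # It' s a great movie( best I' ve seen)
```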
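How about using filter? Note that this drops the punctuation tokens altogether rather than attaching them to the preceding word:
import string
words = ['A', 'must', 'see', 'is', 'the', 'Willaurie', ',', 'which', 'sank', 'after', 'genoegfuuu', 'damaged', 'in', 'a', 'storm', 'in', '1989', '.']
' '.join(filter(lambda x: x not in string.punctuation, words))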