Python conditional list joins

I have a list that looks like this:

[
  'A',
  'must',
  'see',
  'is',
  'the',
  'Willaurie',
  ',',
  'which',
  'sank',
  'after', 
  'genoegfuuu',
  'damaged',
  'in',
  'a',
  'storm',
  'in',
  '1989',
  '.'
]

As you can see, the list contains punctuation. I want to call .join with a single space as the separator, except where the item is punctuation; in that case I don't want a separator before it.

What's the best way to do this?

I've been trying for a while and my solutions are getting way too complicated for what seems like a simple task.

Thanks


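The string module has a constant, string.punctuation, containing all ASCII punctuation characters.

import string
# wordlist is the list from the question; the result is not named `string`,
# because that would shadow the module
sentence = ''.join([('' if c in string.punctuation else ' ') + c for c in wordlist]).strip()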
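One caveat with the expression above: `c in string.punctuation` is a substring test, so a multi-character token such as '...' is not recognized as punctuation, while an empty string is. A minimal sketch that checks every character instead, assuming that is the behaviour you want (`join_tokens` is a hypothetical helper name, not part of any library):

```python
import string

def join_tokens(tokens):
    """Join tokens with spaces, attaching punctuation-only tokens to the left."""
    parts = []
    for tok in tokens:
        # Punctuation means: non-empty and made up only of punctuation characters
        is_punct = bool(tok) and all(ch in string.punctuation for ch in tok)
        parts.append(tok if is_punct else ' ' + tok)
    # Every word got a leading space, so strip the one at the very start
    return ''.join(parts).strip()

words = ['A', 'must', 'see', 'is', 'the', 'Willaurie', ',', 'which', 'sank',
         'after', 'genoegfuuu', 'damaged', 'in', 'a', 'storm', 'in', '1989', '.']
print(join_tokens(words))
# A must see is the Willaurie, which sank after genoegfuuu damaged in a storm in 1989.
```

Like the original expression, this only attaches punctuation to the left, so it is not suitable for opening brackets or apostrophes.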


You already have your answer, but I'd like to add that not all punctuation should be attached to the left-hand side. If you want to deal with more general sentences, you could have, for example, parentheses or apostrophes, and you don't want to end up with something like:

It' s a great movie( best I' ve seen)

I'd say it's pointless to create some nasty one-liner just to do this in the most Pythonic way. If you don't need a super-fast solution, you could consider solving it step by step, for example:

import re
s = ['It', "'", 's', 'a', 'great', 'movie',
     '(', 'best', 'I', "'", 've', 'seen', ')']

s = " ".join(s)  # join normally
s = re.sub(r" ([,.;\)])", lambda m: m.group(1), s)  # stick to left
s = re.sub(r"([\(]) ", lambda m: m.group(1), s)     # stick to right
s = re.sub(r" (['']) ", lambda m: m.group(1), s)     # join both sides

print(s)  # It's a great movie (best I've seen)

It's pretty flexible, and you can specify which punctuation is handled by each rule. It does take four lines, though, so you might dislike it. Whichever method you choose, there will probably be some sentences that don't work correctly and need special-casing, so a one-liner may just be a bad choice anyway.

EDIT: Actually, you can contract the above solution to one line, but as I said before, I'm pretty sure there are more cases to consider:

print(re.sub(r"( [,.;\)]|[\(] | ['] )", lambda m: m.group(1).strip(), " ".join(s)))
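If the rules are likely to grow, the step-by-step version in this answer could be wrapped in a small function so that each rule's punctuation set is a parameter. This is only a sketch: `detokenize` and its `left`, `right` and `both` parameters are hypothetical names, and the default character sets are an assumption rather than a complete list.

```python
import re

def detokenize(tokens, left=",.;:!?)", right="(", both="'"):
    """Join tokens with spaces, then remove the spaces around punctuation.

    left  -- characters that attach to the preceding word
    right -- characters that attach to the following word
    both  -- characters that attach on both sides
    """
    s = " ".join(tokens)  # join normally
    if left:
        s = re.sub(r" ([%s])" % re.escape(left), r"\1", s)
    if right:
        s = re.sub(r"([%s]) " % re.escape(right), r"\1", s)
    if both:
        s = re.sub(r" ([%s]) " % re.escape(both), r"\1", s)
    return s

tokens = ['It', "'", 's', 'a', 'great', 'movie',
          '(', 'best', 'I', "'", 've', 'seen', ')']
print(detokenize(tokens))  # It's a great movie (best I've seen)
```

Passing the sets through re.escape keeps characters such as `]` or `^` from being misread inside the character class. Quotation marks that open and close a phrase would still need a rule of their own.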


>>> ''.join([('' if i in set(",.!?") else ' ') + i for i in words]).strip()
'A must see is the Willaurie, which sank after genoegfuuu damaged in a storm in 1989.'


Like so

re.sub(r'\s+(?=\W)', '', ' '.join(['A', 'must', 'see', 'is', 'the', 'Willaurie', ',', 'which', 'sank', 'after', 'genoegfuuu', 'damaged', 'in', 'a', 'storm', 'in', '1989', '.']))
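The pattern above relies on a lookahead: `(?=\W)` only checks that the next character is a non-word character without consuming it, so just the whitespace in front of it is removed. A small sketch of what that implies (the function name `squeeze_before_punct` is made up for illustration):

```python
import re

def squeeze_before_punct(tokens):
    # Remove whitespace that is immediately followed by a non-word character
    return re.sub(r'\s+(?=\W)', '', ' '.join(tokens))

print(squeeze_before_punct(['storm', 'in', '1989', '.']))  # storm in 1989.
print(squeeze_before_punct(['movie', '(', 'best', ')']))   # movie( best)
```

As the second call shows, every non-word character is attached to the left, including opening brackets, so this fits the list in the question but not more general sentences.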


How about using filter?

import string

words = ['A', 'must', 'see', 'is', 'the', 'Willaurie', ',', 'which', 'sank', 'after', 'genoegfuuu', 'damaged', 'in', 'a', 'storm', 'in', '1989', '.']
# Note: this drops the punctuation tokens entirely instead of attaching them
' '.join(filter(lambda x: x not in string.punctuation, words))