
Python string splitting

I have an input string like this: a1b2c30d40 and I want to tokenize the string to: a, 1, b, 2, c, 30, d, 40.

I know I can read the characters one by one and keep track of the previous character to decide whether to start a new token (two digits in a row means don't split), but is there a more Pythonic way of doing this?


>>> import re
>>> re.split(r'(\d+)', 'a1b2c30d40')
['a', '1', 'b', '2', 'c', '30', 'd', '40', '']

On the pattern: \d means "match one digit" and + is a quantifier that means "match one or more", so \d+ means "match as many consecutive digits as possible". This is put into a group (), so the entire pattern in the context of re.split means "split this string using runs of digits as the separator, and also include the matched separators in the result". If you omit the group, you get ['a', 'b', 'c', 'd', '']. The trailing '' appears in both cases because the string ends with a separator, which leaves an empty piece after it.
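If the trailing empty string is unwanted, here are two possible ways to get rid of it, sketched under the assumption that the input is just runs of digits and non-digits as in the question. The function names tokenize and tokenize_with_split are made up for illustration; only re.findall and re.split come from the standard library.

```python
import re

def tokenize(text):
    # \d+ matches a run of digits, \D+ matches a run of non-digits;
    # findall returns every match in order, so no empty strings appear.
    return re.findall(r'\d+|\D+', text)

def tokenize_with_split(text):
    # Same re.split call as in the answer, with empty pieces filtered out.
    return [tok for tok in re.split(r'(\d+)', text) if tok]

print(tokenize('a1b2c30d40'))
print(tokenize_with_split('a1b2c30d40'))
```

Both versions keep consecutive letters together (for example 'ab12' gives 'ab' and '12'), which may or may not be what you want if the input can contain more than one letter in a row.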
