python string splitting
I have an input string like this: a1b2c30d40
and I want to tokenize it into: a, 1, b, 2, c, 30, d, 40.
I know I can read each character one by one and keep track of the previous character to decide whether to start a new token (two digits in a row means don't split them), but is there a more pythonic way of doing this?
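For comparison, here is a minimal sketch of the character-by-character approach the question describes. It assumes that any switch between a digit and a non-digit starts a new token, so consecutive letters also stay together (e.g. 'ab12' gives ['ab', '12']). The function name tokenize_manual is hypothetical.

```python
def tokenize_manual(s):
    """Split s into runs of digits and runs of non-digits, one character at a time."""
    tokens = []
    current = ""
    for ch in s:
        # Start a new token whenever we switch between digit and non-digit.
        if current and current[-1].isdigit() != ch.isdigit():
            tokens.append(current)
            current = ""
        current += ch
    # Flush the last token, if any (an empty input produces no tokens).
    if current:
        tokens.append(current)
    return tokens

print(tokenize_manual('a1b2c30d40'))
```

This works, but it is exactly the bookkeeping the question hopes to avoid.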
>>> import re
>>> re.split(r'(\d+)', 'a1b2c30d40')
['a', '1', 'b', '2', 'c', '30', 'd', '40', '']
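The trailing '' appears because the input ends with a separator ('40'), so re.split yields an empty piece after it; an input that starts with digits likewise yields a leading ''. A minimal sketch that filters those out, assuming you never want empty tokens (the name tokenize is hypothetical):

```python
import re

def tokenize(s):
    # The capturing group keeps the digit runs in the result; the list
    # comprehension drops the empty strings re.split produces when the
    # input starts or ends with digits.
    return [tok for tok in re.split(r'(\d+)', s) if tok]

print(tokenize('a1b2c30d40'))
print(tokenize('12ab'))
```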
On the pattern: \d means "match one digit", and + is a modifier that means "match one or more", so \d+ means "match as many digits as possible". This is wrapped in a group (), so in the context of re.split the whole pattern means "split this string using runs of digits as the separators, and also include the matched separators in the result". If you omitted the group, you would get ['a', 'b', 'c', 'd', ''].
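The effect of the group can be checked directly by running both patterns side by side. The last line uses re.findall, which is an alternative not mentioned in the answer: instead of splitting on the digits it matches the tokens themselves (a digit run or a non-digit run), so no empty strings are produced.

```python
import re

s = 'a1b2c30d40'

# Capturing group: the matched digit runs are kept in the result.
with_group = re.split(r'(\d+)', s)
# No group: the digit runs act purely as separators and are discarded.
without_group = re.split(r'\d+', s)
# Alternative: match each token directly as either \d+ or \D+.
found = re.findall(r'\d+|\D+', s)

print(with_group)     # ['a', '1', 'b', '2', 'c', '30', 'd', '40', '']
print(without_group)  # ['a', 'b', 'c', 'd', '']
print(found)          # ['a', '1', 'b', '2', 'c', '30', 'd', '40']
```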