How to split this string with Python?
I have strings that look like this example: "AAABBBCDEEEEBBBAA"
Any character is possible in the string.
I want to split it to a list like: ['AAA','BBB','C','D','EEEE','BBB','AA']
so every continuous stretch of the same character goes into a separate element of the split list.
I know that I can iterate over the characters in the string and compare each character at index i with the one at i-1, but is there a simpler solution out there?
We could use Regex:
>>> import re
>>> r = re.compile(r'(.)\1*')
>>> [m.group() for m in r.finditer('AAABBBCDEEEEBBBAA')]
['AAA', 'BBB', 'C', 'D', 'EEEE', 'BBB', 'AA']
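A note on the pattern above: `(.)` captures one character and `\1*` matches any further repeats of that same character. Since the question says any character is possible, it may matter that `.` does not match a newline by default. Here is a minimal sketch that adds `re.DOTALL` to cover that case; the names `RUN_PATTERN` and `split_runs` are made up for illustration.

```python
import re

# (.) captures one character; \1* matches further repeats of that character.
# re.DOTALL makes '.' match newlines too, so runs of '\n' are not skipped.
RUN_PATTERN = re.compile(r'(.)\1*', re.DOTALL)


def split_runs(text):
    """Return the runs of identical consecutive characters in text."""
    return [m.group() for m in RUN_PATTERN.finditer(text)]


print(split_runs('AAABBBCDEEEEBBBAA'))
# ['AAA', 'BBB', 'C', 'D', 'EEEE', 'BBB', 'AA']
print(split_runs('aa\n\nb'))
# ['aa', '\n\n', 'b']
```

Without the flag, newline characters would simply be left out of the result, because `finditer` skips whatever the pattern cannot match.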
Alternatively, we could use itertools.groupby.
>>> import itertools
>>> [''.join(g) for k, g in itertools.groupby('AAABBBCDEEEEBBBAA')]
['AAA', 'BBB', 'C', 'D', 'EEEE', 'BBB', 'AA']
timeit shows that the regex is faster (for this particular string, on Python 2.6 and Python 3.1). But a regex is, after all, specialized for strings, while groupby is a generic function, so this is not so unexpected.
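For anyone who wants to repeat the comparison, here is a rough sketch of how the two approaches could be timed. The function names and the repetition count are arbitrary choices for illustration, and the numbers will vary with the Python version and the input string, so no timings are shown.

```python
import itertools
import re
import timeit

s = 'AAABBBCDEEEEBBBAA'
pattern = re.compile(r'(.)\1*')


def with_regex():
    # Regex approach: each match is one run of a repeated character.
    return [m.group() for m in pattern.finditer(s)]


def with_groupby():
    # groupby approach: join each group of equal consecutive characters.
    return [''.join(g) for _, g in itertools.groupby(s)]


# Both approaches should agree before their speed is compared.
assert with_regex() == with_groupby()

for func in (with_regex, with_groupby):
    seconds = timeit.timeit(func, number=10000)
    print(func.__name__, seconds)
```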
>>> from itertools import groupby
>>> [''.join(g) for k, g in groupby('AAAABBBCCD')]
['AAAA', 'BBB', 'CC', 'D']
And by normal string manipulation:
>>> a = []; S = ""; p = ""
>>> s
'AAABBBCDEEEEBBBAA'
>>> for c in s:
...     if c != p: a.append(S); S = ""
...     S = S + c
...     p = c
...
>>> a.append(S)
>>> a
['', 'AAA', 'BBB', 'C', 'D', 'EEEE', 'BBB', 'AA']
>>> list(filter(None, a))  # list() is needed on Python 3, where filter returns an iterator
['AAA', 'BBB', 'C', 'D', 'EEEE', 'BBB', 'AA']
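The loop above can also be wrapped in a function that never produces the empty leading element, so no filter step is needed. This is only a sketch of the same idea; the name `split_runs_manual` is invented here.

```python
def split_runs_manual(s):
    """Split s into runs of identical consecutive characters."""
    runs = []
    current = ""
    for c in s:
        # A different character ends the current run, if there is one.
        if current and c != current[-1]:
            runs.append(current)
            current = ""
        current += c
    # Append the last run; an empty input produces no runs at all.
    if current:
        runs.append(current)
    return runs


print(split_runs_manual("AAABBBCDEEEEBBBAA"))
# ['AAA', 'BBB', 'C', 'D', 'EEEE', 'BBB', 'AA']
```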
import itertools
s = "AAABBBCDEEEEBBBAA"
["".join(chars) for _, chars in itertools.groupby(s)]
Just another way of solving your problem:
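#!/usr/bin/python
string = 'AAABBBCDEEEEBBBAA'
memory = str()
List = list()
for index, element in enumerate(string):
    if index > 0:
        if string[index] == string[index - 1]:
            memory += string[index]
        else:
            # The character changed: store the finished run and start a new one.
            List.append(memory)
            memory = element
    else:
        memory += element
# Store the final run, which the loop itself never appends.
if memory:
    List.append(memory)
print(List)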