Grouping a series in Python
Title edit: capitalization fixed and 'for python' added.
Is there a better or more standard way to do what I'm describing? I want input like this:
[1, 1, 1, 0, 2, 2, 0, 2, 2, 0, 0, 3, 3, 0, 1, 1, 1, 1, 1, 2, 2, 2]
to be transformed to this:
[0, 1, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 3, 0, 0, 0, 1, 0, 0, 0, 2, 0]
or, even better, something like this (describing similar output differently, but now not limited to integers):
labels: [1, 2, 3, 1, 2]
positions(where 1 identified the first occupiable position, as per my matplotlib plot): [2, 7, 12.5, 17, 21]
The input data is categorical data that classified a plot - in the picture below, grouped plots share a categorical feature which I'd like to label only once for the group. I'll be using开发者_开发问答 2 axes for two different variables, but I think that's besides the point for now.
Note: This image does not reflect either set of sample data - it's just to get across the idea of grouping together categories. Group a should be labeled at x=5, since there's a blank space between the first two and second to vertical data groups, and 0 is the line on the right side.
Here's what I've got:
data = [1, 1, 1, 2, 2, 2, 2, 2, 3, 4, 3, 2, 2, 1, 1, 1, 1]
last = None
runs = []
labels = []
run = 1
for x in data:
if x in (last, 0):
run += 1
else:
runs.append(run)
run = 1
labels.append(x)
last = x
runs.append(run)
runs.pop(0)
labels.append(x)
tick_positions = [0]
last_run = 1
for run in runs:
tick_positions.append(run/2.0+last_run/2.0+tick_positions[-1])
last_run = run
tick_positions.pop(0)
print tick_positions
To get the labels you can use itertools groupby
:
>>> import itertools
>>> numbers = [1, 1, 1, 0, 2, 2, 0, 2, 2, 0, 0, 3, 3, 0, 1, 1, 1, 1, 1, 2, 2, 2]
>>> list(k for k, g in itertools.groupby(numbers))
[1, 0, 2, 0, 2, 0, 3, 0, 1, 2]
And to remove the zeros you can use a comprehension:
>>> list(k for k, g in itertools.groupby(x for x in numbers if x != 0))
[1, 2, 3, 1, 2]
If you want to get the positions too, then you'll have to iterate through the list yourself as you are already doing. groupby
doesn't keep track of that for you.
精彩评论