How to count cumulatively with conditions on a groupby?
Say I have a data-frame, filled as below, with the column 'Key'
having one of five possible va开发者_C百科lues A, B, C, D, X
. I would like to add a new column 'Res'
that counts the number of these letters cumulatively and resets each time it hits and X
.
For example:
Key Res
0 D 1
1 X 0
2 B 1
3 C 2
4 D 3
5 X 0
6 A 1
7 C 2
8 X 0
9 X 0
May anyone assist in how I can achieve this?
A possible solution:
a = df.Key.ne('X')
df['new'] = ((a.cumsum()-a.cumsum().where(~a).ffill().fillna(0)).astype(int))
Another possible solution, which is more basic than the previous one, but much faster (several orders of magnitude):
s = np.zeros(len(df), dtype=int)
for i in range(len(df)):
if df.Key[i] != 'X':
s[i] = s[i-1] + 1
df['new'] = s
Output:
Key Res new
0 D 1 1
1 X 0 0
2 B 1 1
3 C 2 2
4 D 3 3
5 X 0 0
6 A 1 1
7 C 2 2
8 X 0 0
9 X 0 0
Example
df = pd.DataFrame(list('DXBCDXACXX'), columns=['Key'])
df
Key
0 D
1 X
2 B
3 C
4 D
5 X
6 A
7 C
8 X
9 X
Code
df1 = pd.concat([df.iloc[[0]], df])
grouper = df1['Key'].eq('X').cumsum()
df1.assign(Res=df1.groupby(grouper).cumcount()).iloc[1:]
result:
Key Res
0 D 1
1 X 0
2 B 1
3 C 2
4 D 3
5 X 0
6 A 1
7 C 2
8 X 0
9 X 0
精彩评论