'Conditional' groupby in pandas DataFrame
I am calculating market beta from daily data in a pandas DataFrame. That is, for each stock (via groupby) I want to calculate the variance of the market return and the covariance between the market return and the individual stock return, using a 252-day window with a minimum of 200 observations. Beta is Cov(market_return, stock_return)/Var(market_return). At first I used an unconditional groupby, which calculates the variances and covariances for every day in my sample. I then realized that calculating all of these betas takes too much time and is wasteful, because only the end-of-month values are used. For example, even if betas are calculated for 1 Jan, 2 Jan, ..., and 31 Jan, only the beta for 31 Jan is used. Therefore, I want to know whether there is a way to run my groupby code conditionally.
For example, with a 252-day window and a minimum of 200 observations, my groupby output looks like this:
stock_key | date | var(market_return) | covar(market_return, stock_return) |
---|---|---|---|
A | 2012-01-26 | 9.4212 | -4.23452 |
A | 2012-01-27 | 9.3982 | -4.18421 |
A | 2012-01-28 | 9.1632 | -4.33552 |
A | 2012-01-29 | 9.0456 | -4.55831 |
A | 2012-01-30 | 9.2231 | -4.92373 |
A | 2012-01-31 | 9.0687 | -4.04133 |
... | |||
A | 2012-02-27 | 8.9345 | -4.72344 |
A | 2012-02-28 | 9.0010 | -4.82349 |
... | |||
B | 2012-01-26 | 4.8456 | -1.42325 |
B | 2012-01-27 | 4.8004 | -1.18421 |
B | 2012-01-28 | 4.0983 | -1.02842 |
B | 2012-01-29 | 4.9465 | -1.13834 |
B | 2012-01-30 | 4.7354 | -1.63450 |
B | 2012-01-31 | 4.1945 | -1.18234 |
Is there any way to get the following result instead?
stock_key | date | var(market_return) | covar(market_return, stock_return) |
---|---|---|---|
A | 2012-01-31 | 9.0687 | -4.04133 |
A | 2012-02-28 | 9.0010 | -4.82349 |
B | 2012-01-31 | 4.1945 | -1.18234 |
Thank you for reading my question.
Without using groupby, we can check whether the date in each row is the last day of its month.
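import pandas as pd

df['date'] = pd.to_datetime(df['date'])  # convert strings to datetime
# Is the date in the row the last calendar day of its month?
# Stepping back one day and rolling forward to the month end returns the same
# date only for month-end dates (same result as df['date'].dt.is_month_end).
dfx = df[df['date'] - pd.offsets.Day() + pd.offsets.MonthEnd(1) == df['date']]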
Output:
stock_key date var(market_return) covar(market_return, stock_return)
5 A 2012-01-31 9.0687 -4.04133
15 B 2012-01-31 4.1945 -1.18234
Note: the last calendar day of 2012-02 is the 29th, so the 2012-02-28 row is not selected.
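The filter above keeps only calendar month-ends and is applied after every daily value has already been computed. As a rough sketch of an alternative (not a tested replacement for the asker's code), the block below first picks the last available row per stock and month, which would also keep a row such as 2012-02-28, and then computes the variance, covariance and beta only at those rows. The function names and the column names `market_return` and `stock_return` are assumptions made for illustration; the 200-observation check only roughly mirrors what `min_periods=200` does in a rolling computation.

```python
import pandas as pd


def last_obs_per_month(df, key_col="stock_key", date_col="date"):
    """Return the last available row of each (stock, calendar month) pair."""
    out = df.copy()
    out[date_col] = pd.to_datetime(out[date_col])
    out = out.sort_values([key_col, date_col])
    month = out[date_col].dt.to_period("M")  # e.g. 2012-01, 2012-02
    return out.groupby([out[key_col], month]).tail(1)


def month_end_stats(df, window=252, min_obs=200):
    """Compute var/cov/beta only at each stock's last row of every month.

    Assumed (hypothetical) columns: stock_key, date, market_return, stock_return.
    """
    rows = []
    for key, g in df.groupby("stock_key"):
        # After reset_index the index labels equal the row positions.
        g = g.sort_values("date").reset_index(drop=True)
        for pos in last_obs_per_month(g).index:
            # Trailing window of at most `window` rows ending at `pos` (inclusive).
            start = max(0, pos - window + 1)
            w = g.loc[start:pos, ["market_return", "stock_return"]].dropna()
            if len(w) < min_obs:
                continue  # too few observations, skip this month
            var_m = w["market_return"].var()
            cov_ms = w["market_return"].cov(w["stock_return"])
            rows.append({
                "stock_key": key,
                "date": g.at[pos, "date"],
                "var_market": var_m,
                "cov_market_stock": cov_ms,
                "beta": cov_ms / var_m,  # beta = Cov / Var
            })
    return pd.DataFrame(rows)
```

If the daily values have already been computed, `last_obs_per_month(df)` alone is enough to reduce them to one row per stock and month. `month_end_stats(df)` is meant for the case where the daily loop itself is the bottleneck; whether it is actually faster than a vectorized rolling computation followed by a filter depends on the data and would need to be measured.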