How to calculate the columnwise minimum of a dask pivot table?
I would like to create a pivot table in dask and then calculate the column-wise minimum.
import dask.dataframe as dd
from dask.distributed import Client
client = Client()
df = dd.read_csv("data.csv")
# To use pivot_table, the columns used as index and columns must be categorical:
df = df.categorize(columns=['A', 'B'])
#df['A'] = df['A'].cat.as_ordered()
#df['B'] = df['B'].cat.as_ordered()
pt = df.pivot_table(index='A', columns='B', values='C', aggfunc='mean')
pt.min().compute()
TypeError: Categorical is not ordered for operation min you can use .as_ordered() to change the Categorical to an ordered one
...
# Trying to uncategorize the index instead takes forever:
pt.index = list(pt.index)
pt.min().compute()
Is there a better way to achieve this?
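For reference, the wording of the TypeError above matches the check that pandas applies when `min` is called on an unordered categorical. The snippet below is a minimal pandas-only sketch of that check, not a reproduction of the dask pipeline: the series `labels` and its values are made up for illustration, and where exactly dask triggers the check inside `pt.min()` is not shown here.

```python
import pandas as pd

# Hypothetical unordered categorical, similar in dtype to a column
# produced by df.categorize(...)
labels = pd.Series(["x", "y", "z"], dtype="category")

# min() on an unordered categorical raises the same kind of TypeError
try:
    labels.min()
    raised = False
    message = ""
except TypeError as exc:
    raised = True
    message = str(exc)

print(raised)
print(message)

# Marking the categories as ordered makes min() well defined
ordered_min = labels.cat.as_ordered().min()
print(ordered_min)
```

If the same check is what fails in the dask case, this suggests the problem lies with a categorical being reduced somewhere, rather than with the numeric values in column C.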