pandas: how to run a pivot with a multi-index?
Question
I would like to run a pivot on a pandas DataFrame with the index being two columns, not one. For example: one field for the year, one for the month, an 'item' field holding either 'item 1' or 'item 2', and a 'value' field with numerical values. I want the index to be year + month.
The only way I managed to get this to work was to combine the two fields into one, then separate them again. Is there a better way?
Minimal code copied below. Thanks a lot!
PS Yes, I am aware there are other questions with the keywords 'pivot' and 'multi-index', but I did not understand if/how they can help me with this question.
import pandas as pd
import numpy as np
df = pd.DataFrame()
month = np.arange(1, 13)
values1 = np.random.randint(0, 100, 12)
values2 = np.random.randint(200, 300, 12)
df['month'] = np.hstack((month, month))
df['year'] = 2004
df['value'] = np.hstack((values1, values2))
df['item'] = np.hstack((np.repeat('item 1', 12), np.repeat('item 2', 12)))
# This doesn't work:
# ValueError: Wrong number of items passed 24, placement implies 2
# mypiv = df.pivot(['year', 'month'], 'item', 'value')
# This doesn't work, either:
# df.set_index(['year', 'month'], inplace=True)
# ValueError: cannot label index with a null key
# mypiv = df.pivot(columns='item', values='value')
# This below works but is not ideal:
# I have to first concatenate then separate the fields I need
df['new field'] = df['year'] * 100 + df['month']
mypiv = df.pivot(index='new field', columns='item', values='value').reset_index()
# Split the combined key back apart; // keeps the year an integer in Python 3
mypiv['year'] = mypiv['new field'] // 100
mypiv['month'] = mypiv['new field'] % 100
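The set_index attempt above can be made to work by skipping pivot altogether: put year, month and item into the index, then unstack item into the columns. A minimal sketch, assuming a small made-up frame with the same layout as the question (the values are illustrative, not the question's random numbers):

```python
import pandas as pd

# Small illustrative frame with the same columns as in the question
df = pd.DataFrame({
    'year': [2004, 2004, 2004, 2004],
    'month': [1, 2, 1, 2],
    'item': ['item 1', 'item 1', 'item 2', 'item 2'],
    'value': [33, 44, 250, 224],
})

# Move all three key columns into a MultiIndex, select the value column,
# then unstack the 'item' level so each item becomes its own column.
# Like pivot, this raises ValueError if a (year, month, item) key repeats.
wide = df.set_index(['year', 'month', 'item'])['value'].unstack('item')
print(wide)
```

This is a pure reshape with no aggregation, so it behaves like pivot rather than pivot_table.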
Answer
You can group by year, month and item, then unstack the item level:
>>> df.groupby(['year', 'month', 'item'])['value'].sum().unstack('item')
item        item 1  item 2
year month
2004 1          33     250
     2          44     224
     3          41     268
     4          29     232
     5          57     252
     6          61     255
     7          28     254
     8          15     229
     9          29     258
     10         49     207
     11         36     254
     12         23     209
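If you want year and month back as ordinary columns, which is what the question's workaround produced with reset_index, you can flatten the grouped result. A minimal sketch on a small made-up frame; blanking the columns' name is optional and purely cosmetic:

```python
import pandas as pd

# Small illustrative frame with the same columns as in the question
df = pd.DataFrame({
    'year': [2004, 2004, 2004, 2004],
    'month': [1, 2, 1, 2],
    'item': ['item 1', 'item 1', 'item 2', 'item 2'],
    'value': [33, 44, 250, 224],
})

# Aggregate per (year, month, item), then spread 'item' across the columns
wide = df.groupby(['year', 'month', 'item'])['value'].sum().unstack('item')

# Turn the (year, month) MultiIndex back into ordinary columns
flat = wide.reset_index()
flat.columns.name = None  # drop the leftover 'item' label above the header
print(flat)
```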
Or use pivot_table:
>>> df.pivot_table(
...     values='value',
...     index=['year', 'month'],
...     columns='item',
...     aggfunc='sum')
item        item 1  item 2
year month
2004 1          33     250
     2          44     224
     3          41     268
     4          29     232
     5          57     252
     6          61     255
     7          28     254
     8          15     229
     9          29     258
     10         49     207
     11         36     254
     12         23     209
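On pandas 1.1 and later, DataFrame.pivot itself accepts a list for index, so the call the question first tried works once the arguments are passed by keyword (pandas 2.0 made them keyword-only). The practical difference from pivot_table is how duplicate keys are handled: pivot raises, while pivot_table combines them with aggfunc. A minimal sketch on a small made-up frame, with one row deliberately duplicated to show that difference:

```python
import pandas as pd

# Small illustrative frame with the same columns as in the question
df = pd.DataFrame({
    'year': [2004, 2004, 2004, 2004],
    'month': [1, 2, 1, 2],
    'item': ['item 1', 'item 1', 'item 2', 'item 2'],
    'value': [33, 44, 250, 224],
})

# pandas >= 1.1: a list for index yields a (year, month) MultiIndex directly
wide = df.pivot(index=['year', 'month'], columns='item', values='value')
print(wide)

# Repeat the first row so the key (2004, 1, 'item 1') appears twice
dup = pd.concat([df, df.head(1)], ignore_index=True)

# pivot refuses to reshape when a key is duplicated...
try:
    dup.pivot(index=['year', 'month'], columns='item', values='value')
    pivot_raised = False
except ValueError:
    pivot_raised = True

# ...while pivot_table aggregates the duplicates (here: 33 + 33)
summed = dup.pivot_table(values='value', index=['year', 'month'],
                         columns='item', aggfunc='sum')
print(summed)
```

If your data can never contain repeated (year, month, item) combinations, pivot is the stricter choice, since it surfaces accidental duplicates instead of silently summing them.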
