Pandas - group by column and transform the data to numpy array
Problem description
Given the following DataFrame, where group A has 4 samples, B has 3 samples and C has 1 sample:
  group  data_1  data_2
0     A       1       4
1     A       2       5
2     A       3       6
3     A       4       7
4     B       1       4
5     B       2       5
6     B       3       6
7     C       1       4
I would like to transform the data into a NumPy array in which each row is a group holding all of its samples, with zero padding for groups that have fewer samples.
The result should look like this:
[
[[1,4],[2,5],[3,6],[4,7]], # group A, 4 samples
[[1,4],[2,5],[3,6],[0,0]], # group B, 3 samples
[[1,4],[0,0],[0,0],[0,0]], # group C, 1 sample
]
Solution
First, the missing rows have to be added. The first solution does this with unstack and stack, using a counter Series created by cumcount.
The second solution uses reindex with a MultiIndex instead.
Finally, both solutions use groupby with a lambda function that converts each group to a NumPy array via values and then to a list:
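import pandas as pd

df = pd.DataFrame({'group': list('AAAABBBC'),
                   'data_1': [1, 2, 3, 4, 1, 2, 3, 1],
                   'data_2': [4, 5, 6, 7, 4, 5, 6, 4]})

g = df.groupby('group').cumcount()  # 0, 1, 2, ... within each group
L = (df.set_index(['group', g])
       .unstack(fill_value=0)          # missing positions become 0, not NaN
       .stack()
       .groupby(level=0)
       .apply(lambda x: x.values.tolist())
       .tolist())
print(L)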
[[[1, 4], [2, 5], [3, 6], [4, 7]],
[[1, 4], [2, 5], [3, 6], [0, 0]],
[[1, 4], [0, 0], [0, 0], [0, 0]]]
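To see what the counter and the zero filling do, here is a minimal sketch that stops right after unstack. It rebuilds the sample data from the question; the name wide is only for illustration:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'group': list('AAAABBBC'),
                   'data_1': [1, 2, 3, 4, 1, 2, 3, 1],
                   'data_2': [4, 5, 6, 7, 4, 5, 6, 4]})

# cumcount numbers the rows inside each group: 0, 1, 2, ...
g = df.groupby('group').cumcount()
print(g.tolist())

# With (group, counter) as the index, unstack moves the counter into the
# columns; slots a group does not have are filled with 0 instead of NaN
wide = df.set_index(['group', g]).unstack(fill_value=0)
print(wide)
```

Each group is now one row of width 8 (2 columns x 4 positions), so stack can turn it back into 4 equal-length rows per group.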
Another solution:
g = df.groupby('group').cumcount()
# every (group, counter) pair, so shorter groups get padding rows
mux = pd.MultiIndex.from_product([df['group'].unique(), g.unique()])
L = (df.set_index(['group', g])
       .reindex(mux, fill_value=0)
       .groupby(level=0)[['data_1', 'data_2']]   # select columns with a list, not a tuple
       .apply(lambda x: x.values.tolist())
       .tolist())
print(L)
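Both solutions return nested Python lists; np.array(L) turns them into a NumPy array of shape (3, 4, 2). As an alternative that is not part of the answer above, the padded array can also be filled directly with NumPy fancy indexing. This is a sketch that assumes group order follows pd.factorize (order of first appearance), which for this data matches the sorted order groupby uses; out, codes and pos are illustrative names:

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'group': list('AAAABBBC'),
                   'data_1': [1, 2, 3, 4, 1, 2, 3, 1],
                   'data_2': [4, 5, 6, 7, 4, 5, 6, 4]})
cols = ['data_1', 'data_2']
values = df[cols].to_numpy()

# Integer code per group (order of first appearance, NOT sorted like groupby)
codes, groups = pd.factorize(df['group'])
# Position of each row inside its group: 0, 1, 2, ...
pos = df.groupby('group').cumcount().to_numpy()

# Zero-filled array: (number of groups, longest group, number of columns)
out = np.zeros((len(groups), pos.max() + 1, len(cols)), dtype=values.dtype)

# Scatter every row into its (group, position) slot; unused slots stay 0
out[codes, pos] = values

print(out.shape)
print(out.tolist())
```

The design choice is to preallocate the zeros once and scatter all rows in a single assignment, which avoids the per-group Python lambda and the round trip through lists.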
