Groupby and interpolate in Pandas

编程基础网 · Python questions

2022-01-01



Problem description

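I have data that contains a week number, an account ID, and several usage columns. I want to a) group by account ID, b) resample the weekly data to daily data, and c) interpolate the daily data evenly (divide each weekly value by 7), then combine everything back together. I have most of this worked out, but pandas groupby is tripping me up. It is also very slow, which makes me think this may not be the best approach.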

The data looks like this:

           Account Id   year week  views  stats  foo_col
    31133         213  2017-03-05    4.0    2.0     11.0
    10085         456  2017-03-12    1.0    6.0      3.0
    49551         789  2017-03-26    1.0    6.0     27.0
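A small frame with the same columns makes steps a) to c) easy to try for a single account. This is a minimal sketch: 'year week' is assumed to be parsed as a datetime, and the fourth row (a second week for account 213, with the made-up index label 31134) is hypothetical, added only so that one account spans more than one week.

```python
import pandas as pd

# The three sample rows, plus one HYPOTHETICAL second week for account 213
# so that the weekly -> daily expansion has something to fill in.
df = pd.DataFrame(
    {
        "Account Id": [213, 456, 789, 213],
        "year week": pd.to_datetime(
            ["2017-03-05", "2017-03-12", "2017-03-26", "2017-03-12"]
        ),
        "views": [4.0, 1.0, 1.0, 7.0],
        "stats": [2.0, 6.0, 6.0, 14.0],
        "foo_col": [11.0, 3.0, 27.0, 21.0],
    },
    index=[31133, 10085, 49551, 31134],  # 31134 is a made-up index label
)

# a) take one account's rows
one = df[df["Account Id"] == 213]
one = one.set_index("year week")[["views", "stats", "foo_col"]]

# b) weekly -> daily, carrying each weekly value forward,
# c) then divide by 7 to spread it evenly over the days
daily = one.resample("D").ffill() / 7
print(daily)
```

Note that the daily range stops at the last observed date, so the final week of each account contributes only its first day.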

Here is my code:

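# Context not shown in the question (assumed): df['year week'] is a datetime column,
# grp = df.groupby('Account Id'), and cols_to_interpolate lists 'year week'
# plus the usage columns.
def interpolator(mini_df):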
    mini_df = mini_df[cols_to_interpolate].set_index('year week')
    return mini_df.resample('D').ffill().interpolate() / 7

example = list(grp)[0][1]
interpolator(example) # This works perfectly

df.groupby('Account Id').agg(interpolator)                # doesn't work
df.groupby('Account Id').transform(interpolator)          # doesn't work

for name,group in grp:
    group = group[cols_to_interpolate].set_index('year week')
    group = group.resample('D').ffill().interpolate() / 7 # doesn't work

for acc_id in df['Account Id'].unique():
    mask = df.loc[df['Account Id'] == acc_id]
    print(df[mask])                                     # doesn't work
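Two details in the attempts above likely explain part of the trouble: the `for name, group in grp` loop does compute each daily frame, but rebinding `group` throws the result away; and in the last loop `mask` holds the filtered DataFrame rather than a boolean Series, so `df[mask]` cannot act as a row filter. A minimal sketch of both loops with those points fixed, on toy data (the second account-213 row is hypothetical) and with an assumed `cols_to_interpolate`:

```python
import pandas as pd

df = pd.DataFrame({
    "Account Id": [213, 456, 213],
    "year week": pd.to_datetime(["2017-03-05", "2017-03-12", "2017-03-12"]),
    "views": [4.0, 1.0, 7.0],  # the 7.0 row for account 213 is hypothetical
})
cols_to_interpolate = ["year week", "views"]  # assumed contents

# Loop over the groups and keep each result instead of rebinding `group`.
pieces = []
for acc_id, group in df.groupby("Account Id"):
    daily = group[cols_to_interpolate].set_index("year week")
    daily = daily.resample("D").ffill().interpolate() / 7
    daily["Account Id"] = acc_id  # remember which account the rows belong to
    pieces.append(daily)
result = pd.concat(pieces)  # combine all accounts into one frame

# Filtering one account at a time: index with a boolean Series.
for acc_id in df["Account Id"].unique():
    mask = df["Account Id"] == acc_id
    print(df[mask])
```

Collecting the pieces in a list and calling `pd.concat` once avoids re-copying the growing result on every iteration.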

Recommended answer

I think you need to chain your function with the groupby object, like this:

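# cols_to_interpolate must list only the usage columns here, since 'year week' becomes the index
df = (df.set_index('year week')                      # datetime index required by resample
        .groupby('Account Id')[cols_to_interpolate]  # resample each account separately
        .resample('D')                               # weekly -> daily within each account
        .ffill()                                     # repeat each weekly value on the following days
        .interpolate() / 7)                          # divide by 7 to spread each weekly value over its days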
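The chained version can be checked on toy data. A minimal sketch, assuming 'year week' is a datetime column and that `cols_to_interpolate` lists only the usage columns; the second account-213 row is hypothetical. The result has a two-level index (account, day). Because `ffill()` leaves no gaps inside a group, `.interpolate()` has nothing left to fill here; it is kept only to mirror the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Account Id": [213, 456, 213],
    "year week": pd.to_datetime(["2017-03-05", "2017-03-12", "2017-03-12"]),
    "views": [4.0, 1.0, 7.0],   # the second 213 row is hypothetical
    "stats": [2.0, 6.0, 14.0],
})
# 'year week' becomes the index, so the assumed list holds only usage columns.
cols_to_interpolate = ["views", "stats"]

daily = (df.set_index("year week")
           .groupby("Account Id")[cols_to_interpolate]
           .resample("D")
           .ffill()
           .interpolate() / 7)

print(daily)
```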

The solution from the comments is different: here interpolate is applied to each group separately:

df.groupby('Account Id').apply(interpolator)
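A minimal sketch of the `apply` route with the question's `interpolator`, on toy data (the second account-213 row is hypothetical) and an assumed `cols_to_interpolate` that includes 'year week'. `apply` calls the function once per account and uses the account ID as the outer index level. Recent pandas releases (2.2 and later) may emit a DeprecationWarning because the 'Account Id' column is still passed to the function; `interpolator` does not use it, so the result is unaffected.

```python
import pandas as pd

df = pd.DataFrame({
    "Account Id": [213, 456, 213],
    "year week": pd.to_datetime(["2017-03-05", "2017-03-12", "2017-03-12"]),
    "views": [4.0, 1.0, 7.0],   # the second 213 row is hypothetical
    "stats": [2.0, 6.0, 14.0],
})
cols_to_interpolate = ["year week", "views", "stats"]  # assumed contents


def interpolator(mini_df):
    # The question's function: daily frequency, forward fill, split over 7 days.
    mini_df = mini_df[cols_to_interpolate].set_index("year week")
    return mini_df.resample("D").ffill().interpolate() / 7


# One call per account; results are stacked under an 'Account Id' index level.
daily = df.groupby("Account Id").apply(interpolator)
print(daily)
```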
