Python: regularise an irregular time series with linear interpolation
Question
I have a time series in pandas that looks like this:
Values
1992-08-27 07:46:48 28.0
1992-08-27 08:00:48 28.2
1992-08-27 08:33:48 28.4
1992-08-27 08:43:48 28.8
1992-08-27 08:48:48 29.0
1992-08-27 08:51:48 29.2
1992-08-27 08:53:48 29.6
1992-08-27 08:56:48 29.8
1992-08-27 09:03:48 30.0
I would like to resample it to a regular time series with 15-minute time steps, where the values are linearly interpolated. Basically, I would like to get:
Values
1992-08-27 08:00:00 28.2
1992-08-27 08:15:00 28.3
1992-08-27 08:30:00 28.4
1992-08-27 08:45:00 28.8
1992-08-27 09:00:00 29.9
However, using the resample method from pandas (df.resample('15Min')), I get:
Values
1992-08-27 08:00:00 28.20
1992-08-27 08:15:00 NaN
1992-08-27 08:30:00 28.60
1992-08-27 08:45:00 29.40
1992-08-27 09:00:00 30.00
I have tried the resample method with different 'how' and 'fill_method' parameters, but never got exactly the results I wanted. Am I using the wrong method?
I figure this is a fairly simple query, but I have searched the web for a while and couldn't find an answer.
Thanks in advance for any help I can get.
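For anyone wanting to reproduce this, the frame can be rebuilt from the values listed above. The sketch below assumes a recent pandas, where resample returns a Resampler object that needs an explicit aggregation such as .mean(). The numbers shown in the question are consistent with per-bin means, which would explain the NaN at 08:15: no observation falls between 08:15 and 08:30.

```python
import pandas as pd

# Observations copied from the question; the name `df` matches the question's usage.
times = pd.to_datetime([
    "1992-08-27 07:46:48", "1992-08-27 08:00:48", "1992-08-27 08:33:48",
    "1992-08-27 08:43:48", "1992-08-27 08:48:48", "1992-08-27 08:51:48",
    "1992-08-27 08:53:48", "1992-08-27 08:56:48", "1992-08-27 09:03:48",
])
df = pd.DataFrame(
    {"Values": [28.0, 28.2, 28.4, 28.8, 29.0, 29.2, 29.6, 29.8, 30.0]},
    index=times,
)

# Each 15-minute bin is reduced to the mean of the observations inside it;
# a bin that contains no observations comes out as NaN.
binned = df.resample("15min").mean()
print(binned)
```

So resample on its own aggregates within bins rather than interpolating between neighbouring observations. As far as I know, the how and fill_method keywords belong to older pandas releases and are not available in current ones.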
Recommended answer
It takes a bit of work, but try the code below. The basic idea is to find the two original timestamps that bracket each resample point (the closest one before and the closest one after) and interpolate between them. np.searchsorted is used to locate the first original timestamp at or after each resample point.
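As a quick illustration of the lookup step on its own, before the full listing, here is a minimal sketch using three of the timestamps from the question and two grid points. The names observed and grid are made up for this example.

```python
import numpy as np
import pandas as pd

# Three of the original timestamps and two regular grid points.
observed = pd.to_datetime([
    "1992-08-27 07:46:48", "1992-08-27 08:00:48", "1992-08-27 08:33:48",
])
grid = pd.to_datetime(["1992-08-27 08:00:00", "1992-08-27 08:15:00"])

# For each grid point, the position of the first observation at or after it.
idx_after = np.searchsorted(observed.values, grid.values)

print(observed[idx_after])      # the "after" neighbour of each grid point
print(observed[idx_after - 1])  # the "before" neighbour of each grid point
```

Note that a grid point earlier than the first observation would give position 0, and idx_after - 1 would then wrap around to the last element. That is presumably why the full code drops the first bin.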
import numpy as np
import pandas as pd

# empty frame with the desired index; .mean() is needed in current pandas,
# where resample() returns a Resampler rather than a frame, and
# iloc[1:] drops the first bin, which starts before the first observation
rs = pd.DataFrame(index=df.resample('15min').mean().iloc[1:].index)
# positions of the first original timestamp at or after each resample point
idx_after = np.searchsorted(df.index.values, rs.index.values)
# values and timestamps before/after each resample point
rs['after'] = df.loc[df.index[idx_after], 'Values'].values
rs['before'] = df.loc[df.index[idx_after - 1], 'Values'].values
rs['after_time'] = df.index[idx_after]
rs['before_time'] = df.index[idx_after - 1]
# calculate the new weighted value: each neighbour is weighted by how close
# the resample point is to it, so 'before' gets the distance to 'after' and vice versa
rs['span'] = rs['after_time'] - rs['before_time']
# I got errors here unless I turn the index to a series
t = pd.Series(data=rs.index, index=rs.index)
rs['before_weight'] = (rs['after_time'] - t) / rs['span']
rs['after_weight'] = (t - rs['before_time']) / rs['span']
rs['Values'] = rs.eval('before * before_weight + after * after_weight')
After all that, hopefully the right answer:
In [161]: rs['Values']
Out[161]:
1992-08-27 08:00:00 28.188571
1992-08-27 08:15:00 28.286061
1992-08-27 08:30:00 28.376970
1992-08-27 08:45:00 28.848000
1992-08-27 09:00:00 29.891429
Freq: 15T, Name: Values, dtype: float64
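A shorter route, which is a different technique from the searchsorted approach above, is to add the regular grid points to the index and let pandas interpolate by elapsed time. The sketch below is only one way to do it; it rebuilds the data from the question as a Series, and the names s, grid and regular are chosen for this example. Series.interpolate(method='time') weights by the actual time gaps, whereas the default method='linear' treats rows as equally spaced.

```python
import pandas as pd

# Observations copied from the question.
times = pd.to_datetime([
    "1992-08-27 07:46:48", "1992-08-27 08:00:48", "1992-08-27 08:33:48",
    "1992-08-27 08:43:48", "1992-08-27 08:48:48", "1992-08-27 08:51:48",
    "1992-08-27 08:53:48", "1992-08-27 08:56:48", "1992-08-27 09:03:48",
])
s = pd.Series(
    [28.0, 28.2, 28.4, 28.8, 29.0, 29.2, 29.6, 29.8, 30.0],
    index=times,
    name="Values",
)

# Regular 15-minute grid lying inside the observed time range.
grid = pd.date_range(
    s.index[0].ceil("15min"), s.index[-1].floor("15min"), freq="15min"
)

# Add the grid points to the index (as NaN), interpolate by elapsed time,
# then keep only the grid points.
regular = (
    s.reindex(s.index.union(grid))
     .interpolate(method="time")
     .loc[grid]
)
print(regular.round(3))
```

Each grid value is a straight-line interpolation between the two observations that bracket it, for example about 28.19 at 08:00 (between 28.0 at 07:46:48 and 28.2 at 08:00:48). Restricting the grid to the observed range avoids extrapolating beyond the first and last observations.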
