Python: How to get count of values based on datetime(Python:如何根据日期时间获取值的计数)
问题描述
我编写了以下代码,它创建了两个数据帧 nq 和 cmnt.nq 包含UserId 和对应的徽章获得时间date.cmnt 包含 OwnerUserId 和用户发表评论的时间 CreationDate.
我想计算获得徽章 1 周前后所有天的评论数,以便我可以从中创建时间序列线图.
I have written the following code which creates two dataframes nq and cmnt.
nq contains the UserId and corresponding time of Badge Attainment date.
cmnt contains OwnerUserId and the time when the User made a comment CreationDate.
I want to get a count of the comments made for all days before and after 1 week of badge attainment so that I can create a time series line plot out of it.
以下代码执行相同的操作,但会产生 KeyError.请提供为所有用户执行此操作的代码.
The following code perform the same but produces a KeyError. Please provide a code that performs this operations for all users.
nq
UserId | date
1 2009-10-17 17:38:32.590
2 2009-10-19 00:37:23.067
3 2009-10-20 08:37:14.143
4 2009-10-21 18:07:51.247
5 2009-10-22 21:25:24.483
cmnt
OwnerUserId | CreationDate
1 2009-10-16 17:38:32.590
1 2009-10-18 17:38:32.590
2 2009-10-18 00:37:23.067
2 2009-10-17 00:37:23.067
2 2009-10-20 00:37:23.067
3 2009-10-19 08:37:14.143
4 2009-10-20 18:07:51.247
5 2009-10-21 21:25:24.483
代码
nq.date = pd.to_datetime(nq.date)
cmnt.CreationDate = pd.to_datetime(cmnt.CreationDate)
count= []
for j in range(len(nq)):
for i in range(-7,8):
check_date = nq.date.iloc[j] + timedelta(days=i)
count = cmnt.loc[(cmnt.OwnerUserId == nq.UserId.iloc[j]) & (cmnt.CreationDate == check_date)].shape[0]
nq.iloc[j].append({nq[i]:count})
预期输出
UserId | date |-7|-6|-5|-4|-3|-2|-1|0 |1 |2 |3 |4 |5 |6 |7
1 2009-10-17 17:38:32.590 |0 |0 |0 |0 |0 |0 |1 |0 |1 |0 |0 |0 |0 |0 |0
2 2009-10-19 00:37:23.067 |0 |0 |0 |0 |0 |1 |1 |0 |1 |0 |0 |0 |0 |0 |0
3 2009-10-20 08:37:14.143 |0 |0 |0 |0 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0
4 2009-10-21 18:07:51.247 |0 |0 |0 |0 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0
5 2009-10-22 21:25:24.483 |0 |0 |0 |0 |0 |0 |1 |0 |0 |0 |0 |0 |0 |0 |0
这里的列-1表示获得徽章前1天发表的评论,1表示获得徽章后一天发表的评论,依此类推.
Here column -1 means comment made 1 day before badge attainment and 1 means comment made one day after badge attainment and so on.
注意可以有一种完全替代的方法来做到这一点.我的主要目标是绘制一个时间序列线图,显示用户在获得徽章之前和之后发表的评论数.
Note There can be a completely alternately way to do this. My main objective is to draw a time series line plot which shows the number of comments made by the users before and after attainment of the badge.
推荐答案
可能你想要一个交叉合并,过滤,然后一个 crosstab:
Probably you want a cross-merge, filter and then a crosstab:
# merge the two dataframes
merged = (nq.merge(cmnt, left_on='UserId',
right_on='OwnerUserId',
how='left')
)
# extract the date difference between `date` and `CreationDate`
merged['date_diff'] = merged['date'].dt.normalize() - merged['CreationDate'].dt.normalize()
merged['date_diff'] = (merged['date_diff'] / pd.to_timedelta('1D')).astype(int)
# filter the comments within the range
merged = merged[merged['date_diff'].between(-7,7)]
# crosstab
pd.crosstab([merged['UserId'],merged['date']], merged['date_diff'])
输出:
date_diff -1 1 2
UserId date
1 2009-10-17 17:38:32.590 1 1 0
2 2009-10-19 00:37:23.067 1 1 1
3 2009-10-20 08:37:14.143 0 1 0
4 2009-10-21 18:07:51.247 0 1 0
5 2009-10-22 21:25:24.483 0 1 0
这篇关于Python:如何根据日期时间获取值的计数的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!
本文标题为:Python:如何根据日期时间获取值的计数
基础教程推荐
- 将 x 轴刻度更改为自定义字符串 2022-01-01
- pyserial - 可以从线程 a 写入串行端口,是否阻塞从线程 b 读取? 2022-01-01
- 在 Celery 工作人员中捕获 Heroku SIGTERM 以优雅地关 2022-01-01
- 与常规 dict 相比,Python manager.dict() 非常慢 2022-01-01
- 由Python将MP3转换为MIDI(类型错误:无法加载插件:mtg-Melodia:Melodia) 2022-01-01
- numpy float:比算术运算中内置的慢 10 倍? 2022-01-01
- Discord.py 缺少必需的参数 2022-01-01
- 尝试制作WhatsApp机器人 2022-01-01
- 使用生成器和迭代器时 Python 多循环失败 2022-01-01
- 用 Python 编写 Fortran 无格式文件 2022-01-01
