Remove duplicates in a list of lists based on the third item in each sublist
Problem description
I have a list of lists that looks like:
c = [['470', '4189.0', 'asdfgw', 'fds'],
     ['470', '4189.0', 'qwer', 'fds'],
     ['470', '4189.0', 'qwer', 'dsfs fdv'],
     ...]
c has about 30,000 interior lists. What I'd like to do is eliminate duplicates based on the 4th item (index 3) of each interior list, keeping the first list seen for each value. So the list of lists above would look like:
c = [['470', '4189.0', 'asdfgw', 'fds'],['470', '4189.0', 'qwer', 'dsfs fdv'] ...]
Here is what I have so far:
d = []  # list that will contain condensed c
d.append(c[0])  # append first element, so I can compare lists
for bact in c:  # c is my list of lists with 30,000 interior lists
    for items in d:
        if bact[3] != items[3]:
            d.append(bact)
I think this should work, but it just runs and runs. I let it run for 30 minutes, then killed it. I don't think the program should take so long, so I'm guessing there is something wrong with my logic.
I have a feeling that creating a whole new list of lists is pretty stupid. Any help would be much appreciated, and please feel free to nitpick as I am learning. Also please correct my vocabulary if it is incorrect.
Recommended answer
I would do it like this:
seen = set()
cond = [x for x in c if x[3] not in seen and not seen.add(x[3])]
Explanation:
seen is a set which keeps track of already encountered fourth elements of each sublist.
cond is the condensed list. In case x[3] (where x is a sublist in c) is not in seen, x will be added to cond and x[3] will be added to seen.
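For readers who find the one-line comprehension dense, the same idea can be written as an explicit loop. This is only a sketch: the function name dedupe_by_index and its index parameter are made up for illustration and do not appear in the original answer, and the sample data is the three rows from the question.

```python
def dedupe_by_index(rows, index=3):
    """Keep the first row seen for each distinct value of row[index].

    Hypothetical helper written for this example; it mirrors the
    seen/cond logic of the list comprehension above.
    """
    seen = set()       # values of row[index] encountered so far
    condensed = []     # rows kept, in their original order
    for row in rows:
        key = row[index]
        if key not in seen:
            seen.add(key)
            condensed.append(row)
    return condensed


# Sample data taken from the question (only the three rows shown there).
c = [['470', '4189.0', 'asdfgw', 'fds'],
     ['470', '4189.0', 'qwer', 'fds'],
     ['470', '4189.0', 'qwer', 'dsfs fdv']]

cond = dedupe_by_index(c)
print(cond)
```

Membership tests on a set take roughly constant time on average, so this makes a single pass over c instead of comparing every row against every row already kept.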
seen.add(x[3]) will return None, so not seen.add(x[3]) will always be True, but that part will only be evaluated if x[3] not in seen is True since Python uses short circuit evaluation. If the second condition gets evaluated, it will always return True and have the side effect of adding x[3] to seen. Here's another example of what's happening (print returns None and has the "side-effect" of printing something):
>>> False and not print('hi')
False
>>> True and not print('hi')
hi
True
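As for why the loop in the question seemed to run forever: it appends bact once for every element of d whose fourth item differs from it, and it does so while d is being iterated. The sketch below wraps that loop in a function unchanged and runs it on small made-up inputs where every row has a distinct fourth item (make_rows and the 'key0', 'key1', ... values are invented for this demonstration). Under that assumption d doubles with each new row, which would explain why 30,000 rows never finish.

```python
def naive_condense(c):
    """The loop from the question, only wrapped in a function."""
    d = [c[0]]
    for bact in c:
        for items in d:              # d grows while it is being iterated
            if bact[3] != items[3]:
                d.append(bact)       # one append per differing element
    return d


def make_rows(n):
    """Hypothetical test data: n rows whose fourth items are all distinct."""
    return [['470', '4189.0', 'x', 'key%d' % i] for i in range(n)]


# Length of the "condensed" list for inputs of 1 to 10 rows.
sizes = [len(naive_condense(make_rows(n))) for n in range(1, 11)]
print(sizes)
```

With real data containing repeated values the growth is slower than pure doubling, but it is still far larger than the input, so the set-based approach above is the practical fix.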
