Python: Writing to a single file with queue while using multiprocessing Pool
Problem description
I have hundreds of thousands of text files that I want to parse in various ways. I want to save the output to a single file without synchronization problems. I have been using a multiprocessing Pool to save time, but I can't figure out how to combine Pool and Queue.
The following code saves the input file name along with the maximum number of consecutive "x"s in that file. However, I want all processes to save their results to the same file, not to a separate file per input as in my example. Any help on this would be greatly appreciated.
import multiprocessing
import re

with open('infilenamess.txt') as f:
    filenames = f.read().splitlines()

def mp_worker(filename):
    with open(filename, 'r') as f:
        text = f.read()
    m = re.findall("x+", text)
    count = len(max(m, key=len))
    # Each worker appends to its own per-input results file.
    outfile = open(filename + '_results.txt', 'a')
    outfile.write(str(filename) + '|' + str(count) + '\n')
    outfile.close()

def mp_handler():
    p = multiprocessing.Pool(32)
    p.map(mp_worker, filenames)

if __name__ == '__main__':
    mp_handler()
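Recommended answer
Multiprocessing pools implement a queue for you. Just use a pool method that returns each worker's return value to the caller. The workers then only compute results, and the parent process is the only one that writes to the output file, so no extra synchronization is needed. imap works well: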
import multiprocessing
import re

def mp_worker(filename):
    with open(filename) as f:
        text = f.read()
    m = re.findall("x+", text)
    count = len(max(m, key=len))
    return filename, count

def mp_handler():
    p = multiprocessing.Pool(32)
    with open('infilenamess.txt') as f:
        filenames = [line for line in (l.strip() for l in f) if line]
    with open('results.txt', 'w') as f:
        for result in p.imap(mp_worker, filenames):
            # (filename, count) tuples from worker
            f.write('%s: %d\n' % result)

if __name__ == '__main__':
    mp_handler()
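If the order of lines in the output file does not matter, one possible variation is to use imap_unordered with a chunksize, which may help when there are hundreds of thousands of small files. The sketch below is only an illustration of the same "workers return, parent writes" idea, not part of the original answer. The helper names longest_run, write_results and parse_all, and the default values for processes and chunksize, are assumptions made up for this example. It also returns 0 for a file that contains no "x" at all, a case in which max() on an empty list would otherwise raise ValueError.

```python
import multiprocessing
import re


def longest_run(text, char="x"):
    """Length of the longest run of `char` in `text` (0 if there is none)."""
    runs = re.findall(re.escape(char) + "+", text)
    # default="" avoids ValueError when the file contains no match.
    return len(max(runs, key=len, default=""))


def mp_worker(filename):
    """Runs in a worker process: parse one file and return the result."""
    with open(filename) as f:
        return filename, longest_run(f.read())


def write_results(results, out_path):
    """Runs in the parent only, so there is a single writer for the file."""
    written = 0
    with open(out_path, "w") as out:
        for filename, count in results:
            out.write("%s: %d\n" % (filename, count))
            written += 1
    return written


def parse_all(filenames, out_path, processes=4, chunksize=100):
    """Hypothetical driver: fan out over a pool, write from the parent."""
    with multiprocessing.Pool(processes) as pool:
        # imap_unordered yields each result as soon as it is ready, in
        # arbitrary order; chunksize sends tasks to workers in batches.
        results = pool.imap_unordered(mp_worker, filenames, chunksize=chunksize)
        # Consume every result before the pool's context manager exits.
        return write_results(results, out_path)


# Usage, from under an `if __name__ == '__main__':` guard:
#     parse_all(filenames, 'results.txt')
```

Because write_results accepts any iterable of (filename, count) tuples, it can also be fed from a plain map(mp_worker, filenames) for debugging without a pool.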
