Hadoop Streaming Job failed error in python(python中的Hadoop Streaming Job失败错误)
问题描述
来自 本指南,我已经成功运行了示例练习.但是在运行我的 mapreduce 作业时,我收到以下错误ERROR streaming.StreamJob:作业不成功!
2016 年 10 月 12 日 17:13:38 信息流.StreamJob:killJob...
流式传输作业失败!
来自日志文件的错误
From this guide, I have successfully run the sample exercise. But on running my mapreduce job, I am getting the following error
ERROR streaming.StreamJob: Job not Successful!
10/12/16 17:13:38 INFO streaming.StreamJob: killJob...
Streaming Job Failed!
Error from the log file
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 2
at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:311)
at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:545)
at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:132)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:36)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
映射器.py
import sys
i=0
for line in sys.stdin:
i+=1
count={}
for word in line.strip().split():
count[word]=count.get(word,0)+1
for word,weight in count.items():
print '%s %s:%s' % (word,str(i),str(weight))
reducer.py
Reducer.py
import sys
keymap={}
o_tweet="2323"
id_list=[]
for line in sys.stdin:
tweet,tw=line.strip().split()
#print tweet,o_tweet,tweet_id,id_list
tweet_id,w=tw.split(':')
w=int(w)
if tweet.__eq__(o_tweet):
for i,wt in id_list:
print '%s:%s %s' % (tweet_id,i,str(w+wt))
id_list.append((tweet_id,w))
else:
id_list=[(tweet_id,w)]
o_tweet=tweet
[edit] 运行作业的命令:
[edit] command to run the job:
hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-0.20.0-streaming.jar -file /home/hadoop/mapper.py -mapper /home/hadoop/mapper.py -file /home/hadoop/reducer.py -reducer /home/hadoop/reducer.py -input my-input/* -output my-output
输入是任意随机序列的句子.
Input is any random sequence of sentences.
谢谢,
推荐答案
你的 -mapper 和 -reducer 应该只是脚本名称.
Your -mapper and -reducer should just be the script name.
hadoop@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-0.20.0-streaming.jar -file /home/hadoop/mapper.py -mapper mapper.py -file /home/hadoop/reducer.py -reducer reducer.py -input my-input/* -output my-output
当您的脚本位于 hdfs 内另一个文件夹中的作业中时,该作业与执行为."的尝试任务相关.(仅供参考,如果您想要添加另一个文件,例如查找表,您可以在 Python 中打开它,就好像它与您的脚本在同一目录中一样,而您的脚本在 M/R 作业中)
When your scripts are in the job that is in another folder within hdfs which is relative to the attempt task executing as "." (FYI if you ever want to ad another -file such as a look up table you can open it in Python as if it was in the same dir as your scripts while your script is in M/R job)
还要确保你有 chmod a+x mapper.py 和 chmod a+x reducer.py
also make sure you have chmod a+x mapper.py and chmod a+x reducer.py
这篇关于python中的Hadoop Streaming Job失败错误的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!
本文标题为:python中的Hadoop Streaming Job失败错误
基础教程推荐
- numpy float:比算术运算中内置的慢 10 倍? 2022-01-01
- 与常规 dict 相比,Python manager.dict() 非常慢 2022-01-01
- 将 x 轴刻度更改为自定义字符串 2022-01-01
- 用 Python 编写 Fortran 无格式文件 2022-01-01
- 由Python将MP3转换为MIDI(类型错误:无法加载插件:mtg-Melodia:Melodia) 2022-01-01
- 尝试制作WhatsApp机器人 2022-01-01
- Discord.py 缺少必需的参数 2022-01-01
- pyserial - 可以从线程 a 写入串行端口,是否阻塞从线程 b 读取? 2022-01-01
- 使用生成器和迭代器时 Python 多循环失败 2022-01-01
- 在 Celery 工作人员中捕获 Heroku SIGTERM 以优雅地关 2022-01-01
