L'exécution d'une tâche à l'aide d'hadoop streaming et mrjob: PipeMapRed.waitOutputThreads(): sous-processus a échoué avec le code 1

Hé, je suis assez nouveau dans le monde du Big Data.
Je suis tombé sur ce tutoriel sur
http://musicmachinery.com/2011/09/04/how-to-process-a-million-songs-in-20-minutes/

Il décrit dans le détail de la façon d'exécuter le travail de MapReduce à l'aide de mrjob à la fois localement et sur Elastic Map Reduce.

Eh bien, je suis en train de l'exécuter sur mon propre Hadoop cluser. J'ai couru le travail à l'aide de la commande suivante.

python density.py tiny.dat -r hadoop --hadoop-bin /usr/bin/hadoop > outputmusic

Et c'est ce que j'obtiens:

HADOOP: Running job: job_1369345811890_0245
HADOOP: Job job_1369345811890_0245 running in uber mode : false
HADOOP:  map 0% reduce 0%
HADOOP: Task Id : attempt_1369345811890_0245_m_000000_0, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP:         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP:         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP:         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP:         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP:         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP:         at java.security.AccessController.doPrivileged(Native Method)
HADOOP:         at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP:         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP:         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP: Task Id : attempt_1369345811890_0245_m_000001_0, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP:         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP:         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP:         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP:         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP:         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP:         at java.security.AccessController.doPrivileged(Native Method)
HADOOP:         at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP:         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP:         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP: Task Id : attempt_1369345811890_0245_m_000000_1, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP:         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP:         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP:         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP:         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP:         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP:         at java.security.AccessController.doPrivileged(Native Method)
HADOOP:         at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP:         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP:         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP: Container killed by the ApplicationMaster.
HADOOP:
HADOOP:
HADOOP: Task Id : attempt_1369345811890_0245_m_000001_1, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP:         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP:         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP:         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP:         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP:         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP:         at java.security.AccessController.doPrivileged(Native Method)
HADOOP:         at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP:         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP:         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP: Task Id : attempt_1369345811890_0245_m_000000_2, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP:         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP:         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP:         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP:         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP:         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP:         at java.security.AccessController.doPrivileged(Native Method)
HADOOP:         at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP:         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP:         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP: Task Id : attempt_1369345811890_0245_m_000001_2, Status : FAILED
HADOOP: Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:320)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:533)
HADOOP:         at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:130)
HADOOP:         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:61)
HADOOP:         at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
HADOOP:         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:428)
HADOOP:         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
HADOOP:         at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:157)
HADOOP:         at java.security.AccessController.doPrivileged(Native Method)
HADOOP:         at javax.security.auth.Subject.doAs(Subject.java:415)
HADOOP:         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
HADOOP:         at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:152)
HADOOP:
HADOOP:  map 100% reduce 0%
HADOOP: Job job_1369345811890_0245 failed with state FAILED due to: Task failed task_1369345811890_0245_m_000001
HADOOP: Job failed as tasks failed. failedMaps:1 failedReduces:0
HADOOP:
HADOOP: Counters: 6
HADOOP:         Job Counters
HADOOP:                 Failed map tasks=7
HADOOP:                 Launched map tasks=8
HADOOP:                 Other local map tasks=6
HADOOP:                 Data-local map tasks=2
HADOOP:                 Total time spent by all maps in occupied slots (ms)=32379
HADOOP:                 Total time spent by all reduces in occupied slots (ms)=0
HADOOP: Job not Successful!
HADOOP: Streaming Command Failed!
STDOUT: packageJobJar: [] [/usr/lib/hadoop-mapreduce/hadoop-streaming-2.0.0-cdh4.2.1.jar] /tmp/streamjob3272348678857116023.jar tmpDir=null
Traceback (most recent call last):
File "density.py", line 34, in <module>
MRDensity.run()
File "/usr/lib/python2.6/site-packages/mrjob-0.2.4-py2.6.egg/mrjob/job.py", line 344, in run
mr_job.run_job()
File "/usr/lib/python2.6/site-packages/mrjob-0.2.4-py2.6.egg/mrjob/job.py", line 381, in run_job
runner.run()
File "/usr/lib/python2.6/site-packages/mrjob-0.2.4-py2.6.egg/mrjob/runner.py", line 316, in run
self._run()
File "/usr/lib/python2.6/site-packages/mrjob-0.2.4-py2.6.egg/mrjob/hadoop.py", line 175, in _run
self._run_job_in_hadoop()
File "/usr/lib/python2.6/site-packages/mrjob-0.2.4-py2.6.egg/mrjob/hadoop.py", line 325, in _run_job_in_hadoop
raise CalledProcessError(step_proc.returncode, streaming_args)
subprocess.CalledProcessError: Command '['/usr/bin/hadoop', 'jar', '/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.2.1.jar', '-cmdenv', 'PYTHONPATH=mrjob.tar.gz', '-input', 'hdfs:///user/E824259/tmp/mrjob/density.E824259.20130611.053850.343441/input', '-output', 'hdfs:///user/E824259/tmp/mrjob/density.E824259.20130611.053850.343441/output', '-cacheFile', 'hdfs:///user/E824259/tmp/mrjob/density.E824259.20130611.053850.343441/files/density.py#density.py', '-cacheArchive', 'hdfs:///user/E824259/tmp/mrjob/density.E824259.20130611.053850.343441/files/mrjob.tar.gz#mrjob.tar.gz', '-mapper', 'python density.py --step-num=0 --mapper --protocol json --output-protocol json --input-protocol raw_value', '-jobconf', 'mapred.reduce.tasks=0']' returned non-zero exit status 1

Remarque: Comme l'ont proposé certains autres forums que j'ai compris

#! /usr/bin/python

au début de mes fichiers python density.py et track.py. Il semble avoir travaillé pour la plupart des gens, mais j'ai toujours continuer à être au-dessus de la exceprions.

Edit: j'ai inclus la définition de l'une des fonctions utilisées dans l'original density.py ce qui a été défini dans un autre fichier track.py dans density.py lui-même. Le travail a couru avec succès. Mais il serait vraiment utile si quelqu'un sait pourquoi ce qui se passe.

OriginalL'auteur Kiran Karanth | 2013-06-11