Hadoop Streaming Tâche a Échoué (sans Succès) en Python

Je suis en train de lancer un plan pour Réduire le travail sur Hadoop Streaming avec des scripts Python et d'obtenir les mêmes erreurs que Hadoop Streaming Tâche a échoué erreur en python mais ces solutions ne fonctionne pas pour moi.

Mes scripts fonctionnent très bien lorsque je lance le "chat sample.txt | ./p1mapper.py | tri | ./p1reducer.py"

Mais quand je lance le suivant:

./bin/hadoop jar contrib/streaming/hadoop-0.20.2-streaming.jar \
    -input "p1input/*" \
    -output p1output \
    -mapper "python p1mapper.py" \
    -reducer "python p1reducer.py" \
    -file /Users/Tish/Desktop/HW1/p1mapper.py \
    -file /Users/Tish/Desktop/HW1/p1reducer.py

(NB: Même si je supprime le "python", ou tapez le chemin d'accès complet de -mappeur et réducteur, le résultat est le même)

C'est le résultat que j'obtiens:

packageJobJar: [/Users/Tish/Desktop/HW1/p1mapper.py, /Users/Tish/Desktop/CS246/HW1/p1reducer.py, /Users/Tish/Documents/workspace/hadoop-0.20.2/tmp/hadoop-unjar4363616744311424878/] [] /var/folders/Mk/MkDxFxURFZmLg+gkCGdO9U+++TM/-Tmp-/streamjob3714058030803466665.jar tmpDir=null
11/01/18 03:02:52 INFO mapred.FileInputFormat: Total input paths to process : 1
11/01/18 03:02:52 INFO streaming.StreamJob: getLocalDirs(): [tmp/mapred/local]
11/01/18 03:02:52 INFO streaming.StreamJob: Running job: job_201101180237_0005
11/01/18 03:02:52 INFO streaming.StreamJob: To kill this job, run:
11/01/18 03:02:52 INFO streaming.StreamJob: /Users/Tish/Documents/workspace/hadoop-0.20.2/bin/../bin/hadoop job  -Dmapred.job.tracker=localhost:54311 -kill job_201101180237_0005
11/01/18 03:02:52 INFO streaming.StreamJob: Tracking URL: http://www.glassdoor.com:50030/jobdetails.jsp?jobid=job_201101180237_0005
11/01/18 03:02:53 INFO streaming.StreamJob:  map 0%  reduce 0%
11/01/18 03:03:05 INFO streaming.StreamJob:  map 100%  reduce 0%
11/01/18 03:03:44 INFO streaming.StreamJob:  map 50%  reduce 0%
11/01/18 03:03:47 INFO streaming.StreamJob:  map 100%  reduce 100%
11/01/18 03:03:47 INFO streaming.StreamJob: To kill this job, run:
11/01/18 03:03:47 INFO streaming.StreamJob: /Users/Tish/Documents/workspace/hadoop-0.20.2/bin/../bin/hadoop job  -Dmapred.job.tracker=localhost:54311 -kill job_201101180237_0005
11/01/18 03:03:47 INFO streaming.StreamJob: Tracking URL: http://www.glassdoor.com:50030/jobdetails.jsp?jobid=job_201101180237_0005
11/01/18 03:03:47 ERROR streaming.StreamJob: Job not Successful!
11/01/18 03:03:47 INFO streaming.StreamJob: killJob...
Streaming Job Failed!

Pour chaque Échec/Tué Tâche Tentative:

Map output lost, rescheduling: getMapOutput(attempt_201101181225_0001_m_000000_0,0) failed :
org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find taskTracker/jobcache/job_201101181225_0001/attempt_201101181225_0001_m_000000_0/output/file.out.index in any of the configured local directories
at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathToRead(LocalDirAllocator.java:389)
at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathToRead(LocalDirAllocator.java:138)
at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2887)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:502)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:363)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:324)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:533)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:207)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:403)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:522)

Voici mes scripts Python:
p1mapper.py

#!/usr/bin/env python
import sys
import re
SEQ_LEN = 4
eos = re.compile('(?<=[a-zA-Z])\.')   # period preceded by an alphabet
ignore = re.compile('[\W\d]')
for line in sys.stdin:
array = re.split(eos, line)
for sent in array:
sent = ignore.sub('', sent)
sent = sent.lower()
if len(sent) >= SEQ_LEN:
for i in range(len(sent)-SEQ_LEN + 1):
print '%s 1' % sent[i:i+SEQ_LEN]

p1reducer.py

#!/usr/bin/env python
from operator import itemgetter
import sys
word2count = {}
for line in sys.stdin:
word, count = line.split(' ', 1)
try:
count = int(count)
word2count[word] = word2count.get(word, 0) + count
except ValueError:    # count was not a number
pass
# sort
sorted_word2count = sorted(word2count.items(), key=itemgetter(1), reverse=True)
# write the top 3 sequences
for word, count in sorted_word2count[0:3]:
print '%s\t%s'% (word, count)

Serait vraiment reconnaissant de toute aide, merci!

Mise à JOUR:

hdfs-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>

mapred-site.xml:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>localhost:54311</value>
</property>
</configuration>
peut être un problème de configuration vous pouvez également poster votre hdfs-site.xml et mapred-site.xml configurations s'il vous plaît.
Collé ci-dessus. Merci beaucoup Joe!

OriginalL'auteur sirentian | 2011-01-18