java.io.IOException: No FileSystem for scheme: hdfs

I am using the Cloudera QuickStart VM CDH5.3.0 (as a parcel bundle) and Spark 1.2.0 with $SPARK_HOME=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark, and I am submitting the Spark application with the command

```
./bin/spark-submit --class <Spark_App_Main_Class_Name> --master spark://localhost.localdomain:7077 --deploy-mode client --executor-memory 4G ../apps/<Spark_App_Target_Jar_Name>.jar
```

Spark_App_Main_Class_Name.scala

```
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf
import org.apache.spark.mllib.util.MLUtils

object Spark_App_Main_Class_Name {

    def main(args: Array[String]) {
        val hConf = new SparkConf()
            .set("fs.hdfs.impl", classOf[org.apache.hadoop.hdfs.DistributedFileSystem].getName)
            .set("fs.file.impl", classOf[org.apache.hadoop.fs.LocalFileSystem].getName)
        val sc = new SparkContext(hConf)
        val data = MLUtils.loadLibSVMFile(sc, "hdfs://localhost.localdomain:8020/analytics/data/mllib/sample_libsvm_data.txt")
        ...
    }

}
```

But I get a ClassNotFoundException for org.apache.hadoop.hdfs.DistributedFileSystem when I spark-submit the application in client mode:

```
[cloudera@localhost bin]$ ./spark-submit --class Spark_App_Main_Class_Name --master spark://localhost.localdomain:7077 --deploy-mode client --executor-memory 4G ../apps/Spark_App_Target_Jar_Name.jar
15/11/30 09:46:34 INFO SparkContext: Spark configuration:
spark.app.name=Spark_App_Main_Class_Name
spark.driver.extraLibraryPath=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/lib/native
spark.eventLog.dir=hdfs://localhost.localdomain:8020/user/spark/applicationHistory
spark.eventLog.enabled=true
spark.executor.extraLibraryPath=/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/hadoop/lib/native
spark.executor.memory=4G
spark.jars=file:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/bin/../apps/Spark_App_Target_Jar_Name.jar
spark.logConf=true
spark.master=spark://localhost.localdomain:7077
spark.yarn.historyServer.address=http://localhost.localdomain:18088
15/11/30 09:46:34 WARN Utils: Your hostname, localhost.localdomain resolves to a loopback address: 127.0.0.1; using 10.113.234.150 instead (on interface eth12)
15/11/30 09:46:34 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
15/11/30 09:46:34 INFO SecurityManager: Changing view acls to: cloudera
15/11/30 09:46:34 INFO SecurityManager: Changing modify acls to: cloudera
15/11/30 09:46:34 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(cloudera); users with modify permissions: Set(cloudera)
15/11/30 09:46:35 INFO Slf4jLogger: Slf4jLogger started
15/11/30 09:46:35 INFO Remoting: Starting remoting
15/11/30 09:46:35 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@10.113.234.150:59473]
15/11/30 09:46:35 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@10.113.234.150:59473]
15/11/30 09:46:35 INFO Utils: Successfully started service 'sparkDriver' on port 59473.
15/11/30 09:46:36 INFO SparkEnv: Registering MapOutputTracker
15/11/30 09:46:36 INFO SparkEnv: Registering BlockManagerMaster
15/11/30 09:46:36 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20151130094636-8c3d
15/11/30 09:46:36 INFO MemoryStore: MemoryStore started with capacity 267.3 MB
15/11/30 09:46:38 INFO HttpFileServer: HTTP File server directory is /tmp/spark-7d1f2861-a568-4919-8f7e-9a9fe6aab2b4
15/11/30 09:46:38 INFO HttpServer: Starting HTTP Server
15/11/30 09:46:38 INFO Utils: Successfully started service 'HTTP file server' on port 50003.
15/11/30 09:46:38 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/11/30 09:46:38 INFO SparkUI: Started SparkUI at http://10.113.234.150:4040
15/11/30 09:46:39 INFO SparkContext: Added JAR file:/opt/cloudera/parcels/CDH-5.3.0-1.cdh5.3.0.p0.30/lib/spark/bin/../apps/Spark_App_Target_Jar_Name.jar at http://10.113.234.150:50003/jars/Spark_App_Target_Jar_Name.jar with timestamp 1448894799228
15/11/30 09:46:39 INFO AppClient$ClientActor: Connecting to master spark://localhost.localdomain:7077...
15/11/30 09:46:40 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20151130094640-0000
15/11/30 09:46:41 INFO NettyBlockTransferService: Server created on 56458
15/11/30 09:46:41 INFO BlockManagerMaster: Trying to register BlockManager
15/11/30 09:46:41 INFO BlockManagerMasterActor: Registering block manager 10.113.234.150:56458 with 267.3 MB RAM, BlockManagerId(<driver>, 10.113.234.150, 56458)
15/11/30 09:46:41 INFO BlockManagerMaster: Registered BlockManager
Exception in thread "main" java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2047)
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2578)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
    at org.apache.spark.util.FileLogger.<init>(FileLogger.scala:90)
    at org.apache.spark.scheduler.EventLoggingListener.<init>(EventLoggingListener.scala:63)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:352)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:92)
    at Spark_App_Main_Class_Name$.main(Spark_App_Main_Class_Name.scala:22)
    at Spark_App_Main_Class_Name.main(Spark_App_Main_Class_Name.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.ClassNotFoundException: Class org.apache.hadoop.hdfs.DistributedFileSystem not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1953)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2045)
    ... 16 more
```

It seems the Spark application cannot map the HDFS file system, because initially I was getting this error:

```
Exception in thread "main" java.io.IOException: No FileSystem for scheme: hdfs
    at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:367)
    at org.apache.spark.util.FileLogger.<init>(FileLogger.scala:90)
    at org.apache.spark.scheduler.EventLoggingListener.<init>(EventLoggingListener.scala:63)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:352)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:92)
    at LogisticRegressionwithBFGS$.main(LogisticRegressionwithBFGS.scala:21)
    at LogisticRegressionwithBFGS.main(LogisticRegressionwithBFGS.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
```

so I followed "hadoop No FileSystem for scheme: file" and added "fs.hdfs.impl" and "fs.file.impl" to the Spark configuration settings.

Author: somnathchakrabarti | 2015-12-02

8

You need to have the hadoop-hdfs-2.x jars (Maven link) on your classpath.
When submitting your application, specify the additional jar locations with the --jars option of spark-submit.
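
One detail worth keeping in mind: `--jars` expects a single comma-separated list, whereas a shell glob such as `dir/*.jar` expands to space-separated arguments, so only the first jar would be taken as the `--jars` value. The following is a minimal sketch of a helper that builds such a list from a directory; the class name `JarList` and its method are hypothetical and not part of Spark or Hadoop.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: turns a directory of jars into a value usable with --jars.
final class JarList {

    // Returns the *.jar files directly inside dir, sorted and joined with commas.
    static String commaSeparatedJars(Path dir) {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".jar"))
                .map(Path::toString)
                .sorted()
                .collect(Collectors.joining(","));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Usage (assumed): java JarList <directory-containing-jars>
        System.out.println(commaSeparatedJars(Paths.get(args[0])));
    }
}
```

Assuming the class is compiled in the current directory, its output could be passed along the lines of `--jars "$(java JarList /path/to/hadoop-hdfs)"`, where the directory is whatever location holds the hadoop-hdfs jars on your installation.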

On another note, you should ideally move to CDH5.5, which ships Spark 1.5.
- I added the hadoop-hdfs jars with the --jars option of spark-submit, but now I get java.lang.ClassNotFoundException: <Spark_App_Main_Class_Name>
- somnath, can you provide the complete spark-submit command?
- ./spark-submit --class Spark_App_Main_Class_Name --master spark://localhost.localdomain:7077 --deploy-mode client --executor-memory 4G --jars /opt/cloudera/parcels/CDH/lib/hadoop-hdfs/*.jar ../apps/Spark_App_Target_Jar_Name.jar resolved the ClassNotFoundException, but I do not see the application listed in the Spark Master WebUI
Author: Atul Soman
0

I got past this problem after some detailed searching and trying several approaches. Basically, the problem seems to be that the hadoop-hdfs jars are not available: when the Spark application was submitted, the dependent jars could not be found, even after using maven-assembly-plugin or maven-jar-plugin/maven-dependency-plugin.

With the maven-jar-plugin/maven-dependency-plugin combination, the main class jar and the dependent jars are created, but supplying the dependent jars with the --jars option still led to the same error, as follows
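
To confirm that the class is really missing from the classpath, a small check like the following can help. It is only a sketch: the class name `ClasspathCheck` is hypothetical, and the check only reflects the classpath of the JVM it runs in.

```java
// Hypothetical diagnostic: reports whether a class is visible to the current JVM.
final class ClasspathCheck {

    // Returns true if the named class can be located without initializing it.
    static boolean isLoadable(String className) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            // Fall back to the loader that loaded this class.
            loader = ClasspathCheck.class.getClassLoader();
        }
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0
            ? args[0]
            : "org.apache.hadoop.hdfs.DistributedFileSystem";
        System.out.println(name + " loadable: " + isLoadable(name));
    }
}
```

For the result to be meaningful, `isLoadable` would have to be called from inside the driver (for example at the top of the application's `main`, before the SparkContext is created), so that it sees the same classpath that spark-submit sets up.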
```
./spark-submit --class Spark_App_Main_Class_Name --master spark://localhost.localdomain:7077 --deploy-mode client --executor-memory 4G --jars ../apps/Spark_App_Target_Jar_Name-dep.jar ../apps/Spark_App_Target_Jar_Name.jar
```
Using maven-shade-plugin, as suggested by "krookedking" in hadoop-no-filesystem-for-scheme-file, seems to hit the problem at the right point: building a single jar that contains the main class and all dependent classes eliminated the classpath issues.

My final working spark-submit command is as follows:
```
./spark-submit --class Spark_App_Main_Class_Name --master spark://localhost.localdomain:7077 --deploy-mode client --executor-memory 4G ../apps/Spark_App_Target_Jar_Name.jar
```
The maven-shade-plugin configuration in my project's pom.xml is as follows:
```
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-shade-plugin</artifactId>
    <version>2.4.2</version>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>shade</goal>
            </goals>
            <configuration>
                <filters>
                    <filter>
                        <artifact>*:*</artifact>
                        <excludes>
                            <exclude>META-INF/*.SF</exclude>
                            <exclude>META-INF/*.DSA</exclude>
                            <exclude>META-INF/*.RSA</exclude>
                        </excludes>
                    </filter>
                </filters>
                <transformers>
                    <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                </transformers>
            </configuration>
        </execution>
    </executions>
</plugin>
```
Note: The excludes in the filter get rid of
```
java.lang.SecurityException: Invalid signature file digest for Manifest main attributes
```
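
As far as I understand it, the ServicesResourceTransformer matters because Hadoop 2.x discovers FileSystem implementations through `META-INF/services/org.apache.hadoop.fs.FileSystem` entries, and both hadoop-common and hadoop-hdfs ship such a file. When jars are merged without the transformer, one file can overwrite the other and the hdfs registration is lost. Below is a minimal sketch for inspecting what a built jar actually registers; the class `ServiceFileCheck` is hypothetical and uses only the JDK.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;
import java.util.zip.ZipEntry;

// Hypothetical diagnostic: lists the FileSystem implementations registered in a jar.
final class ServiceFileCheck {

    static final String FS_SERVICE_ENTRY =
        "META-INF/services/org.apache.hadoop.fs.FileSystem";

    // Returns the class names in the service file, or an empty list if it is absent.
    static List<String> registeredFileSystems(String jarPath) {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            ZipEntry entry = jar.getEntry(FS_SERVICE_ENTRY);
            if (entry == null) {
                return names;
            }
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(jar.getInputStream(entry), StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    // Service files allow '#' comments; strip them and blank lines.
                    int hash = line.indexOf('#');
                    if (hash >= 0) {
                        line = line.substring(0, hash);
                    }
                    line = line.trim();
                    if (!line.isEmpty()) {
                        names.add(line);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Usage (assumed): java ServiceFileCheck <path-to-shaded-jar>
        registeredFileSystems(args[0]).forEach(System.out::println);
    }
}
```

If the shaded jar was built correctly, the output should include `org.apache.hadoop.hdfs.DistributedFileSystem`; if that line is missing, the "No FileSystem for scheme: hdfs" error is likely to come back.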
Author: somnathchakrabarti
-2

I faced the same problem when running Spark code from my IDE and accessing a remote HDFS.

So I set the following configuration, and that resolved it.
```
// Assumes `conf` is an existing SparkConf.
JavaSparkContext jsc = new JavaSparkContext(conf);
// Set the mappings on the Hadoop Configuration held by the context.
Configuration hadoopConfig = jsc.hadoopConfiguration();
hadoopConfig.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());
hadoopConfig.set("fs.file.impl", org.apache.hadoop.fs.LocalFileSystem.class.getName());
```
- Please add some context to your answer and explain how it solves the problem. Otherwise you risk having your post downvoted and/or closed.
- and at least fix the indentation
Author: Ketan Keshri
