Original article: http://blog.csdn.net/mach_learn/article/details/41824737?utm_source=tuicool&utm_medium=referral
1. Error when running locally, and how to fix it
Running the following command:
- ./bin/spark-submit \
- --class org.apache.spark.examples.mllib.JavaALS \
- --master local[*] \
- /opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-yarn/lib/spark-examples_2.10-1.0.0-cdh5.1.2.jar \
- /user/data/netflix_rating 10 10 /user/data/result
produces the following error:
- Exception in thread "main" java.lang.RuntimeException: java.io.IOException: No FileSystem for scheme: hdfs
- at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:657)
- at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:389)
- at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:362)
- at org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
- at org.apache.spark.SparkContext$$anonfun$22.apply(SparkContext.scala:546)
- at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$1.apply(HadoopRDD.scala:145)
- at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$1.apply(HadoopRDD.scala:145)
- at scala.Option.map(Option.scala:145)
- at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:145)
- at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:168)
- at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
- at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
- at scala.Option.getOrElse(Option.scala:120)
- at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
- at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
- at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
- at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
- at scala.Option.getOrElse(Option.scala:120)
- at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
- at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
- at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
- at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)
- at scala.Option.getOrElse(Option.scala:120)
- at org.apache.spark.rdd.RDD.partitions(RDD.scala:202)
- at org.apache.spark.mllib.recommendation.ALS.run(ALS.scala:167)
- at org.apache.spark.mllib.recommendation.ALS$.train(ALS.scala:599)
- at org.apache.spark.mllib.recommendation.ALS.train(ALS.scala)
- at org.apache.spark.examples.mllib.JavaALS.main(JavaALS.java:80)
- at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
- at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
- at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
- at java.lang.reflect.Method.invoke(Method.java:597)
- at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
- at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
- at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
- Caused by: java.io.IOException: No FileSystem for scheme: hdfs
- at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2385)
- at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2392)
- at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:89)
- at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2431)
- at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2413)
- at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:368)
- at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:167)
- at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:653)
- ... 34 more
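This error occurs because the hadoop-hdfs jar is missing from the classpath when Spark runs. Passing that jar to spark-submit with either --jars or --driver-class-path fixes it. Once hadoop-hdfs is available, the paths given on the command line are interpreted as HDFS paths.
The correct invocation is: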
- ./bin/spark-submit \
- --class org.apache.spark.examples.mllib.JavaALS \
- --driver-class-path /opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-hdfs/hadoop-hdfs-2.3.0-cdh5.1.2.jar \
- --master local[*] \
- /opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-yarn/lib/spark-examples_2.10-1.0.0-cdh5.1.2.jar \
- /user/data/netflix_rating 10 10 /user/data/result
or:
- ./bin/spark-submit \
- --class org.apache.spark.examples.mllib.JavaALS \
- --jars /opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-hdfs/hadoop-hdfs-2.3.0-cdh5.1.2.jar \
- --master local[*] \
- /opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-yarn/lib/spark-examples_2.10-1.0.0-cdh5.1.2.jar \
- /user/data/netflix_rating 10 10 /user/data/result
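Before adding a jar to the classpath, it can help to confirm that it really contains the class that handles the hdfs scheme. In Hadoop that class is org.apache.hadoop.hdfs.DistributedFileSystem. The following is only a minimal sketch: the helper name has_class_entry is hypothetical, and it assumes a tool such as unzip or jar is installed to list the jar's contents.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark or Hadoop).
# Reads a jar listing on stdin (e.g. from `unzip -l` or `jar tf`) and
# succeeds if it contains the .class entry for the given class name.
has_class_entry() {
    # Convert a fully qualified name to its jar entry: a.b.C -> a/b/C.class
    entry="$(printf '%s' "$1" | tr '.' '/').class"
    grep -qF -- "$entry"
}

# Usage, with the jar path taken from the commands above
# (adjust it to your own installation):
#
#   HDFS_JAR=/opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-hdfs/hadoop-hdfs-2.3.0-cdh5.1.2.jar
#   unzip -l "$HDFS_JAR" \
#     | has_class_entry org.apache.hadoop.hdfs.DistributedFileSystem \
#     && echo "jar provides the hdfs filesystem class"
```

The helper reads the listing from stdin rather than opening the jar itself, so it works with whichever listing tool is available.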
2. Error when running Spark on YARN, and how to fix it
Running the following command:
- ./bin/spark-submit \
- --class org.apache.spark.examples.mllib.JavaALS \
- --master yarn-cluster \
- /opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-yarn/lib/spark-examples_2.10-1.0.0-cdh5.1.2.jar \
- /user/data/netflix_rating 10 10 /user/data/result
produces the following error:
- Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/yarn/client/api/impl/YarnClientImpl
- at java.lang.ClassLoader.defineClass1(Native Method)
- at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
- at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
- at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
- at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
- at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
- at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
- at java.security.AccessController.doPrivileged(Native Method)
- at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
- at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
- at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
- at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
- at java.lang.Class.forName0(Native Method)
- at java.lang.Class.forName(Class.java:247)
- at org.apache.spark.util.Utils$$anonfun$classIsLoadable$1.apply(Utils.scala:143)
- at org.apache.spark.util.Utils$$anonfun$classIsLoadable$1.apply(Utils.scala:143)
- at scala.util.Try$.apply(Try.scala:161)
- at org.apache.spark.util.Utils$.classIsLoadable(Utils.scala:143)
- at org.apache.spark.deploy.SparkSubmit$.createLaunchEnv(SparkSubmit.scala:158)
- at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:54)
- at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
- Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.yarn.client.api.impl.YarnClientImpl
- at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
- at java.security.AccessController.doPrivileged(Native Method)
- at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
- at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
- at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
- at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
- ... 21 more
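This error is caused by missing jars from the hadoop-yarn directory. The only fix here is the --driver-class-path option, because when running Spark on YARN the jars in the hadoop-yarn directory must already be loaded up front: the stack trace shows SparkSubmit itself checking whether YarnClientImpl is loadable (Utils.classIsLoadable) before the application is launched.
The correct invocation is: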
- ./bin/spark-submit \
- --class org.apache.spark.examples.mllib.JavaALS \
- --master yarn-cluster \
- --driver-class-path $(echo /opt/cloudera/parcels/CDH/lib/hadoop-yarn/*.jar |sed 's/ /:/g'):/opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-hdfs/hadoop-hdfs-2.3.0-cdh5.1.2.jar \
- /opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-yarn/lib/spark-examples_2.10-1.0.0-cdh5.1.2.jar \
- /user/data/netflix_rating 10 10 /user/data/result
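The --driver-class-path value above is built by letting the shell expand *.jar and then replacing the spaces between file names with colons using sed. That works as long as no path contains a space. A slightly more defensive variant is sketched below; the function name jars_classpath is hypothetical, and it assumes only a POSIX shell.

```shell
#!/bin/sh
# Hypothetical helper (not part of Spark or Hadoop).
# Prints every *.jar file in the given directory as a single
# colon-separated classpath string; prints an empty line if none exist.
jars_classpath() {
    classpath=""
    for jar in "$1"/*.jar; do
        # When nothing matches, the pattern is left unexpanded; skip it.
        [ -e "$jar" ] || continue
        # Prefix a colon only when the string already has an entry.
        classpath="${classpath:+$classpath:}$jar"
    done
    printf '%s\n' "$classpath"
}

# Usage, with the directories taken from the command above
# (adjust them to your own installation):
#
#   YARN_CP=$(jars_classpath /opt/cloudera/parcels/CDH/lib/hadoop-yarn)
#   HDFS_JAR=/opt/cloudera/parcels/CDH-5.1.2-1.cdh5.1.2.p0.3/lib/hadoop-hdfs/hadoop-hdfs-2.3.0-cdh5.1.2.jar
#   ./bin/spark-submit ... --driver-class-path "$YARN_CP:$HDFS_JAR" ...
```

Quoting the final value keeps the whole classpath as one argument even if a directory name contains spaces.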
The result set is shown in the figure below.
Data format of /user/data/result/productFeatures/part-00000:
Data format of /user/data/result/userFeatures/part-00000: