Launching Spark in YARN

Hi,
I was trying to launch a simple application on YARN in cluster mode. I followed these steps:

  1. export YARN_CONF_DIR=/etc/hadoop/conf/
  2. export HADOOP_CONF_DIR=/etc/hadoop/conf/
  3. spark-submit --master yarn --class org.apache.spark.examples.SparkPi /usr/hdp/current/spark-client/lib/spark-examples-*.jar 10
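
(Note: the command above does not pass --deploy-mode, so on YARN spark-submit defaults to client mode; the YarnClientSchedulerBackend lines in the log below also point at client mode. If cluster mode is really the intent, a submit along these lines, using the same example jar, should work. This is only a sketch; adjust paths for your install.)

  spark-submit --master yarn --deploy-mode cluster \
    --class org.apache.spark.examples.SparkPi \
    /usr/hdp/current/spark-client/lib/spark-examples-*.jar 10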

After following the above steps, I am getting these errors:

18/12/27 12:09:06 INFO TaskSetManager: Finished task 9.0 in stage 0.0 (TID 9) in 18 ms on ip-172-31-20-247.ec2.internal (executor 2) (10/10)
18/12/27 12:09:06 INFO YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
18/12/27 12:09:06 INFO DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:36) finished in 5.782 s
18/12/27 12:09:06 INFO DAGScheduler: Job 0 finished: reduce at SparkPi.scala:36, took 5.836619 s
Pi is roughly 3.1412191412191413
18/12/27 12:09:06 INFO SparkUI: Stopped Spark web UI at http://ip-172-31-38-146.ec2.internal:4041
18/12/27 12:09:06 ERROR Client: Failed to contact YARN for application application_1545335729280_1471.
java.io.InterruptedIOException: Call interrupted
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1548)
at org.apache.hadoop.ipc.Client.call(Client.java:1498)
at org.apache.hadoop.ipc.Client.call(Client.java:1398)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
at com.sun.proxy.$Proxy15.getApplicationReport(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getApplicationReport(ApplicationClientProtocolPBClientImpl.java:191)
at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:290)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:202)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:184)
at com.sun.proxy.$Proxy16.getApplicationReport(Unknown Source)
at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getApplicationReport(YarnClientImpl.java:454)
at org.apache.spark.deploy.yarn.Client.getApplicationReport(Client.scala:300)
at org.apache.spark.deploy.yarn.Client.monitorApplication(Client.scala:1124)
at org.apache.spark.scheduler.cluster.YarnClientSchedulerBackend$MonitorThread.run(YarnClientSchedulerBackend.scala:109)
18/12/27 12:09:06 ERROR YarnClientSchedulerBackend: Yarn application has already exited with state FAILED!

What could have happened?

This seems to have resolved itself. When I ran it again, it worked!!
But I would still like to know why the error occurred on my first try. Thanks!

Hi @alokdeosingh1995,

Just curious, did you check the error log in the job browser?

How do we check the error log in the job browser?
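
If the job browser does not help, the same container logs can usually be pulled straight from the command line with the YARN CLI, assuming log aggregation is enabled on the cluster. The application id below is just the one from the log earlier in this thread; substitute your own:

  yarn application -list -appStates ALL          # find your application id
  yarn logs -applicationId application_1545335729280_1471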

This errors out for me too, giving me an error about cleaning up the staging dir. What should I do?

19/09/20 18:03:07 WARN Client: Failed to cleanup staging dir hdfs://cxln1.c.thelab-240901.internal:8020/user/yijiang4035/.sparkStaging/application_1568355188440_1399
org.apache.hadoop.security.AccessControlException: Permission denied: user=yijiang4035, access=WRITE, inode="/user/yijiang4035/.sparkStaging/application_1568355188440_1399":hdfs:yijiang4035:drwxr-xr-x
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:319)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:292)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:216)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:190)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1955)
at org.apache.hadoop.hdfs.server.namenode.FSDirDeleteOp.delete(FSDirDeleteOp.java:92)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:3968)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:1078)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:634)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2345)
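
From the trace above, the .sparkStaging path under /user/yijiang4035 is owned by hdfs (group yijiang4035, mode drwxr-xr-x), so the submitting user cannot delete files there and the cleanup fails with a WARN. A rough sketch of a fix, assuming someone with HDFS superuser rights is available to run it, is to hand the home directory back to the user:

  hdfs dfs -ls -d /user/yijiang4035                              # confirm the current owner and permissions
  hdfs dfs -chown -R yijiang4035:yijiang4035 /user/yijiang4035   # run as the hdfs superuser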

If you were referring to the Hue job browser, I went there and couldn’t see any jobs listed, not even the jobs I completed.