I am trying to understand how Spark runs on a YARN cluster in cluster and client mode, and I have two questions.

Is it necessary for Spark to be installed on all the nodes of the YARN cluster? I think it should be, because the worker nodes execute tasks and must be able to decode the code (Spark APIs) in the Spark application that the driver sends to the cluster.

The documentation says: "Ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which contains the (client side) configuration files for the Hadoop cluster". Why does the client node need Hadoop installed when it is only sending the job to the cluster?

1 Answer

Coming to your second question first: ensure that HADOOP_CONF_DIR or YARN_CONF_DIR points to the directory which contains the (client side) configuration files for the Hadoop cluster. This is mandatory because these configs are used to write to HDFS and to connect to the YARN ResourceManager. The configuration contained in this directory will be distributed to the YARN cluster so that all containers used by the application use the same configuration. If the configuration references Java system properties or environment variables not managed by YARN, they should also be set in the Spark application's configuration (driver, executors, and the AM when running in client mode).

Now, to launch Spark applications on YARN, there are two deploy modes: cluster mode and client mode. In cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.

Unlike other cluster managers supported by Spark, in which the master's address is specified in the --master parameter, in YARN mode the ResourceManager's address is picked up from the Hadoop configuration. Thus, the --master parameter is simply yarn.

To launch a Spark application in cluster mode:

    $ ./bin/spark-submit --class path.to.your.Class --master yarn --deploy-mode cluster [options] <app jar> [app options]

For example:

    $ ./bin/spark-submit --class org.apache.spark.examples.SparkPi \
        --master yarn \
        --deploy-mode cluster \
        --driver-memory 4g \
        --executor-memory 2g \
        --executor-cores 1 \
        --queue thequeue \
        examples/jars/spark-examples*.jar

The above starts a YARN client program which starts the default Application Master. SparkPi then runs as a child thread of the Application Master. The client periodically polls the Application Master for status updates and displays them in the console.
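Because spark-submit finds HDFS and the ResourceManager only through the files in that configuration directory, it can help to check the directory before a submission like the SparkPi example. The function below is a hypothetical sketch, not part of Spark. It prefers HADOOP_CONF_DIR and falls back to YARN_CONF_DIR (that precedence is an assumption of this sketch), and it checks only for core-site.xml and yarn-site.xml, the usual Hadoop client files; a given cluster may need others, such as hdfs-site.xml.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not part of Spark.

# Print the Hadoop client configuration directory. Preferring
# HADOOP_CONF_DIR over YARN_CONF_DIR is an assumption of this sketch.
resolve_conf_dir() {
  local dir="${HADOOP_CONF_DIR:-${YARN_CONF_DIR:-}}"
  if [ -z "$dir" ]; then
    echo "neither HADOOP_CONF_DIR nor YARN_CONF_DIR is set" >&2
    return 1
  fi
  printf '%s\n' "$dir"
}

# Succeed only if the directory holds core-site.xml (where HDFS is)
# and yarn-site.xml (where the ResourceManager is).
check_conf_dir() {
  local dir f
  dir="$(resolve_conf_dir)" || return 1
  for f in core-site.xml yarn-site.xml; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing $dir/$f" >&2
      return 1
    fi
  done
  echo "using Hadoop client configuration in $dir"
}

# Example: check_conf_dir && ./bin/spark-submit --master yarn ...
```

Running the check first turns a missing or misconfigured directory into an explicit message on the client, instead of a failed connection attempt from spark-submit.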
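The client will exit once your application has finished running. Refer to the “Debugging your Application” section of the Spark “Running on YARN” documentation for how to see driver and executor logs.

To launch a Spark application in client mode, do the same, but replace cluster with client. The following shows how you can run spark-shell in client mode:

    $ ./bin/spark-shell --master yarn --deploy-mode client

As for your first question: Spark does not have to be installed on every node of the YARN cluster. spark-submit ships the Spark runtime jars to the YARN containers through the distributed cache. If neither spark.yarn.archive nor spark.yarn.jars is set, Spark zips all the jars under $SPARK_HOME/jars and uploads the archive for the application; pointing one of these properties at a copy stored in HDFS avoids repeating that upload.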
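The rule from the start of the answer, that Java system properties and environment variables not managed by YARN must also be set for the driver, the executors and, in client mode, the AM, depends on the deploy mode. The helper below is a hypothetical sketch that prints the matching spark-submit --conf flags, one argument per line. The property names (spark.executor.extraJavaOptions, spark.executorEnv.[Name], spark.driver.extraJavaOptions, spark.yarn.am.extraJavaOptions, spark.yarn.appMasterEnv.[Name]) are documented Spark settings. The helper's name and the example values are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Spark: print, one argument per line,
# the --conf flags that forward a JVM option string and one environment
# variable to every process that needs them for the given deploy mode.
# usage: yarn_conf_flags <cluster|client> <java-opts> <NAME=VALUE>
yarn_conf_flags() {
  local mode="$1" java_opts="$2" env_pair="$3"
  local name="${env_pair%%=*}" value="${env_pair#*=}"

  case "$mode" in
    cluster|client) ;;
    *) echo "unknown deploy mode: $mode" >&2; return 1 ;;
  esac

  # Executors run in YARN containers in both modes.
  printf '%s\n' --conf "spark.executor.extraJavaOptions=$java_opts"
  printf '%s\n' --conf "spark.executorEnv.$name=$value"

  # Given on the spark-submit command line, so it is known before the
  # driver JVM starts (which matters in client mode).
  printf '%s\n' --conf "spark.driver.extraJavaOptions=$java_opts"

  if [ "$mode" = cluster ]; then
    # The driver runs inside the AM, so the AM environment is the
    # driver's environment.
    printf '%s\n' --conf "spark.yarn.appMasterEnv.$name=$value"
  else
    # The driver is local and inherits this shell's environment; the AM
    # is a separate process that needs its own options and environment.
    printf '%s\n' --conf "spark.yarn.am.extraJavaOptions=$java_opts"
    printf '%s\n' --conf "spark.yarn.appMasterEnv.$name=$value"
  fi
}

# Example (bash 4+), with made-up values:
#   mapfile -t extra < <(yarn_conf_flags client "-Dconfig.file=app.conf" "APP_ENV=prod")
#   ./bin/spark-shell --master yarn --deploy-mode client "${extra[@]}"
```

Printing one argument per line and reading it back into an array keeps JVM option strings that contain spaces intact as single arguments.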
