I am new to Apache Spark, and I just learned that Spark supports three types of cluster manager:

Standalone - Spark manages its own cluster
YARN - uses Hadoop's YARN resource manager
Mesos - Apache's dedicated resource manager project

Since I am new to Spark, I think I should try Standalone first. But I wonder which one is recommended. Say I need to build a large cluster (hundreds of instances) in the future: which cluster type should I choose?

1 Answer

Best answer
Spark can run on three cluster managers: Standalone, Apache Mesos, and Hadoop YARN.

Spark Standalone (the Spark deploy cluster) is Spark's own built-in cluster manager. Because it ships with the default Apache Spark distribution, it is usually the easiest way to set up and run Spark applications on a cluster, and it offers most of the features of the other cluster managers. A Standalone cluster has two kinds of node:

Standalone Master - the resource manager for the Spark Standalone cluster.
Standalone Worker (standalone slave) - a worker process in the cluster that launches the executors on which the application's tasks run. The tasks themselves are scheduled by the application's driver, not by the worker.

YARN has good support for HDFS data locality, and most Hadoop distributions already install YARN and HDFS together. On YARN, each Spark executor runs in a single YARN container. To deploy applications to a YARN cluster, you need a Spark build with YARN support.

Advantages of YARN over Mesos and Standalone:

YARN lets you dynamically share and centrally configure the same pool of cluster resources among all frameworks that run on YARN.
YARN provides security features: authentication, service-level authorization, authentication for web consoles, and data confidentiality.

Mesos handles workloads in a distributed environment through dynamic resource sharing and isolation. It has often been recommended for managing large-scale clusters. Note, however, that Spark deprecated its Mesos support in version 3.2, so check the documentation for your Spark version before choosing it for a new cluster. Mesos is open-source software that sits between the application layer and the operating system and makes it easier to deploy and manage applications in large-scale clustered environments. The main idea behind Mesos is to pool a large collection of heterogeneous resources into one shared cluster. Mesos introduces a mechanism called resource offers, which is a form of distributed two-level scheduling.
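Whichever of the three managers you pick, the choice mostly shows up as the `--master` value passed to `spark-submit`, so starting on Standalone and moving to YARN later does not require rewriting the application. The sketch below only builds the command line; the helper names `master_url` and `submit_command` and the host names are made up for illustration, and the ports are the usual defaults (7077 for a Standalone master, 5050 for a Mesos master), which your cluster may override.

```python
# Hypothetical helpers: only the --master values and the spark-submit flags
# are Spark's own; function names and host names are illustrative.

def master_url(manager, host=None, port=None):
    """Return the --master value for the given cluster manager."""
    if manager == "yarn":
        # YARN's address is read from the Hadoop configuration
        # (HADOOP_CONF_DIR / YARN_CONF_DIR), so the URL is just "yarn".
        return "yarn"
    defaults = {"standalone": ("spark", 7077), "mesos": ("mesos", 5050)}
    if manager not in defaults:
        raise ValueError(f"unknown cluster manager: {manager}")
    if host is None:
        raise ValueError(f"{manager} needs the master host name")
    scheme, default_port = defaults[manager]
    return f"{scheme}://{host}:{port or default_port}"


def submit_command(manager, app, host=None, port=None, deploy_mode="client"):
    """Build a spark-submit argument list for the chosen cluster manager."""
    return [
        "spark-submit",
        "--master", master_url(manager, host, port),
        "--deploy-mode", deploy_mode,
        app,
    ]


if __name__ == "__main__":
    # Same application, three different cluster managers.
    print(" ".join(submit_command("standalone", "app.py", host="master-host")))
    print(" ".join(submit_command("yarn", "app.py", deploy_mode="cluster")))
    print(" ".join(submit_command("mesos", "app.py", host="mesos-host")))
```

The resulting list can be handed to `subprocess.run` on a machine where `spark-submit` is on the PATH.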
Under this model Mesos decides how many resources to offer to each framework, while each framework decides which offers to accept and which computations to run on them. One advantage of Mesos over YARN and Standalone is its thin resource-sharing layer, which gives frameworks a common interface for accessing cluster resources and so enables fine-grained sharing across diverse cluster computing frameworks. The goal is to increase resource utilization by running multiple distributed systems on a shared pool of nodes.
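To make the two-level idea concrete, here is a toy simulation in plain Python. It is not the Mesos API: the `Framework` class, `run_offers`, and the node names are invented for illustration, only CPUs are modelled, and frameworks are offered resources in a fixed order rather than by the fair-sharing policy a real master would apply.

```python
from dataclasses import dataclass, field


@dataclass
class Framework:
    """Toy stand-in for a framework scheduler (e.g. Spark) on the cluster."""
    name: str
    cpus_per_task: int
    pending_tasks: int
    launched: list = field(default_factory=list)

    def consider(self, node, offered_cpus):
        """Second level: the framework decides how much of an offer to use."""
        wanted = min(self.pending_tasks, offered_cpus // self.cpus_per_task)
        self.launched.extend([node] * wanted)
        self.pending_tasks -= wanted
        # CPUs actually accepted; whatever is left over is declined.
        return wanted * self.cpus_per_task


def run_offers(free_cpus, frameworks):
    """First level: the master offers each node's free CPUs to frameworks."""
    remaining = dict(free_cpus)  # do not mutate the caller's dict
    for node in sorted(remaining):
        for fw in frameworks:
            if remaining[node] == 0:
                break
            remaining[node] -= fw.consider(node, remaining[node])
    return remaining


if __name__ == "__main__":
    spark = Framework("spark", cpus_per_task=2, pending_tasks=2)
    batch = Framework("batch", cpus_per_task=1, pending_tasks=3)
    left = run_offers({"node-a": 4, "node-b": 2}, [spark, batch])
    print(left, spark.launched, batch.launched)
```

The split of responsibility is the point: `run_offers` never needs to know what a task is, and `Framework.consider` never sees the whole cluster, only the offer in front of it.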
