The definition says: an RDD is an immutable distributed collection of objects. I don't quite understand what that means. Is it like data (partitioned objects) stored on a hard disk? If so, how can RDDs contain user-defined classes (such as Java, Scala, or Python classes)? From this link: https://www.safaribooksonline.com/library/view/learning-spark/9781449359034/ch03.html it says: "Users create RDDs in two ways: by loading an external dataset, or by distributing a collection of objects (e.g., a list or set) in their driver program." I am really confused about RDDs in general, and about how they relate to Spark and Hadoop. Can someone please help?

1 Answer

Best answer
An RDD (Resilient Distributed Dataset) is the fundamental data structure of Spark: an immutable, distributed collection of objects. Each RDD is divided into logical partitions, and different partitions can be computed on different nodes of the cluster. An RDD is not a file on disk. It is a collection of ordinary objects held (mostly in memory) by the cluster's executors, which is why it can contain objects of any Python, Java, or Scala type, including user-defined classes. An RDD is read-only: transformations such as map or filter produce a new RDD instead of modifying the existing one. It is fault-tolerant, because Spark records the lineage of transformations that produced each RDD and can recompute a lost partition from it, and it can be operated on in parallel.

There are two ways to create RDDs:
1. Parallelizing an existing collection in your driver program.
2. Referencing a dataset in an external storage system, such as a shared file system, HDFS, HBase, or any data source offering a Hadoop InputFormat.

Spark's processing model is built on RDDs. Because intermediate results can be kept in memory between operations rather than written to disk after every step, RDDs let Spark run MapReduce-style computations faster and more efficiently.
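To make "immutable, partitioned collection with lineage" concrete without a cluster, here is a toy model in plain Python. It is a hypothetical sketch, not Spark's API: `ToyRDD`, `Reading`, `partition`, and `lineage_length` are illustrative names, and the contiguous slicing in `parallelize` is an assumption of the sketch. In real PySpark, the two creation paths correspond to `SparkContext.parallelize` and `SparkContext.textFile`; the toy only mimics the first.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class ToyRDD:
    """Hypothetical, simplified model of an RDD (NOT Spark's real API).

    An instance holds no mutable data: it only knows how many partitions it
    has, how to compute each one, and which parent RDD it was derived from.
    """
    num_partitions: int
    compute: Callable[[int], Tuple[Any, ...]]  # partition index -> records
    parent: Optional["ToyRDD"] = None          # lineage pointer

    @staticmethod
    def parallelize(data: Iterable[Any], num_partitions: int) -> "ToyRDD":
        # Copy the driver-side collection into an immutable tuple and slice it
        # into contiguous, roughly equal partitions (assumption of this toy).
        items = tuple(data)

        def compute(i: int) -> Tuple[Any, ...]:
            start = i * len(items) // num_partitions
            end = (i + 1) * len(items) // num_partitions
            return items[start:end]

        return ToyRDD(num_partitions, compute)

    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        # A transformation returns a NEW ToyRDD; `self` is left untouched.
        return ToyRDD(self.num_partitions,
                      lambda i: tuple(f(x) for x in self.compute(i)),
                      parent=self)

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(self.num_partitions,
                      lambda i: tuple(x for x in self.compute(i) if pred(x)),
                      parent=self)

    def partition(self, i: int) -> Tuple[Any, ...]:
        # Any single partition can be recomputed on its own from the lineage,
        # which is the idea behind RDD fault tolerance.
        return self.compute(i)

    def collect(self) -> List[Any]:
        # Gather every partition back into one driver-side list.
        out: List[Any] = []
        for i in range(self.num_partitions):
            out.extend(self.compute(i))
        return out

    def lineage_length(self) -> int:
        # Count how many transformations separate this RDD from its source.
        depth, node = 0, self
        while node.parent is not None:
            depth, node = depth + 1, node.parent
        return depth


@dataclass(frozen=True)
class Reading:
    """Hypothetical user-defined record type stored inside the RDD."""
    sensor: str
    value: float


readings = ToyRDD.parallelize(
    [Reading("a", 1.0), Reading("b", 2.5), Reading("a", 4.0), Reading("c", 0.5)],
    num_partitions=2,
)
# Derive a new RDD through transformations; `readings` itself is unchanged.
high = readings.filter(lambda r: r.value > 1.0).map(lambda r: r.sensor)

print(high.collect())           # ['b', 'a']
print(high.partition(1))        # ('a',)
print(len(readings.collect()))  # 4
```

Because each transformation only records a function and a parent, `readings` is never modified, and any partition of `high` can be rebuilt by calling `partition(i)` again. Real Spark additionally distributes partitions across executors, caches them on request, and serializes user-defined objects between nodes, none of which this single-process sketch attempts.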
