Resilient Distributed Datasets is the basic data structure of Apache Spark. It is installed in the Spark Core. They are immutable and fault-tolerant. RDDs are generated by transforming already present RDDs or storing an outer dataset from well-built storage like HDFS or HBase.
Since they have distributed collections of objects, they can be operated in parallel. Resilient Distributed Datasets are divided into parts such that they can be executed on various nodes of a cluster.