Hadoop
Hadoop is an open source project of the Apache Software Foundation. It is a framework written in Java, originally developed by Doug Cutting in 2005, who created it to support distribution for Nutch, a text search engine. Hadoop uses Google's MapReduce and Google File System technologies as its foundation. Some of the major features of Hadoop are given below:
1. Hadoop is easily scalable: new nodes can be added to an existing cluster as data grows, which makes it well suited to large and growing datasets.
2. Hadoop is fault tolerant: data stored in HDFS is automatically replicated to other nodes, so the failure of a single node does not cause data loss.
3. Hadoop is fast at data processing, which is attributable to its ability to do parallel processing; Hadoop can perform batch processing many times faster than a single-threaded server or a mainframe.
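The parallel batch processing behind point 3 follows the MapReduce model: a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. The following is a minimal sketch of that model in plain Java for the classic word-count example (the class name WordCountSketch and the in-process flow are illustrative; real Hadoop jobs use the org.apache.hadoop Mapper and Reducer APIs and run distributed across the cluster):

```java
import java.util.*;

// Minimal in-process sketch of the MapReduce model Hadoop implements.
// No Hadoop dependencies: map and reduce run locally to show the flow.
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in every input line.
    static List<Map.Entry<String, Integer>> map(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    pairs.add(Map.entry(word, 1));
                }
            }
        }
        return pairs;
    }

    // Shuffle + reduce phase: group the pairs by key and sum the counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static Map<String, Integer> run(List<String> lines) {
        return reduce(map(lines));
    }
}
```

In a real cluster, each mapper processes one block of the input file on the node that stores it, and the shuffle moves each key's pairs to the reducer responsible for it; that data-local parallelism is where the speedup comes from.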
Hadoop generates cost advantages by bringing parallel computing to commodity servers, leading to a considerable reduction in the cost per terabyte of storage, which in turn makes it affordable to model all of the data.
Hadoop ecosystem
Following are components of Hadoop:
HDFS: Hadoop Distributed File System. It simply stores data files in a form as close to their original form as possible.
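HDFS splits each file into blocks and replicates every block across nodes, which is what provides the fault tolerance noted above. The replication factor is set in the cluster's hdfs-site.xml; a minimal fragment, assuming the default of three replicas per block, might look like:

```xml
<!-- hdfs-site.xml: number of replicas kept for each HDFS block -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
```

With three replicas, the loss of any single node (or even two) leaves at least one copy of every block available.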
HBase: It is Hadoop's database; it supports structured data storage for large tables.
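As a quick illustration of HBase's table model, a session in the HBase shell might create a table with one column family and store a cell in it (the table and column names here are hypothetical):

```
create 'users', 'info'
put 'users', 'u1', 'info:name', 'Alice'
get 'users', 'u1'
```

Rows are addressed by a row key ('u1'), and each cell lives under a column family:qualifier pair ('info:name'), which is what the description above means by structured storage for big tables.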
Hive: It allows analysis of large datasets using a language similar to standard ANSI SQL, which means that anyone familiar with SQL should be able to query data on a Hadoop cluster.
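For example, a Hive query reads much like ordinary SQL even though it compiles down to jobs on the cluster (the table page_views and its columns are hypothetical, for illustration only):

```sql
-- Top ten most-viewed URLs from a hypothetical page_views table
SELECT url, COUNT(*) AS views
FROM page_views
GROUP BY url
ORDER BY views DESC
LIMIT 10;
```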
Pig: It is an easy-to-understand data flow language. It helps with the analysis of the large datasets that are typical of Hadoop workloads.
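A Pig Latin script makes the "data flow" style concrete: each statement names an intermediate dataset derived from the previous one (the file paths and field names below are hypothetical):

```
logs   = LOAD 'input/logs.txt' AS (level:chararray, msg:chararray);
errors = FILTER logs BY level == 'ERROR';
grouped = GROUP errors BY level;
counts = FOREACH grouped GENERATE group, COUNT(errors);
STORE counts INTO 'output/error_counts';
```

Each step (load, filter, group, aggregate, store) is one stage in the flow, and Pig translates the whole pipeline into jobs that run on the cluster.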