1 Answer

0 votes
by

Hadoop MapReduce is a software framework for processing enormous data sets. It is the main component for data processing in the Hadoop framework. It divides the input data into several parts and runs a program on every data component parallel at one. The word MapReduce refers to two separate and different tasks.

The first is the map operation, which takes a set of data and transforms it into a different collection of data, where individual elements are divided into tuples. The reduce operation consolidates those data tuples based on the key and subsequently modifies the value of the key.

Let us take an example of a text file called example_data.txt and understand how MapReduce works.

The content of the example_data.txt file is:
coding,jamming,ice,river,man,driving

Now, assume we have to find out the word count on the example_data.txt using MapReduce. So, we will be looking for the unique words and the number of times those unique words appeared.

  • First, we break the input into three divisions, as seen in the figure. This will share the work among all the map nodes.
  • Then, all the words are tokenized in each of the mappers, and a hardcoded value (1) to each of the tokens is given. The reason behind giving a hardcoded value equal to 1 is that every word by itself will, at least, occur once.
  • Now, a list of key-value pairs will be created where the key is nothing but the individual words and value is one. So, for the first line (Coding Ice Jamming), we have three key-value pairs – Coding, 1; Ice, 1; Jamming, 1.
  • The mapping process persists the same on all the nodes.
  • Next, a partition process occurs where sorting and shuffling follow so that all the tuples with the same key are sent to the identical reducer.
  • Subsequent to the sorting and shuffling phase, every reducer will have a unique key and a list of values matching that very key. For example, Coding, [1,1]; Ice, [1,1,1].., etc.
  • Now, each Reducer adds the values which are present in that list of values. As shown in the example, the reducer gets a list of values [1,1] for the key Jamming. Then, it adds the number of ones in the same list and gives the final output as – Jamming, 2.
  • Lastly, all the output key/value pairs are then assembled and written in the output file.

Related questions

0 votes
    Suppose you have designed a Big Data batch using the MapReduce framework. Now you want to execute it on a cluster ... Run view? Name Node Data Node Resource Manager Job Tracker...
asked Mar 23, 2021 in Technology by JackTerrance
0 votes
    To import data to Hadoop cluster from relational database sqoop create a mapreduce job. In this job A - All ... the schema of relational table is validated without fetching data...
asked Jan 13, 2021 in Technology by JackTerrance
0 votes
    I would like to know the relation between the mapreduce.map.memory.mb and mapred.map.child.java.opts parameters. ... .child.java.opts? Select the correct answer from above options...
asked Jan 28, 2022 in Education by JackTerrance
0 votes
    Why are there two separate packages map-reduce package in Apache's hadoop package tree: org.apache.hadoop.mapred ... separated out? Select the correct answer from above options...
asked Jan 26, 2022 in Education by JackTerrance
0 votes
    I am having some trouble writing the code for wordcount program using MapReduce, I could use some help. Select the correct answer from above options...
asked Jan 21, 2022 in Education by JackTerrance
0 votes
    Hi, I am using MapReduce and there are so many steps in it. I want the value of the last reduced step as ... . Thank you in advance. Select the correct answer from above options...
asked Jan 21, 2022 in Education by JackTerrance
0 votes
    Hi, I am using MapReduce and there are so many steps in it. I want the value of the last reduced step ... ),Core Questions, Core Hadoop MCQ,core interview questions for experienced...
asked Oct 31, 2021 in Education by JackTerrance
0 votes
    What is Amazon Elastic MapReduce (EMR)?...
asked Feb 10, 2023 in Technology by JackTerrance
0 votes
    What are the main Configuration parameters that has to be specified by the user to run MapReduce?...
asked Aug 6, 2021 in Technology by JackTerrance
0 votes
    What is the distributed Cache in MapReduce framework?...
asked Aug 6, 2021 in Technology by JackTerrance
0 votes
    MapReduce jobs submitted from either Oozie, Pig or Hive can be used to encode, improve and sample the data sets ... of R Programming Select the correct answer from above options...
asked Feb 12, 2022 in Education by JackTerrance
0 votes
    What is shuffling in MapReduce?...
asked Aug 2, 2021 by JackTerrance
0 votes
    Which of the following are true for Hadoop Pseudo Distributed Mode? 1. It runs on multiple machines 2. Runs on ... with all daemons 4. Runs on Single Machine without all daemons...
asked Aug 7, 2021 in Technology by JackTerrance
0 votes
    What is Apache Flume in Hadoop ?...
asked Aug 7, 2021 in Technology by JackTerrance
0 votes
    What is the actions followed by a Jobtracker in Hadoop?...
asked Aug 5, 2021 in Technology by JackTerrance
...