What is MapReduce?
MapReduce is a programming model for data processing. MapReduce programs can be written in various languages, e.g. Java, Ruby, Python, and C++, and they are inherently parallel.
How Does MapReduce Work?
MapReduce works by breaking the processing into two phases: the map phase and the reduce phase. Each phase has key-value pairs as its input and output. The types of the input and output are chosen by the programmer.
The diagram below illustrates how the map and reduce phases work:
Map Phase: The mapper reads its input from HDFS as key-value pairs. Here the data in HDFS is plain text, not key-value pairs, so it has to be converted before the mapper can read it. By default, the text input format (TextInputFormat) does this conversion: the key is the byte offset of a line within the file, and the value is the text of that line, i.e. one record.
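To make the byte-offset keys concrete, here is a minimal Python sketch that imitates how a text input format might present a file to the mapper. It is not Hadoop code: the function name text_input_records and the sample data are hypothetical, chosen only for illustration.

```python
def text_input_records(data: bytes):
    """Yield (byte_offset, line_text) pairs for each line in `data`.

    Hypothetical helper that imitates the default text input format:
    key = byte offset where the line starts, value = the line itself.
    """
    offset = 0
    for raw_line in data.splitlines(keepends=True):
        # Strip the line terminator from the value, but count it in the offset.
        yield offset, raw_line.rstrip(b"\r\n").decode("utf-8")
        offset += len(raw_line)


# Sample input (an assumption, not taken from a real HDFS file).
sample = b"hello world\nhello hadoop\n"
records = list(text_input_records(sample))
print(records)  # [(0, 'hello world'), (12, 'hello hadoop')]
```

The second key is 12 because the first line occupies 11 bytes of text plus 1 newline byte.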
The mapper processes the data and emits intermediate output as key-value pairs. This intermediate output is not stored in HDFS; it is written to the local file system of the node that runs the mapper.
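As an illustration of a mapper emitting intermediate key-value pairs, the sketch below uses word counting, an example assumed here and not taken from the text above. The function map_word_count and the input records are hypothetical, and the intermediate output is kept in a Python list rather than on a local file system.

```python
def map_word_count(offset, line):
    """Hypothetical map function: emit (word, 1) for every word in the line.

    The byte-offset key is ignored, as is common for word counting.
    """
    for word in line.split():
        yield word.lower(), 1


# Input records in (byte_offset, line) form; sample data is an assumption.
records = [(0, "hello world"), (12, "hello hadoop")]

# Run the mapper over every record and collect the intermediate pairs.
intermediate = [
    pair
    for offset, line in records
    for pair in map_word_count(offset, line)
]
print(intermediate)
# [('hello', 1), ('world', 1), ('hello', 1), ('hadoop', 1)]
```

Note that the mapper does no aggregation: the word "hello" appears twice in the intermediate output, once per occurrence.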
Reduce Phase: The reduce phase takes the mappers' intermediate output from the local file system as its input. The intermediate output is deleted once all of the data has been transferred to the reducers and the job no longer needs it. The reducer processes this input and writes the final key-value output to HDFS.
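The reduce side can be sketched in the same spirit. In the Python example below, the intermediate pairs are first grouped by key, which is how a reducer receives its input, and then summed per key. The names group_by_key and reduce_word_count, the word-count task, and the sample data are all assumptions made for illustration; a real job would write the result to HDFS rather than a list.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Group intermediate (key, value) pairs by key, sorted by key.

    Hypothetical stand-in for the grouping the framework performs
    before the reducer is called.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_word_count(key, values):
    """Hypothetical reduce function: emit (word, total count)."""
    yield key, sum(values)


# Intermediate mapper output; sample data is an assumption.
intermediate = [("hello", 1), ("world", 1), ("hello", 1), ("hadoop", 1)]

final_output = [
    result
    for key, values in group_by_key(intermediate)
    for result in reduce_word_count(key, values)
]
print(final_output)  # [('hadoop', 1), ('hello', 2), ('world', 1)]
```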