Architecture View Of Map-Reduce
Main Components of Map-Reduce:
JobClient: submits the map-reduce job to the JobTracker.
JobTracker: It runs on the master node.
- There is a single JobTracker per cluster.
- Schedules map and reduce tasks on TaskTrackers, keeping track of which TaskTrackers are free and assigning tasks to them.
- Monitors the tasks and keeps track of each TaskTracker's status. Each TaskTracker sends a regular heartbeat to the JobTracker (by default every 3 seconds) to signal that it is alive.
- Re-executes tasks on failure.
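The JobTracker's monitoring and re-execution role can be sketched as a toy, single-process model (plain Python; the class and method names are illustrative, not the Hadoop API). A tracker that stops sending heartbeats is declared dead, and its tasks go back on a pending queue for re-execution:

```python
import time

# Assumption for the sketch: a tracker is dead after ~3 missed 3-second heartbeats.
HEARTBEAT_TIMEOUT = 9.0

class ToyJobTracker:
    """Toy model of the JobTracker's monitoring role (not the real Hadoop API)."""

    def __init__(self):
        self.last_heartbeat = {}   # tracker id -> timestamp of last heartbeat
        self.assigned_tasks = {}   # tracker id -> list of task ids running there
        self.pending_tasks = []    # tasks waiting to be (re)scheduled

    def heartbeat(self, tracker_id, now=None):
        # A TaskTracker reports in; record when we last heard from it.
        self.last_heartbeat[tracker_id] = time.time() if now is None else now

    def assign(self, tracker_id, task_id):
        self.assigned_tasks.setdefault(tracker_id, []).append(task_id)

    def detect_failures(self, now):
        # Trackers that have missed heartbeats are declared dead, and their
        # tasks are moved back to the pending queue for re-execution.
        dead = [t for t, seen in self.last_heartbeat.items()
                if now - seen > HEARTBEAT_TIMEOUT]
        for t in dead:
            self.pending_tasks.extend(self.assigned_tasks.pop(t, []))
            del self.last_heartbeat[t]
        return dead
```

For example, if `tt1` heartbeats at time 0, is assigned `map_0`, and is then silent until time 20, `detect_failures(now=20.0)` declares it dead and `map_0` reappears in `pending_tasks`.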
TaskTracker: It runs on a slave node.
- There is a single TaskTracker per node, so a cluster contains many TaskTrackers.
- It runs the map and reduce tasks assigned to it.
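The division of labour between the map and reduce tasks these components run can be sketched as a toy, single-process word count (plain Python standing in for the framework; the `map_phase`/`shuffle`/`reduce_phase` names are illustrative):

```python
from collections import defaultdict

def map_phase(documents):
    # Map: turn each input record into (key, value) pairs.
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine each key's value list into a final result.
    return {key: sum(values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(["big data", "big cluster"])))
# counts == {"big": 2, "data": 1, "cluster": 1}
```

In a real cluster the map and reduce calls run as separate tasks on TaskTrackers, and the shuffle moves data over the network; the data flow, however, is the same.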
The diagram below shows the flow of Map-Reduce:
Hadoop does its best to run the map task on a node where the input data resides in HDFS. This is called the data locality optimization.
- Map tasks can have data locality.
- Reduce tasks cannot have the data locality advantage, because a reducer takes its input from multiple mappers, i.e. multiple nodes: all values with the same key go to the same reducer.
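Why the same key always reaches the same reducer can be sketched with a hash partitioner (a toy analogue of Hadoop's default `HashPartitioner`; the mapper outputs here are made up). The reducer is chosen from the key alone, never from which mapper or node emitted the pair:

```python
def partition(key, num_reducers):
    # Toy analogue of Hadoop's HashPartitioner: the reducer index depends
    # only on the key, so equal keys always land on the same reducer.
    return hash(key) % num_reducers

# Two mappers on different nodes both emit the key "error".
mapper1_output = [("error", 1), ("info", 1)]
mapper2_output = [("error", 1), ("warn", 1)]

routed = {}
for key, value in mapper1_output + mapper2_output:
    routed.setdefault(partition(key, 4), []).append((key, value))
# Both ("error", 1) pairs, though emitted by different mappers,
# end up in the same reducer's bucket.
```

This is exactly why a reducer cannot be data-local: its input is scattered across every node that ran a mapper, and must be fetched over the network during the shuffle.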
Data locality has three possibilities:
- Data-local: the map task runs on the same node that stores its input split (the best case).
- Rack-local: the task runs on a different node in the same rack as the data.
- Off-rack: the task runs in a different rack, so the input block must be transferred across racks (the slowest case).
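The three levels amount to comparing where the input block lives with where the task was scheduled. A minimal sketch, with a hypothetical helper and illustrative node/rack names (not Hadoop's scheduler code):

```python
def locality(data_node, data_rack, task_node, task_rack):
    # Classify where a map task runs relative to its input split.
    if task_node == data_node:
        return "data-local"   # task runs on the node holding the block
    if task_rack == data_rack:
        return "rack-local"   # different node, but same rack
    return "off-rack"         # data must cross rack boundaries

locality("node1", "rack1", "node1", "rack1")  # -> "data-local"
locality("node1", "rack1", "node2", "rack1")  # -> "rack-local"
locality("node1", "rack1", "node3", "rack2")  # -> "off-rack"
```

The scheduler prefers these cases in that order, since each step down the list adds network transfer cost.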