HDFS routine analysis covers two main operations in Hadoop:
- Read File
- Write File
In the above figure, the client node submits a read request for a file to the NameNode. Since the NameNode holds the metadata, i.e. the locations of the file's blocks on the DataNodes, it sends the block locations back to the client node.
Now that the client has the block locations, it reads the data directly from the DataNodes, as sketched below.
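A minimal sketch of the client-side read, assuming a reachable cluster and an existing file at the hypothetical path /user/demo/input.txt. The FileSystem client asks the NameNode for block locations under the hood; the actual bytes stream from the DataNodes.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration(); // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);     // client handle to HDFS

        Path file = new Path("/user/demo/input.txt"); // hypothetical file
        try (FSDataInputStream in = fs.open(file);    // NameNode returns block locations here
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // bytes are read directly from the DataNodes
            }
        }
    }
}
```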
In the above figure, the client submits the job to the JobTracker (J.T.). Before the job is actually handed over to the J.T., the job client performs four steps:
- Asks the J.T. for a new job_id.
- Checks the output specification of the job (the job fails if, for example, the output directory already exists).
- Checks the input specification of the job by computing the input splits (the job fails if an input path does not exist).
- Copies the job resources, the job's jar file and its configuration file (an .xml file), to the shared filesystem.
Once all these steps complete, the job is submitted to the J.T., which initializes it and splits it into a number of tasks, assigning each task to a TaskTracker (T.T.). It tries to assign a task to a T.T. that holds that task's input data or, failing that, to the T.T. nearest to that data. After assigning the tasks, it monitors them as they run. A minimal job-submission sketch is shown below.
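A minimal sketch of submitting a job through the classic (JobTracker-era) MapReduce API. The input/output paths are placeholders, and the identity mapper/reducer stand in for real job logic. JobClient.runJob() carries out the steps listed above: it gets a new job_id, checks the output specification, computes the input splits, copies the jar and configuration file to the shared filesystem, and then submits the job to the JobTracker.

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.IdentityMapper;
import org.apache.hadoop.mapred.lib.IdentityReducer;

public class SubmitJobExample {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(SubmitJobExample.class); // the .xml configuration file
        conf.setJobName("submit-job-demo");

        conf.setMapperClass(IdentityMapper.class);   // pass-through map, for illustration
        conf.setReducerClass(IdentityReducer.class); // pass-through reduce, for illustration
        conf.setOutputKeyClass(LongWritable.class);  // key type from the default TextInputFormat
        conf.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(conf, new Path("/user/demo/input"));   // must exist
        FileOutputFormat.setOutputPath(conf, new Path("/user/demo/output")); // must NOT exist

        JobClient.runJob(conf); // submits the job to the JobTracker and waits for completion
    }
}
```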
Once a task finishes, the T.T. writes the final output to the same node on which the processing took place. But, as we know, HDFS keeps 3 copies of every data file by default: the first replica is created on that same node, and the placement of the other 2 replicas is decided by the NameNode.
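A minimal sketch of the client-side write, assuming a reachable cluster; the path and payload are placeholders. The client streams bytes to the first DataNode in the pipeline (the local node, when the client runs on one), and the NameNode's placement policy decides where the remaining replicas go.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/user/demo/output.txt"); // hypothetical path
        short replication = 3; // the default replication factor, set per file here

        try (FSDataOutputStream out = fs.create(file, replication)) {
            out.writeUTF("hello hdfs"); // bytes flow through the DataNode pipeline
        }
    }
}
```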