HDFS is based on a master-slave architecture.
Master: Single NameNode for managing HDFS metadata.
Slaves: Multiple DataNodes for storing data.
Secondary NameNode: A helper daemon that handles housekeeping for the NameNode, mainly checkpointing.
[Figure: Architecture of a Hadoop file system]
[Figure: Detailed architecture of HDFS]
HDFS consists of the following elements, also called daemons:
NameNode: It performs the below roles:
- The NameNode manages the file system namespace.
- The NameNode stores the metadata of the HDFS.
- The state of HDFS is stored in a file called fsimage, which is the base of the metadata. At runtime, modifications are only written to a log file called edits. On the next start-up of the NameNode, the state is read from fsimage, the changes from edits are applied to it, and the new state is written back to fsimage. After this, edits is cleared and is ready for new log entries.
- The NameNode keeps information about each block stored on the DataNodes in its memory.
- Despite its name, the Secondary NameNode is not a standby for the NameNode, so it cannot substitute for the NameNode.
- The Secondary NameNode periodically fetches fsimage and edits from the NameNode and merges them. This allows the NameNode to start up faster next time.
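The fsimage and edits merge described above can be illustrated with a small Python sketch. This is a toy model only: the real fsimage and edits are on-disk files in Hadoop's own formats, and the dictionary layout, the operation names `create` and `delete`, and the function names `apply_edits` and `checkpoint` are assumptions made up for this illustration.

```python
def apply_edits(fsimage, edits):
    """Return a new namespace state: fsimage with each logged edit applied in order."""
    state = dict(fsimage)  # copy, so the old snapshot is left untouched
    for op, path, blocks in edits:
        if op == "create":
            state[path] = blocks      # hypothetical op: add a file and its block list
        elif op == "delete":
            state.pop(path, None)     # hypothetical op: remove a file if present
        else:
            raise ValueError(f"unknown edit operation: {op}")
    return state


def checkpoint(fsimage, edits):
    """Merge edits into fsimage and return (new_fsimage, cleared_edits)."""
    new_fsimage = apply_edits(fsimage, edits)
    return new_fsimage, []            # the edits log starts empty again


# Toy data: one file in the snapshot, two changes logged since then.
fsimage = {"/a.txt": ["blk_1"]}
edits = [
    ("create", "/b.txt", ["blk_2", "blk_3"]),
    ("delete", "/a.txt", None),
]

fsimage, edits = checkpoint(fsimage, edits)
print(fsimage)  # {'/b.txt': ['blk_2', 'blk_3']}
print(edits)    # []
```

Because the merge leaves the edits log empty, the next start-up has fewer logged changes to replay, which is why periodic checkpointing by the Secondary NameNode shortens NameNode start-up.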
DataNode: It performs the below roles:
- It mainly stores the data files of HDFS, i.e. the actual data. It is also known as a slave node.
- A DataNode stores only blocks. A block is the unit in which data is stored and processed, and all data resides within blocks on the DataNodes.
- DataNodes perform read and write operations on the file system, as per client requests.
- Each DataNode sends periodic heartbeat signals to the master node, i.e. the NameNode, to indicate that it is alive.
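The heartbeat mechanism can be sketched roughly as follows, from the NameNode's point of view. This is a simplified illustration, not Hadoop's implementation: the class name `HeartbeatMonitor`, its methods, and the 30-second timeout are assumptions chosen for the example, and timestamps are passed in explicitly to keep the behavior easy to follow.

```python
class HeartbeatMonitor:
    """Toy model of how a master could track which DataNodes are alive."""

    def __init__(self, timeout_seconds):
        # A node is considered dead if it has been silent longer than this.
        self.timeout_seconds = timeout_seconds
        self.last_seen = {}  # DataNode id -> time of its latest heartbeat

    def record_heartbeat(self, datanode_id, now):
        """Remember when this DataNode last reported in."""
        self.last_seen[datanode_id] = now

    def live_nodes(self, now):
        """DataNodes whose latest heartbeat is within the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen <= self.timeout_seconds
        )

    def dead_nodes(self, now):
        """DataNodes that have been silent for longer than the timeout."""
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > self.timeout_seconds
        )


# Illustrative timeout only; not an HDFS default.
monitor = HeartbeatMonitor(timeout_seconds=30)
monitor.record_heartbeat("dn1", now=0)
monitor.record_heartbeat("dn2", now=0)
monitor.record_heartbeat("dn1", now=40)  # dn1 keeps reporting, dn2 goes silent

print(monitor.live_nodes(now=50))  # ['dn1']
print(monitor.dead_nodes(now=50))  # ['dn2']
```

Tracking only the time of the latest heartbeat per node keeps the check cheap: deciding whether a DataNode is alive is a single subtraction and comparison.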