This blog post covers the HDFS architecture. We start with an introduction to HDFS, followed by the assumptions and goals behind its design. The post then covers the detailed architecture of Hadoop HDFS: the NameNode, the DataNode, the Secondary NameNode, the Checkpoint node, and the Backup node. Topics such as rack awareness, high availability, data blocks, replication management, and HDFS read and write operations are also discussed in this HDFS tutorial.
2. Hadoop HDFS Introduction
The Hadoop Distributed File System (HDFS) is designed to be a highly reliable storage system. HDFS stores very large files on a cluster of commodity hardware. It works best with a small number of large files rather than a huge number of small files. HDFS stores data reliably even in the case of hardware failure, and it provides high throughput by allowing data to be accessed in parallel.
2.1. HDFS Assumptions and Goals
I. Hardware failure
Hardware failure is no longer the exception; it is the norm. An HDFS instance may consist of hundreds or thousands of server machines, each storing part of the file system's data. With such a large number of components, each susceptible to hardware failure, some component is effectively always non-functional. A core architectural goal of HDFS is therefore quick, automatic fault detection and recovery.
II. Streaming data access
HDFS applications need streaming access to their datasets. Hadoop HDFS is designed mainly for batch processing rather than interactive use. The emphasis is on high throughput of data access rather than low latency. For a workload such as log analysis, what matters is how quickly the whole dataset can be read, not how quickly a single record is returned.
III. Large datasets
HDFS is built to work with large data sets. A typical file in HDFS ranges from gigabytes to terabytes in size. The architecture is designed to store and retrieve huge amounts of data efficiently. HDFS should provide high aggregate data bandwidth, scale to hundreds of nodes in a single cluster, and handle tens of millions of files in a single instance.
IV. Simple coherency model
HDFS follows a write-once-read-many access model for files. Once a file is created, written, and closed, it should not be changed. This avoids data coherency issues and enables high-throughput data access. A MapReduce-based application or a web crawler application fits this model perfectly. As per the Apache notes, there is a plan to support appending writes to files in the future.
V. Moving computation is cheaper than moving data
A computation is much more efficient when it runs near the data it operates on than when it runs far away from it. This effect grows with the size of the data set. Running computation close to the data increases the overall throughput of the system and minimizes network congestion. The assumption is that it is better to move the computation to the data than to move the data to the computation.
VI. Portability across heterogeneous hardware and software platforms
HDFS is designed to be easily portable from one platform to another. This enables widespread adoption of HDFS as a platform for applications that deal with large data sets.
3. HDFS Architecture
Hadoop HDFS has a master/slave architecture in which the master is the NameNode and the slaves are the DataNodes. The HDFS architecture consists of a single NameNode; all the other nodes are DataNodes.
Let's discuss each of the nodes in the Hadoop HDFS architecture in detail.
3.1. HDFS NameNode
The NameNode is also known as the master node. It stores metadata, i.e. the number of blocks, their replicas, and other details. This metadata is kept in memory on the master for faster retrieval. The NameNode maintains and manages the slave nodes and assigns tasks to them. It should be deployed on reliable hardware, as it is the centerpiece of HDFS.
Tasks of NameNode
- Manages the file system namespace.
- Regulates clients' access to files.
- Executes file system operations such as naming, closing, and opening files and directories.
All DataNodes send a heartbeat and a block report to the NameNode. The heartbeat tells the NameNode that the DataNode is alive. A block report contains a list of all blocks on a DataNode.
The NameNode is also responsible for maintaining the replication factor of all the blocks.
The NameNode metadata consists of the following files:
FsImage is an "image file". It contains the entire file system namespace and is stored as a file in the NameNode's local file system. It also contains a serialized form of all the directories and file inodes in the file system. Each inode is an internal representation of a file's or directory's metadata.
The edits file (EditLogs) contains all the modifications made to the file system since the most recent FsImage. When the NameNode receives a create, update, or delete request from a client, the request is first recorded in the edits file.
3.2. HDFS DataNode
The DataNode is also known as the slave. In the Hadoop HDFS architecture, the DataNode stores the actual data in HDFS. It performs read and write operations as requested by clients. DataNodes can be deployed on commodity hardware.
Tasks of DataNode
- Creates, deletes, and replicates block replicas according to instructions from the NameNode.
- Manages the data storage of the system.
- Sends heartbeats to the NameNode to report the health of HDFS. By default, the heartbeat interval is 3 seconds.
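The heartbeat mechanism can be illustrated with a small sketch. This is not Hadoop code: `HeartbeatMonitor` and the `MISSED_BEATS_BEFORE_DEAD` threshold are hypothetical names and values chosen for illustration, and only the 3-second interval comes from the text above.

```python
from dataclasses import dataclass, field

HEARTBEAT_INTERVAL_S = 3.0  # default interval mentioned in the text

# Assumption for this sketch only: a node is treated as dead after this
# many missed heartbeats. This is not a real HDFS configuration value.
MISSED_BEATS_BEFORE_DEAD = 10


@dataclass
class HeartbeatMonitor:
    """Hypothetical NameNode-side tracker of DataNode liveness."""

    last_seen: dict = field(default_factory=dict)

    def record_heartbeat(self, datanode_id: str, now: float) -> None:
        # Remember when this DataNode last reported in.
        self.last_seen[datanode_id] = now

    def dead_nodes(self, now: float) -> list:
        # A node is dead if its last heartbeat is older than the timeout.
        timeout = HEARTBEAT_INTERVAL_S * MISSED_BEATS_BEFORE_DEAD
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen > timeout
        )
```

Timestamps are passed in explicitly rather than read from a clock, which keeps the sketch deterministic and easy to test.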
3.3. Secondary NameNode
In HDFS, when the NameNode starts, it first reads the HDFS state from the image file, FsImage. It then applies the edits from the edits log file and writes the new HDFS state to the FsImage. After that, it starts normal operation with an empty edits file. Because the NameNode merges the FsImage and edits files only at start-up, the edits log can grow very large over time. A side effect of a larger edits file is that the next restart of the NameNode takes longer.
The Secondary NameNode solves this issue. It downloads the FsImage and EditLogs from the NameNode and merges the EditLogs into the FsImage (file system image), which keeps the edits log size within a limit. It stores the modified FsImage in persistent storage, and this copy can be used in the case of NameNode failure.
Secondary NameNode performs a regular checkpoint in HDFS.
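The merge of FsImage and edits described above can be sketched as replaying a log over a snapshot. This is a simplified model, not Hadoop's on-disk format: the namespace is assumed to be a plain dictionary from path to metadata, and the function names `apply_edits` and `checkpoint` are hypothetical.

```python
def apply_edits(fsimage: dict, edits: list) -> dict:
    """Return a new namespace snapshot with every edit applied in order."""
    namespace = dict(fsimage)  # copy, so the old image stays untouched
    for op, path, metadata in edits:
        if op in ("create", "update"):
            namespace[path] = metadata
        elif op == "delete":
            namespace.pop(path, None)
        else:
            raise ValueError(f"unknown edit operation: {op}")
    return namespace


def checkpoint(fsimage: dict, edits: list) -> tuple:
    """Merge the edits into the image; return (new_image, empty_edit_log)."""
    return apply_edits(fsimage, edits), []
```

After a checkpoint, the edits log is empty again, which is why regular checkpoints keep NameNode restarts short.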
3.4. Checkpoint Node
The Checkpoint node is a node that periodically creates checkpoints of the namespace. It first downloads the FsImage and edits from the active NameNode, merges them locally, and then uploads the new image back to the active NameNode. It stores the latest checkpoint in a directory that has the same structure as the NameNode's directory, so the checkpointed image is always available for the NameNode to read if necessary.
3.5. Backup Node
A Backup node provides the same checkpointing functionality as the Checkpoint node. In addition, the Backup node keeps an in-memory, up-to-date copy of the file system namespace that is always synchronized with the active NameNode state. The Backup node does not need to download the FsImage and edits files from the active NameNode to create a checkpoint, because it already has an up-to-date copy of the namespace in memory. Its checkpoint process is therefore more efficient: it only needs to save the namespace into the local FsImage file and reset edits. The NameNode supports one Backup node at a time.
3.6. Data Blocks
HDFS in Apache Hadoop splits huge files into small chunks known as blocks.
Blocks are the smallest unit of data in the file system. We (clients and admins) do not have any control over block details such as block location. The NameNode decides all such things.
The default size of an HDFS block is 128 MB, which can be configured as needed. All blocks of a file are the same size except the last block, which can be the same size or smaller.
If the data is smaller than the block size, the block occupies only as much space as the data. For example, a 129 MB file is stored as 2 blocks: one of the default 128 MB size and one of only 1 MB. The second block does not occupy 128 MB, so the remaining 127 MB is not wasted. A major advantage of storing data in such large blocks is that it reduces disk seek time.
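The 129 MB example above is plain integer division with a remainder. A minimal sketch follows, where `split_into_blocks` is a hypothetical helper name rather than a Hadoop API:

```python
MB = 1024 * 1024
DEFAULT_BLOCK_SIZE = 128 * MB  # default block size stated in the text


def split_into_blocks(file_size: int, block_size: int = DEFAULT_BLOCK_SIZE) -> list:
    """Return the size in bytes of each block a file would occupy."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # The last block holds only the leftover bytes, not a full block.
        sizes.append(remainder)
    return sizes
```

For a 129 MB file, `split_into_blocks(129 * MB)` yields one 128 MB block and one 1 MB block.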
3.7. Replication Management
Block replication provides fault tolerance. If one copy is inaccessible or corrupted, the data can be read from another copy. The number of copies, or replicas, of each block of a file is the replication factor. The default replication factor is 3, which is also configurable. So each block is replicated three times and stored on different DataNodes.
If we are storing a file of 128 MB in HDFS using the default configuration, we will end up occupying a space of 384 MB (3*128 MB).
The NameNode receives block reports from the DataNodes periodically to maintain the replication factor. When a block is under-replicated or over-replicated, the NameNode adds or deletes replicas as needed.
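The bookkeeping behind this can be sketched as counting replicas across block reports and comparing the count with the replication factor. The data shapes and the function names `replica_counts` and `replication_actions` are assumptions made for illustration, not Hadoop's internal structures.

```python
from collections import defaultdict


def replica_counts(block_reports: dict) -> dict:
    """Count replicas per block; block_reports maps DataNode id -> block ids."""
    counts = defaultdict(int)
    for blocks in block_reports.values():
        for block_id in set(blocks):  # a node holds at most one replica
            counts[block_id] += 1
    return dict(counts)


def replication_actions(block_reports: dict, known_blocks: list,
                        replication_factor: int = 3) -> dict:
    """Return {block_id: ("add" | "delete", how_many)} for mis-replicated blocks."""
    counts = replica_counts(block_reports)
    actions = {}
    for block_id in known_blocks:
        diff = replication_factor - counts.get(block_id, 0)
        if diff > 0:
            actions[block_id] = ("add", diff)      # under-replicated
        elif diff < 0:
            actions[block_id] = ("delete", -diff)  # over-replicated
    return actions
```

Blocks that already have exactly the configured number of replicas do not appear in the result.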
3.8. Rack Awareness
In a large Hadoop cluster, the NameNode reduces network traffic during HDFS reads and writes by choosing a DataNode on the same rack as the request, or on a nearby rack. The NameNode obtains rack information by maintaining the rack id of each DataNode.
Rack awareness in Hadoop is the concept of choosing DataNodes based on this rack information.
In the HDFS architecture, the NameNode makes sure that the replicas of a block are not all stored on a single rack. It follows the rack awareness algorithm to reduce latency and to provide fault tolerance. Recall that the default replication factor is 3. According to the rack awareness algorithm, the first replica of a block is stored on the local rack, the second replica is stored on another DataNode within the same rack, and the third replica is stored on a different rack.
Rack awareness is important for improving:
- The performance of the cluster.
- The use of network bandwidth.
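The placement order described above (two replicas on the local rack, one on a different rack) can be sketched as follows. This is only an illustration of the order as this post describes it. The function name, the topology dictionary, and the "pick the first available node" choice are all assumptions, and the actual placement policy in a given Hadoop version may differ.

```python
def choose_replica_nodes(writer_rack: str, topology: dict) -> list:
    """Pick three (rack, datanode) pairs for one block.

    topology maps rack id -> list of DataNode ids on that rack.
    """
    local_nodes = topology.get(writer_rack, [])
    if len(local_nodes) < 2:
        raise ValueError("need at least two DataNodes on the local rack")
    remote_racks = sorted(
        rack for rack, nodes in topology.items()
        if rack != writer_rack and nodes
    )
    if not remote_racks:
        raise ValueError("need at least one other rack with a DataNode")
    remote_rack = remote_racks[0]  # simplification: first remote rack by name
    return [
        (writer_rack, local_nodes[0]),            # first replica: local rack
        (writer_rack, local_nodes[1]),            # second: same rack, other node
        (remote_rack, topology[remote_rack][0]),  # third: different rack
    ]
```

Because the three replicas always span two racks, losing an entire rack still leaves at least one copy of the block.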
3.9. HDFS Read/Write Operation
3.9.1. Write Operation
When a client wants to write a file to HDFS, it first contacts the NameNode for metadata. The NameNode responds with the number of blocks, their locations, replicas, and other details. Based on this information, the client splits the file into multiple blocks and starts sending them to the first DataNode.
The client first sends block A to DataNode 1, along with the details of the other two DataNodes. When DataNode 1 receives block A from the client, it copies the block to DataNode 2 on the same rack. Since both DataNodes are on the same rack, the block is transferred via the rack switch. DataNode 2 then copies the block to DataNode 3. Since these two DataNodes are on different racks, the block is transferred via an out-of-rack switch.
When a DataNode receives a block from the client, it sends a write confirmation to the NameNode.
The same process is repeated for each block of the file.
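The write path above can be modeled in a few lines. This is a toy in-memory simulation, not the HDFS client API: `storage` is a hypothetical dictionary standing in for the DataNodes' disks, block ids such as `blk_0` are made up, and the loop over the pipeline stands in for DataNode 1 forwarding to DataNode 2, which forwards to DataNode 3.

```python
def write_block(block_id: str, data: bytes, pipeline: list, storage: dict) -> list:
    """Store one block on every DataNode in pipeline order; return the acks."""
    acks = []
    for datanode in pipeline:
        # Each node stores the block, then the next node in line receives it.
        storage.setdefault(datanode, {})[block_id] = data
        acks.append((datanode, block_id))  # confirmation sent to the NameNode
    return acks


def write_file(data: bytes, block_size: int, pipeline: list, storage: dict) -> list:
    """Split data into blocks and push each block through the pipeline."""
    block_ids = []
    for index, offset in enumerate(range(0, len(data), block_size)):
        block_id = f"blk_{index}"
        write_block(block_id, data[offset:offset + block_size], pipeline, storage)
        block_ids.append(block_id)
    return block_ids
```

The returned list of block ids preserves the order of the blocks, which a reader needs in order to reassemble the file.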
3.9.2. Read Operation
To read from HDFS, the client first contacts the NameNode for metadata, giving the name of the file it wants. The NameNode responds with the number of blocks, their locations, replicas, and other details.
The client then communicates with the DataNodes directly. It reads data in parallel from the DataNodes, based on the information received from the NameNode. Once the client or application has received all the blocks of the file, it combines them to reconstruct the original file.
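The reassembly step can be sketched as below. This is an in-memory illustration, not the HDFS client API: `block_locations` stands in for the metadata returned by the NameNode, `storage` is a hypothetical dictionary of DataNode contents, and the blocks are fetched one after another here for simplicity, whereas the text describes parallel reads.

```python
def read_file(block_locations: list, storage: dict) -> bytes:
    """Fetch each block from the first DataNode that has it, then join them.

    block_locations is an ordered list of (block_id, [datanode ids]).
    storage maps datanode id -> {block_id: bytes}.
    """
    parts = []
    for block_id, datanodes in block_locations:
        for datanode in datanodes:
            data = storage.get(datanode, {}).get(block_id)
            if data is not None:
                parts.append(data)
                break  # one healthy replica is enough
        else:
            raise IOError(f"no reachable replica for {block_id}")
    return b"".join(parts)
```

Falling back to the next DataNode in the list is what lets a read succeed when one replica is missing.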
In conclusion, the HDFS architecture uses the NameNode and DataNodes to reliably store very large files across the machines of a large cluster. It stores each file as a sequence of blocks, and block replication provides fault tolerance. It also provides high availability, since data remains accessible despite hardware failure: when a machine or piece of hardware crashes, the data can be read from another replica.
If you liked this post or have any questions about HDFS architecture in Hadoop, do leave a comment.