How to contribute a limited/specific amount of storage as a slave node in a Hadoop cluster

Dhruv Upadhyay
2 min read · Sep 6, 2022


The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.

Hadoop combines these machines into a distributed cluster. The cluster has one Master Node and any number of Data Nodes. Each Data Node contributes some of its local storage, and together these contributions form one large logical volume for the cluster to work with. Because this storage is spread across different machines, we call it a distributed Hadoop cluster.

Steps to contribute a limited amount of storage from a slave node to the cluster:

  1. Create the extra, limited-size storage (volume) that you want to contribute to the Hadoop cluster.
  2. Attach this volume to the slave node.
  3. Create a partition on that volume and format it. This is similar to what we do with an external pen drive before we use it.
  4. Mount the newly created volume on the folder/directory that the Data Node uses as cluster storage.
  5. To confirm, upload a file to the cluster as a client and check where the file is located.
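The reason step 4 caps the contribution is that a DataNode stores its blocks under the directories listed in the `dfs.datanode.data.dir` property of `hdfs-site.xml`, so whatever filesystem is mounted there bounds the space the node can offer. A sketch of that setting (the path `/DataNode` matches the directory used later in this article; it is not a Hadoop default):

```xml
<!-- hdfs-site.xml on the slave node -->
<configuration>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/DataNode</value>
  </property>
</configuration>
```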

For this, I used the same setup using AWS as in: https://ds87702.medium.com/hadoop-uses-the-concept-of-parallelism-48a0cda8fee9

Step 1. I attached a 1 GB additional volume to a slave node using the AWS EBS service.

Step 2. Perform partitioning, formatting, and mounting of the additional storage on the slave node.
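A minimal sketch of this step, to be run as root on the slave instance. The device name `/dev/xvdf` is an assumption (check `lsblk` for the actual name of the attached 1 GB EBS volume), and `/DataNode` is the DataNode storage directory used in this setup:

```shell
#!/bin/sh
set -e

# Assumptions: adjust after checking `lsblk` on your own slave node.
DEVICE=${DEVICE:-/dev/xvdf}
PART=${PART:-/dev/xvdf1}
MOUNT_POINT=${MOUNT_POINT:-/DataNode}

if [ -b "$DEVICE" ]; then
    # Partition: one partition covering the whole 1 GB disk (non-interactive)
    parted --script "$DEVICE" mklabel gpt mkpart primary ext4 0% 100%
    # Format: make an ext4 filesystem, like preparing a pen drive
    mkfs.ext4 "$PART"
    # Mount: put the small volume under the DataNode's storage directory
    mkdir -p "$MOUNT_POINT"
    mount "$PART" "$MOUNT_POINT"
else
    echo "block device $DEVICE not found; attach the EBS volume first"
fi
```

After mounting, restart the DataNode daemon so it picks up the new 1 GB volume.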

Step 3. Share data across the cluster and check both scenarios (before and after mounting the limited volume).

By default, the storage a Data Node contributes equals the size of the instance's root storage: if the instance has 10 GB of root storage, the node offers 10 GB to the cluster.

But after we mounted /DataNode on the newly created, partitioned, and formatted 1 GB volume and started the Data Node again, the slave contributed only a limited size, 1 GB, to the distributed Hadoop cluster.
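This can be verified with the standard Hadoop CLI tools, as a sketch (`testfile.txt` is a hypothetical sample file; the commands assume a running cluster, so the block skips itself when no Hadoop client is on the PATH):

```shell
#!/bin/sh
if command -v hdfs >/dev/null 2>&1; then
    # "Configured Capacity" for the slave should now show ~1 GB
    hdfs dfsadmin -report

    # Upload a file as a client, then see which DataNode holds its blocks
    hdfs dfs -put testfile.txt /testfile.txt
    hdfs fsck /testfile.txt -files -blocks -locations
else
    echo "hdfs not on PATH; run this on a node with the Hadoop client installed"
fi
```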
