What is it about?

The size of the data used in today's enterprises has been growing at exponential rates over the last few years, and the need to process and analyze these large volumes of data has grown with it. To handle and analyze such large datasets, the open-source Apache Hadoop framework is widely used nowadays. For managing and storing resources across its cluster, Hadoop provides a distributed file system called the Hadoop Distributed File System (HDFS). HDFS is written entirely in Java and is designed so that it can store Big Data reliably and stream it at high throughput to user applications. Hadoop is now used widely by popular organizations such as Yahoo, Facebook and various online shopping vendors. Meanwhile, experiments in data-intensive computing continue to explore how to parallelize data processing, but none has achieved the desired performance. Hadoop, with its MapReduce parallel data processing capability, can achieve these goals efficiently [1]. This paper first provides a detailed overview of HDFS. It then reports experimental work with Hadoop on big data and identifies the various factors that affect Hadoop cluster performance. The paper concludes with the real-world challenges Hadoop faces today and the scope for future work.
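To make the HDFS storage model concrete, here is a minimal sketch (not taken from the paper) of a client application reading a file streamed from HDFS through Hadoop's standard FileSystem API in Java; the NameNode address and file path shown are placeholder assumptions, not values used in the paper's experiments.

// Minimal sketch: stream a file stored in HDFS to a client application.
// The fs.defaultFS address and the input path are placeholders.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // In a real cluster this is normally set in core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode:9000");

        try (FileSystem fs = FileSystem.get(conf);
             FSDataInputStream in = fs.open(new Path("/data/input/sample.txt"));
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line); // contents arrive as a stream from the DataNodes
            }
        }
    }
}

Behind this simple call, HDFS splits the file into large blocks replicated across DataNodes, which is what enables the reliable storage and high-throughput streaming described above.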


Why is it important?

This paper presents and extrapolates the various issues HDFS faces in managing Big Data.

Perspectives

It is a rigorous experimental evaluation of HDFS performance.

Ripon Patgiri
National Institute of Technology Silchar

Read the Original

This page is a summary of: Performance evaluation of HDFS in big data management, December 2014, Institute of Electrical & Electronics Engineers (IEEE),
DOI: 10.1109/ichpca.2014.7045330.
You can read the full text via the DOI above.

Contributors

The following have contributed to this page