What is Hadoop?
Hadoop is an open-source framework for storing and processing large data sets in a distributed environment, across clusters of computers, using simple programming models.
Hadoop has two main components: (i) HDFS (ii) MapReduce
(i) HDFS (Hadoop Distributed File System): the storage layer, used to store large data sets across the machines of a cluster
(ii) MapReduce: the processing layer, a programming model for processing large data sets in parallel
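To make the MapReduce programming model more concrete, here is a minimal sketch of a word count written in plain Java. It does not use the Hadoop API and runs on a single machine; the class name WordCountSketch and its methods map, shuffle, reduce and wordCount are hypothetical names chosen for illustration only. It shows the three steps of the model: map each input line to (word, 1) pairs, group the pairs by word, and reduce each group to a total.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a single-machine imitation of the MapReduce model.
// None of these names come from the Hadoop API.
public class WordCountSketch {

    // Map step: turn one line of input into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle step: group all emitted values by their key (the word).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: collapse the values of one key into a single count.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Runs map, shuffle and reduce in sequence over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(pairs).entrySet()) {
            result.put(entry.getKey(), reduce(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("big data needs big clusters", "data grows");
        // prints {big=2, clusters=1, data=2, grows=1, needs=1}
        System.out.println(wordCount(lines));
    }
}
```

In a real Hadoop job the map and reduce steps would run on different machines of the cluster and read their input from HDFS; this sketch keeps everything in memory only to show the shape of the model.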
Why do we need Hadoop?
- Data volumes are growing rapidly
- Data sets of many terabytes or petabytes need to be processed
- The performance of traditional applications degrades as data volumes grow
- The number of machines in a cluster is not constant
Before proceeding with this tutorial, you should have prior exposure to Core Java, database concepts, and any flavor of the Linux operating system.