- Pig: A high-level data-flow language and execution framework for parallel computation.
- Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying.
- HBase: A scalable, distributed database that supports structured data storage for large tables.
- Sqoop: A tool for efficiently moving data between relational databases and HDFS.
- MapReduce: A distributed data processing model and execution environment that runs on large clusters of commodity machines.
- HDFS: A distributed filesystem that runs on large clusters of commodity machines.
- ZooKeeper: A distributed, open-source coordination service for distributed applications. It exposes a simple set of primitives that distributed applications can build upon to implement higher-level services for synchronization, configuration maintenance, group services, and naming.
- Oozie: A Java web application used to schedule Apache Hadoop jobs. It is integrated with the Hadoop stack and supports scheduling of MapReduce, Pig, Hive, and Sqoop jobs, as well as system-specific jobs such as Java programs and shell scripts.
- Flume: A tool whose main purpose is to collect large volumes of web logs in real time. It streams data from multiple sources into Hadoop for analysis, acting as a connecting agent between the sources and Hadoop.
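
To make the MapReduce processing model from the list above more concrete, here is a minimal single-process sketch of a word count in plain Python. It does not use Hadoop or any Hadoop API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. On a real cluster the framework runs the map and reduce steps in parallel across machines and performs the grouping step itself.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for each word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, imitating what a framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: sum all counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    # Illustrative input only; real jobs read their input from HDFS.
    sample = ["pig hive hbase", "hive pig pig"]
    print(word_count(sample))
```

The point of the split is that `map_phase` looks at one line at a time and `reduce_phase` at one key at a time, which is what allows the work to be spread over many machines.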