Hadoop provides a framework that allows for distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. The library itself is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may be prone to failure.
CAPS™ provides the Hadoop ODL™, which comprises the following:
Hadoop Common: The common utilities that support the other Hadoop modules.
Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
Hadoop YARN: A framework for job scheduling and cluster resource management.
Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.
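To make the MapReduce model concrete, here is a minimal word-count sketch in the Hadoop Streaming style. It is an illustrative assumption, not part of CAPS™ or Hadoop ODL™: the function names and sample input are invented, and the shuffle/sort step that a real cluster performs between the map and reduce phases is simulated with an in-process sort.

```python
# Minimal word-count sketch of the MapReduce model.
# map_phase emits (word, 1) pairs; on a real cluster Hadoop would
# shuffle and sort those pairs across nodes; reduce_phase then sums
# the counts per word. All names and sample data are illustrative.

from itertools import groupby
from operator import itemgetter


def map_phase(lines):
    """Emit a (word, 1) pair for every word in the input lines."""
    for line in lines:
        for word in line.split():
            yield (word, 1)


def reduce_phase(pairs):
    """Sum the counts for each word; pairs must arrive sorted by word."""
    for word, group in groupby(pairs, key=itemgetter(0)):
        yield (word, sum(count for _, count in group))


def word_count(lines):
    """Run map, simulate Hadoop's shuffle/sort, then reduce."""
    shuffled = sorted(map_phase(lines), key=itemgetter(0))
    return dict(reduce_phase(shuffled))


if __name__ == "__main__":
    sample = ["hadoop stores data", "hadoop processes data"]
    print(word_count(sample))
```

On a real deployment the mapper and reducer would run as separate processes on many nodes, with HDFS supplying the input splits and YARN scheduling the work.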
Besides the above, CAPS™ also includes the following Hadoop-related Apache projects:
Cassandra – a scalable multi-master database with no single points of failure
Hive – a data warehouse infrastructure providing data summarization and ad hoc querying
Pig – a high-level data-flow language and execution framework for parallel computation
Spark – a fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.
ZooKeeper – a high-performance coordination service for distributed applications
Get access to CAPS™ to try out your own Hadoop ODL™.
Oracle Data Discovery – The visual face of Hadoop