A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.
Runs as a standalone application on a single box. Also supports an embedded mode.
Runs as a MapReduce application on multiple Hadoop versions. Also supports Azkaban for launching MapReduce jobs.
Cluster / Yarn
Runs as a standalone cluster with primary and worker nodes. This mode supports high availability and can run on bare metal as well.
Runs as an elastic cluster on public cloud. This mode supports high availability.
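Whichever execution mode is chosen, an ingestion job is described by a job configuration file. The sketch below illustrates the general shape of such a file; the specific class names and values are illustrative assumptions drawn from common example sources and writers, not a verbatim sample from this repository:

```properties
# Illustrative job definition -- keys follow the framework's job-config
# conventions, but the concrete classes and values are assumptions.
job.name=ExampleIngestion
job.group=Examples
job.description=Pulls records from a source and publishes them downstream

# Source implementation that produces the records to ingest
source.class=org.apache.gobblin.example.simplejson.SimpleJsonSource

# Writer and publisher that land the data in its final location
writer.builder.class=org.apache.gobblin.writer.SimpleDataWriterBuilder
data.publisher.type=org.apache.gobblin.publisher.BaseDataPublisher
```

The same job file can be handed to the standalone launcher, a MapReduce submission, or a cluster deployment; only the launch mechanism changes, not the job definition.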