Currently, Gobblin supports the following feature list:
Different Data Sources
Source Type | Protocol | Vendors |
---|---|---|
RDMS | JDBC | MySQL/SQLServer |
Files | HDFS/SFTP/LocalFS | N/A |
Salesforce | REST | Salesforce |
-
Different Pulling Types
- SNAPSHOT-ONLY: Pull the snapshot of one dataset.
- SNAPSHOT-APPEND: Pull delta changes since last run, optionally merge delta changes into snapshot (Delta changes include updates to the dataset since last run).
- APPEND-ONLY: Pull delta changes since last run, and append to dataset.
-
Different Deployment Types
- standalone deploy on a single machine
- cluster deploy on hadoop 2.3.0
-
Compaction
- Merge delta changes into snapshot.