Overview
Gobblin is a framework for ingesting potentially huge volumes of data from many different sources, so it is critical to monitor the health and status of the system and of job executions. Gobblin employs a variety of approaches, introduced below, for this purpose. All of the approaches are optional and can be turned on and off in different combinations through the framework and job configurations.
Metrics Collecting and Reporting
Metrics Reporting
Out of the box, Gobblin reports metrics through:

- JMX: used in the standalone deployment. Metrics reported to JMX can be checked using tools such as VisualVM or JConsole.
- Metric log files: files are stored in a root directory defined by the property `metrics.log.dir`. Each Gobblin job has its own subdirectory under the root directory, and each run of the job has its own metric log file named after the job ID as `${job_id}.metrics.log`.
- Hadoop counters: used for M/R deployments. Gobblin-specific metrics are reported in the "JOB" or "TASK" groups for job-level and task-level metrics, respectively. By default, task-level metrics are not reported through Hadoop counters, as doing so may push the number of Hadoop counters beyond the system-wide limit. However, users can choose to turn on reporting of task-level metrics as Hadoop counters by setting `mr.include.task.counters=true`, as shown in the sample configuration below.
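A minimal sketch of the relevant reporting properties is shown below. Only `metrics.log.dir` and `mr.include.task.counters` appear on this page; `metrics.enabled` and the example values are assumptions to verify against the Gobblin configuration glossary.

# Metrics reporting sketch; property names other than metrics.log.dir and
# mr.include.task.counters are assumptions.
metrics.enabled=true
# Root directory for per-job metric log files (example path).
metrics.log.dir=/var/log/gobblin/metrics
# Also report task-level metrics as Hadoop counters (off by default, M/R mode only).
mr.include.task.counters=true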
Metrics Collection
JVM Metrics
The standalone deployment of Gobblin runs in a single JVM, so it is important to monitor the health of the JVM through a set of pre-defined JVM metrics in the following four categories:

- `jvm.gc`: metrics related to garbage collection, e.g., counts of and time spent on garbage collection.
- `jvm.memory`: metrics related to memory usage, e.g., detailed heap usage.
- `jvm.threads`: metrics related to thread states, e.g., thread count and thread deadlocks.
- `jvm.fileDescriptorRatio`: measures the ratio of open file descriptors.
All JVM metrics are reported via JMX and can be checked using tools such as VisualVM or JConsole.
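The exact metric names under each category depend on the underlying metrics library and on the JVM's garbage collectors, so the following listing is purely illustrative of what may show up when browsing the reported metrics in JConsole:

jvm.memory.heap.used
jvm.memory.heap.max
jvm.threads.count
jvm.threads.deadlock.count
jvm.gc.<collector-name>.count
jvm.gc.<collector-name>.time
jvm.fileDescriptorRatio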
Pre-defined Job Execution Metrics
Internally, Gobblin pre-defines a minimum set of metrics, listed below, in two metric groups: `JOB` for job-level metrics and `TASK` for task-level metrics. These metrics are useful for keeping track of the progress and performance of job executions.

- `${metric_group}.${id}.records`: keeps track of the total number of data records extracted by the job or task, depending on the `${metric_group}`. The `${id}` is either a job ID or a task ID, depending on the `${metric_group}`.
- `${metric_group}.${id}.recordsPerSecond`: keeps track of the rate of data extraction, as data records extracted per second by the job or task, depending on the `${metric_group}`.
- `${metric_group}.${id}.bytes`: keeps track of the total number of bytes extracted by the job or task, depending on the `${metric_group}`.
- `${metric_group}.${id}.bytesPerSecond`: keeps track of the rate of data extraction, as bytes extracted per second by the job or task, depending on the `${metric_group}`.
Among the above metrics, `${metric_group}.${id}.records` and `${metric_group}.${id}.bytes` are reported as Hadoop MapReduce counters for Gobblin jobs running on Hadoop.
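As an illustration of the naming scheme only, using the job and task IDs from the example job status report later on this page, the fully resolved metric names would look like:

JOB.job_Gobblin_Demo_Job_1417487480842.records
JOB.job_Gobblin_Demo_Job_1417487480842.recordsPerSecond
JOB.job_Gobblin_Demo_Job_1417487480842.bytes
JOB.job_Gobblin_Demo_Job_1417487480842.bytesPerSecond
TASK.task_Gobblin_Demo_Job_1417487480842_0.records
TASK.task_Gobblin_Demo_Job_1417487480842_0.bytes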
Job Execution History Store
Gobblin also supports writing job execution information to a job execution history store backed by a database of choice. Gobblin uses MySQL by default and ships with the SQL DDLs of the relevant MySQL tables, although users may choose a different database as long as the schema of the tables is compatible. The properties `job.history.store.url` and `job.history.store.jdbc.driver` specify the database URL and the JDBC driver for the chosen database. The user name and password used to access the database can be specified using the properties `job.history.store.user` and `job.history.store.password`. An example configuration is shown below:
job.history.store.url=jdbc:mysql://localhost/gobblin
job.history.store.jdbc.driver=com.mysql.jdbc.Driver
job.history.store.user=gobblin
job.history.store.password=gobblin
Email Notifications
In addition to writing job execution information to the job execution history store, Gobblin also supports sending email notifications about job status. Job status notifications fall into two categories: alerts in case of job failures and normal notifications in case of successful job completions. Users can choose to enable or disable each category using the properties `email.alert.enabled` and `email.notification.enabled`, respectively, as shown in the sketch below.
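A minimal sketch of the email-related properties is shown below. Only the two `enabled` flags come from this page; the SMTP-related property names and the example values are assumptions to verify against the Gobblin configuration glossary.

# Enable/disable the two categories of job status emails.
email.alert.enabled=false
email.notification.enabled=true
# Assumed SMTP connection and recipient properties (verify the names).
email.host=smtp.example.com
email.from=gobblin@example.com
email.tos=data-team@example.com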
The main content of an email alert or notification is a job status report in JSON format. Below is an example job status report:
{
"job name": "Gobblin_Demo_Job",
"job id": "job_Gobblin_Demo_Job_1417487480842",
"job state": "COMMITTED",
"start time": 1417487480874,
"end time": 1417490858913,
"duration": 3378039,
"tasks": 1,
"completed tasks": 1,
"task states": [
{
"task id": "task_Gobblin_Demo_Job_1417487480842_0",
"task state": "COMMITTED",
"start time": 1417490795903,
"end time": 1417490858908,
"duration": 63005,
"high watermark": -1,
"exception": ""
}
]
}