Table of Contents
Introduction
Writing to a http based sink is done by sending a http or restful request and handling the response. Given the endpoint uri, query parameters, and body, it is straightforward to construct a http request. The idea is to build a writer that writes a http record, which contains those elements of a request. The writer builds a http or rest request from multiple http records, sends the request with a client that knows the server, and handles the response.
Note
The old http write framework under AbstractHttpWriter
and AbstractHttpWriterBuilder
is deprecated (Deprecation date: 05/15/2018)! Use AsyncHttpWriter and AsyncHttpWriterBuilder instead
Constructs
Figure 1. Http write flow
HttpOperation
A http record is represented as a HttpOperation object. It has 4 fields.
| Field Name | Description | Example |
|---|---|---|
keys |
Optional, a key/value map to interpolate the url template | {"memberId": "123"} |
queryParams |
Optional, a map from query parameter to its value | {"action": "update"} |
headers |
Optional, a map from header key to ts value | {"version": "2.0"} |
body |
Optional, the request body in string or json string format | "{\"email\": \"httpwrite@test.com\"}" |
Given an url template, http://www.test.com/profiles/${memberId}, from job configuration, the resolved
example request url with keys and queryParams information will be http://www.test.com/profiles/123?action=update.
AsyncRequestBuilder
An AsyncRequestBuilder builds an AsyncRequest from a collection of HttpOperation records. It could build one
request per record or batch multiple records into a single request. A builder is also responsible for
putting the headers and setting the body to the request.
HttpClient
A HttpClient sends a request and returns a response. If necessary, it should setup the connection to the server, for
example, sending an authorization request to get access token. How authorization is done is per use case. Gobblin does
not provide general support for authorization yet.
ResponseHandler
A ResponseHandler handles a response of a request. It returns a ResponseStatus object to the framework, which
would resend the request if it's a SERVER_ERROR.
Build an asynchronous writer
AsyncHttpWriterBuilder is the base builder to build an asynchronous http writer. A specific writer can be created by
providing the 3 major components: a HttpClient, a AsyncRequestBuilder, and a ResponseHandler.
Gobblin offers 2 implementations of async
http writers. As long as your write requirement can be expressed as a HttpOperation through a Converter, the
2 implementations should work with configurations.
AvroHttpWriterBuilder
An AvroHttpWriterBuilder builds an AsyncHttpWriter on top of the apache httpcomponents framework, sending vanilla http request.
The 3 major components are:
ApacheHttpClient. It usesCloseableHttpClientto sendHttpUriRequestand receiveCloseableHttpResponseApacheHttpRequestBuilder. It builds aApacheHttpRequest, which is anAsyncRequestthat wraps theHttpUriRequest, from oneHttpOperationApacheHttpResponseHandler. It handles aHttpResponse
Configurations for the builder are:
| Configuration | Description | Example |
|---|---|---|
gobblin.writer.http.urlTemplate |
Required, the url template(schema and port included), together with keys and queryParams, to be resolved to request url |
http://www.test.com/profiles/${memberId} |
gobblin.writer.http.verb |
Required, http verbs | get, update, delete, etc |
gobblin.writer.http.errorCodeWhitelist |
Optional, http error codes allowed to pass through | 404, 500, etc. No error code is allowed by default |
gobblin.writer.http.maxAttempts |
Optional, max number of attempts including initial send | Default is 3 |
gobblin.writer.http.contentType |
Optional, content type of the request body | "application/json", which is the default value |
R2RestWriterBuilder
A R2RestWriterBuilder builds an AsyncHttpWriter on top of restli r2 framework, sending
rest request. The 3 major components are:
R2Client. It uses a R2Clientto sendRestRequestand receiveRestResponseR2RestRequestBuilder. It builds aR2Request, which is anAsyncRequestthat wraps theRestRequest, from oneHttpOperationR2RestResponseHandler. It handles aRestResponse
R2RestWriterBuilder has d2 and ssl support. Configurations((d2.) part should be added in d2 mode) for the builder are:
| Configuration | Description | Example |
|---|---|---|
gobblin.writer.http.urlTemplate |
Required, the url template(schema and port included), together with keys and queryParams, to be resolved to request url. If the schema is d2, d2 is enabled |
http://www.test.com/profiles/${memberId} |
gobblin.writer.http.verb |
Required, rest(rest.li) verbs | get, update, put, delete, etc |
gobblin.writer.http.maxAttempts |
Optional, max number of attempts including initial send | Default is 3 |
gobblin.writer.http.errorCodeWhitelist |
Optional, http error codes allowed to pass through | 404, 500, etc. No error code is allowed by default |
gobblin.writer.http.d2.zkHosts |
Required for d2, the zookeeper address | |
gobblin.writer.http.(d2.)ssl |
Optional, enable ssl | Default is false |
gobblin.writer.http.(d2.)keyStoreFilePath |
Required for ssl | /tmp/identity.p12 |
gobblin.writer.http.(d2.)keyStoreType |
Required for ssl | PKCS12 |
gobblin.writer.http.(d2.)keyStorePassword |
Required for ssl | |
gobblin.writer.http.(d2.)trustStoreFilePath |
Required for ssl | |
gobblin.writer.http.(d2.)trustStorePassword |
Required for ssl | |
gobblin.writer.http.protocolVersion |
Optional, protocol version of rest.li | 2.0.0, which is the default value |
R2RestWriterBuilder isn't ingegrated with PasswordManager to process encrypted passwords yet. The task is tracked as https://issues.apache.org/jira/browse/GOBBLIN-487
Build a synchronous writer
The idea is to reuse an asynchronous writer to build its synchronous version. The technical difference between them
is the size of outstanding writes. Set gobblin.writer.http.maxOutstandingWrites to be 1(default value is 1000) to make a synchronous writer