Introduction
The CouchbaseWriter supports writing documents to a Couchbase bucket though the Couchbase Java SDK. Note that CouchbaseWriter only supports writing to a single bucket as there should be only 1 CouchbaseEnvironment per JVM.
Record format
Couchbase writer currently support AVRO
and JSON
as data inputs. On both of them it requires the following structured schema:
Document field | Description |
---|---|
key |
Unique key used to store the document on the bucket. For more info view Couchbase docs |
data.data |
Object or value containing the information associated with the key for this document |
data.flags |
Couchbase flags To store JSON on data.data use 0x02 << 24 for UTF-8 0x04 << 24 . |
The following is a sample input record with JSON data
{
"key": "myKey123",
"data": {
"data": {
"field1": "field1Value",
"field2": 123
},
"flags": 33554432
}
}
or to store plain text:
{
"key": "myKey123",
"data": {
"data": "singleValueData",
"flags": 67108864
}
}
If using AVRO, use the following schema:
{
"type" : "record",
"name" : "topLevelRecord",
"fields" : [ {
"name" : "key",
"type" : "string"
}, {
"name" : "data",
"type" : {
"type" : "record",
"name" : "data",
"namespace" : "topLevelRecord",
"fields" : [ {
"name" : "data",
"type" : [ "bytes", "null" ]
}, {
"name" : "flags",
"type" : "int"
} ]
}
} ]
}
Note that the key can be other than string if needed.
Configuration
General configuration values
Configuration Key | Default Value | Description |
---|---|---|
writer.couchbase.bucket |
Optional | Name of the couchbase bucket. Change if using other than default bucket |
writer.couchbase.default |
"default" |
Name of the default bucket if writer.couchbase.bucket is not provided |
writer.couchbase.dnsSrvEnabled |
"false" |
Enable DNS SRV bootstrapping docs |
writer.couchbase.bootstrapServers | localhost| URL to bootstrap servers. If using DNS SRV set writer.couchbase.dnsSrvEnabled` to true |
||
writer.couchbase.sslEnabled |
false |
Use SSL to connect to couchbase |
writer.couchbase.password |
Optional | Bucket password. Will be ignored if writer.couchbase.certAuthEnabled is true |
writer.couchbase.certAuthEnabled |
false |
Set to true if using certificate authentication. Must also specify writer.couchbase.sslKeystoreFile , writer.couchbase.sslKeystorePassword , writer.couchbase.sslTruststoreFile , and writer.couchbase.sslTruststorePassword |
writer.couchbase.sslKeystoreFile |
Optional | Path to the keystore file location |
writer.couchbase.sslKeystorePassword |
Optional | Keystore password |
writer.couchbase.sslTruststoreFile |
Optional | Path to the trustStore file location |
writer.couchbase.sslTruststorePassword |
Optional | TrustStore password |
writer.couchbase.documentTTL |
0 |
Time To Live of each document. Units are specified in writer.couchbase.documentTTLOriginField |
writer.couchbase.documentTTLUnits |
SECONDS |
Unit for writer.couchbase.documentTTL . Must be one of java.util.concurrent.TimeUnit. Case insensitive |
writer.couchbase.documentTTLOriginField |
Optional | Time To Live of each document. Units are specified in writer.couchbase.documentTTLOriginField |
writer.couchbase.documentTTLOriginUnits |
MILLISECONDS |
Unit for writer.couchbase.documentTTL . Must be one of java.util.concurrent.TimeUnit. Case insensitive. As an example a writer.couchbase.documentTTLOriginField value of 1568240399000 and writer.couchbase.documentTTLOriginUnits value of MILLISECONDS timeunit would be Wed Sep 11 15:19:59 PDT 2019 |
writer.couchbase.retriesEnabled |
false |
Enable write retries on failures |
writer.couchbase.maxRetries |
5 |
Maximum number of retries |
writer.couchbase.failureAllowancePercentage |
0.0 |
The percentage of failures that you are willing to tolerate while writing to Couchbase. Gobblin will mark the workunit successful and move on if there are failures but not enough to trip the failure threshold. Only successfully acknowledged writes are counted as successful, all others are considered as failures. The default for the failureAllowancePercentage is set to 0.0. For example, if the value is set to 0.2 This means that as long as 80% of the data is acknowledged by Couchbase, Gobblin will move on. If you want higher guarantees, set this config value to a lower value. e.g. If you want 99% delivery guarantees, set this value to 0.01 |
operationTimeoutMillis |
10000 |
Global timeout for couchbase communication operations |
Authentication
No credentials
NOT RECOMMENDED FOR PRODUCTION.
Do not set writer.couchbase.certAuthEnabled
nor writer.couchbase.password
Using certificates
Set writer.couchbase.certAuthEnabled
to true
and values for writer.couchbase.sslKeystoreFile
, writer.couchbase.sslKeystorePassword
, writer.couchbase.sslTruststoreFile
, and writer.couchbase.sslTruststorePassword
.
writer.couchbase.password
setting will be ignored if writer.couchbase.certAuthEnabled
is set
Using bucket password
Set writer.couchbase.password
Document level expiration
Couchbase writer allows to set expiration at the document level using the expiry property of the couchbase document. PLease note that current couchbase implementation using timestamps limits it to January 19, 2038 03:14:07 GM given the type of expiry is set to int. CouchbaseWriter only works with global timestamps and does not use relative expiration in seconds (<30 days) for simplicity. Currently three modes are supported:
1 - Expiration from write time
Define only writer.couchbase.documentTTL
and writer.couchbase.documentTTLUnits
. For example for a 2 days expiration configs would look like:
Configuration Key | Value |
---|---|
writer.couchbase.documentTTL |
2 |
writer.couchbase.documentTTLUnits |
DAYS |
2 - Expiration from an origin timestamp
Define only writer.couchbase.documentTTL
and writer.couchbase.documentTTLUnits
.
For example for a 2 days expiration configs using the header.time
field that has timestamp in MILLISECONDS would look like:
Configuration Key | Value |
---|---|
writer.couchbase.documentTTL |
2 |
writer.couchbase.documentTTLUnits |
"DAYS" |
writer.couchbase.documentTTLOriginField |
"header.time" |
writer.couchbase.documentTTLOriginUnits |
1568240399000 |
So a sample document with origin on 1568240399 (Wed Sep 11 15:19:59 PDT 2019) would expire on 1568413199 (Fri Sep 13 15:19:59 PDT 2019). The following is a sample record format.
{
"key": "sampleKey",
"data": {
"data": {
"field1": "field1Value",
"header": {
"time": 1568240399000
}
},
"flags": 33554432
}
}
}