Output format specifications
Overview
Data Bridge lets you configure data views for event streaming or for publishing snapshots of CRM data. Event streaming publishes message-based updates to an Apache Kafka topic. Data snapshots can be published to a Kafka topic or saved as files. The following data serialization formats are supported for each publishing target type:
Publishing target | Supported formats |
---|---|
Apache Kafka topic | JSON, Avro |
File | JSON, Delimited |
Event streaming
This section describes the key elements of the message structure that Data Bridge generates when event streaming is configured. Each event is described by a hierarchical object structure that is formatted in JSON or Apache Avro when it is published to a Kafka topic. Several noteworthy metadata attributes are generated automatically for each event:
- "topic" - name of the Kafka topic where the message will be published.
- "timestamp" - the date and time when the event occurred. Specifically, this timestamp represents the exact time when the associated data was created or modified at the source in NexJ CRM. It can be used as the "version" of the data described by the update. The consumer should compare the value of this timestamp with its local copy to detect possible out-of-sequence updates.
- "key" - the unique identifier for each event.
- "value" - the "payload" object that contains the set of attributes included in the Data Bridge view that generated the event, as well as several system-generated fields. Refer to the additional details and the example below.
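Because "timestamp" acts as a version for the data, a consumer can discard updates that are older than the copy it already holds. The following is a minimal Python sketch, not Data Bridge code. It assumes the timestamp is available as an integer (for example, epoch milliseconds) and that the object identity is built from the "_class" and "_oid" fields described below; accept_update is a hypothetical helper name.

```python
from typing import Dict, Tuple

# Hypothetical consumer-side bookkeeping: object identity -> "timestamp"
# of the newest update already applied for that object.
Identity = Tuple[str, str]  # (_class, _oid), an assumption for illustration


def accept_update(last_applied: Dict[Identity, int], identity: Identity, event_timestamp: int) -> bool:
    """Accept and record the update only if it is newer than the local copy."""
    previous = last_applied.get(identity)
    if previous is not None and event_timestamp <= previous:
        # Older (or duplicate) update arrived out of sequence: skip it.
        return False
    last_applied[identity] = event_timestamp
    return True
```

Treating an equal timestamp as a duplicate is a design choice; a consumer that expects several distinct updates to the same object within one timestamp tick might compare with a strict "less than" instead.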
The following automatically generated fields will be present under the "value" object in each event or message published by Data Bridge:
- "_class" - the name of the CRM model metaclass associated with the event (for example, "Person" for CRM contacts).
- "_event" - the type of CRM event. Possible values are "create", "update", or "delete". The value is null when a view snapshot is published.
- "_oid" - the unique identifier of the object being affected. In conjunction with "_class", it can uniquely identify a CRM data object.
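Taken together, "_class", "_oid", and "_event" give a consumer enough information to maintain a local copy of a view. The following minimal Python sketch assumes each message's "value" has already been deserialized into a dict and that the whole payload is stored per object. apply_event is a hypothetical name, and treating a null "_event" (a snapshot row) as an insert-or-replace is an assumption, not documented Data Bridge behavior.

```python
from typing import Any, Dict, Tuple

# Hypothetical local store keyed by (_class, _oid).
Store = Dict[Tuple[str, str], Dict[str, Any]]


def apply_event(store: Store, value: Dict[str, Any]) -> None:
    """Apply one deserialized "value" payload to the local store."""
    identity = (value["_class"], value["_oid"])
    event = value["_event"]
    if event == "delete":
        store.pop(identity, None)
    elif event in ("create", "update", None):
        # None (null) marks a row from a published view snapshot;
        # this sketch treats it like create/update and replaces the copy.
        store[identity] = dict(value)
    else:
        raise ValueError(f"unexpected _event value: {event!r}")
```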
For example:
Data serialization formats
The following data serialization formats are supported:
Delimited
You can publish a view snapshot to a delimited file, such as a CSV file. The following elements of the delimited format can be configured when you create a File (Delimited) publishing target:
- File Extension - Enter the required file extension for exported delimited files. You can also configure this setting using the CSV File Extension setting in the Data Bridge System Admin Console.
- Delimiter - By default, the comma character is used as a delimiter. You can enter a different character in this field. For a tab delimiter, leave the field blank. This value cannot exceed 1 character. You can also configure this setting using the CSV Delimiter setting in the Data Bridge System Admin Console.
- Show Header - By default, Yes is selected, which means that the header row is included in the export. To omit the header row from the exported file, click No. You can also configure this setting using the CSV Show Header setting in the Data Bridge System Admin Console.
- Is Quoted - By default, Yes is selected, which means that strings in the exported delimited files are surrounded with quotation marks. To leave strings unquoted, click No. You can also configure this setting using the CSV Is Quoted setting in the Data Bridge System Admin Console.
- Quote Character - By default, the double quotation mark (") is used to surround strings in the exported delimited files. You can enter a different character in this field. This value cannot exceed 1 character. You can also configure this setting using the CSV Quote Character setting in the Data Bridge System Admin Console.
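To see how these settings interact, they can be mapped onto Python's standard csv module. This is an illustrative sketch only, not how Data Bridge writes files. DelimitedOptions and to_delimited are hypothetical names; mapping Is Quoted = Yes to csv.QUOTE_NONNUMERIC (quote every non-numeric field) and Is Quoted = No to csv.QUOTE_MINIMAL (quote only fields that would otherwise break parsing) is an assumption, as is the "\n" line terminator.

```python
import csv
import io
from dataclasses import dataclass
from typing import Any, Dict, Iterable, List


@dataclass
class DelimitedOptions:
    # Hypothetical mirror of the File (Delimited) settings; defaults follow the text above.
    delimiter: str = ","   # blank means tab
    show_header: bool = True
    is_quoted: bool = True
    quote_char: str = '"'


def to_delimited(columns: List[str], rows: Iterable[Dict[str, Any]], opts: DelimitedOptions) -> str:
    delimiter = opts.delimiter or "\t"  # a blank delimiter selects tab
    for name, value in (("Delimiter", delimiter), ("Quote Character", opts.quote_char)):
        if len(value) != 1:
            raise ValueError(f"{name} must be exactly one character")
    buffer = io.StringIO()
    writer = csv.writer(
        buffer,
        delimiter=delimiter,
        quotechar=opts.quote_char,
        quoting=csv.QUOTE_NONNUMERIC if opts.is_quoted else csv.QUOTE_MINIMAL,
        lineterminator="\n",  # assumption for readability
    )
    if opts.show_header:
        writer.writerow(columns)
    for row in rows:
        writer.writerow([row.get(column, "") for column in columns])
    return buffer.getvalue()
```

With the defaults, to_delimited(["name", "count"], [{"name": "Acme", "count": 3}], DelimitedOptions()) produces a quoted header row and a row in which only the string is quoted.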
JSON
JSON files generated by snapshots that use the File (JSON) publishing target have a .json extension and are encoded in UTF-8. JSON messages are formatted according to the specification at https://www.json.org/json-en.html.
Apache Avro
Data Bridge 3.5.0 supports version 1.8.1 of the Apache Avro specification.
You can export the Avro schema for a view. For more information, see Viewing definitions.
Type-specific formatting
Data Bridge formats the following data types in a consistent manner for both JSON and Avro-serialized messages:
- Booleans
- Unicode strings (handling of UTF-8 characters)
- Dates
- Decimal number values
- Timestamps
Timestamp attributes
By default, Data Bridge formats timestamp attributes as long integers (Unix epoch time), but they can be formatted as strings instead. To use a string representation of timestamp attributes for both Avro and JSON output messages sent to Kafka or to a file, navigate to the Global Settings workspace, go to the JSON/Avro Format Configuration card, and select String in the Timestamp Attribute Formatting field. For more information, see Using the Global Settings workspace.
The following code snippet shows an example of a timestamp attribute, expectedCloseDate, formatted as a long integer:
```json
{
  "_class": "ProgressedOpportunity",
  "_event": null,
  "_oid": "10E3FA0A377FF645A98257A942BCDE2A4A",
  "expectedCloseDate": 1637539994153
}
```
The following code snippet shows the expectedCloseDate timestamp formatted as a string:
```json
{
  "_class": "ProgressedOpportunity",
  "_event": null,
  "_oid": "1004B7D51F3F6649DDB8D4C07482755930",
  "expectedCloseDate": "2021-07-29T21:34:35.780Z"
}
```
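A consumer that may receive either representation (for example, after the Timestamp Attribute Formatting setting changes) can normalize both to one form. The following is a minimal Python sketch, assuming the long form counts milliseconds since the Unix epoch (the 13-digit example value suggests milliseconds) and the string form is ISO 8601 in UTC with three fractional digits and a trailing "Z", as in the second example. to_epoch_ms and to_iso_string are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone
from typing import Union

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(value: Union[int, str]) -> int:
    """Normalize a timestamp attribute (long or string form) to epoch milliseconds."""
    if isinstance(value, int):
        return value
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ").replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids floating-point rounding.
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def to_iso_string(epoch_ms: int) -> str:
    """Render epoch milliseconds in the string form shown above."""
    moment = EPOCH + timedelta(milliseconds=epoch_ms)
    return moment.strftime("%Y-%m-%dT%H:%M:%S.") + f"{moment.microsecond // 1000:03d}Z"
```

For example, to_iso_string(1637539994153) returns "2021-11-22T00:13:14.153Z".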
Timestamp localization
Timestamps that are CRM data attributes (for example, the Activity Start Time) are converted to Coordinated Universal Time (UTC) when they are exported or displayed in the Data Bridge UI.
Metadata timestamps (for example, the Snapshot Date) are displayed in the user's local time zone in the following Data Bridge UI locations:
- History tab
- About tab
- Errors tab
- Monitoring Dashboard
The above information also applies to any timestamps that become part of generated file names for JSON, delimited, or manifest files.
String attributes
String attributes are formatted as a sequence of Unicode characters using UTF-8 encoding for both JSON and Avro messages.
As of Data Bridge 3.4.0, regular Unicode characters are no longer escaped using the \uXXXX hexadecimal format (where XXXX are four hexadecimal digits).
Special characters (for example, quotation marks, line feeds, and so on) are escaped with a backslash (\).
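Python's standard json module offers a convenient analogy for this change (it is not Data Bridge's serializer). With the default ensure_ascii=True, every non-ASCII character is escaped as \uXXXX, similar to the pre-3.4.0 output; with ensure_ascii=False, Unicode characters are written as-is while quotation marks and line feeds are still escaped with a backslash. The sample record is hypothetical.

```python
import json

record = {"lastName": "Müller", "note": 'Said "hi"\nthen left'}

# Non-ASCII characters become \uXXXX escapes (similar to pre-3.4.0 output).
escaped = json.dumps(record)
# Unicode characters are kept as-is; quotes and line feeds are still escaped.
unescaped = json.dumps(record, ensure_ascii=False)

print(escaped)
print(unescaped)
```

Both strings decode to the same data, so a standards-compliant JSON reader handles either form.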
File specifications
This section describes manifest files and the file naming convention for manifest, JSON, and delimited files.
Manifest files
Whenever a snapshot is published (whether manually or by a scheduler) for a Data Bridge view, a special manifest file is generated on the file system to signify the completion of the snapshot. A manifest file is a JSON file that contains the following details:
- The Event ID of the snapshot operation, which matches the Event ID shown on the Snapshot subtab of the History tab in the Data Bridge UI.
- The start and end times of the snapshot extract process.
- When the snapshot produces multiple output extract files, the name of the subdirectory that contains them, specified as folderName.
- The file name of each output extract file created by the snapshot.
- The number of NexJ CRM objects included in each output extract file.
Manifest files are always created in the <DataBridge_Extracts>/manifest directory and have a .manifest file extension. When a manifest file is first created, a .temp extension is appended to its name (for example, Companies_2021-08-03-17-05-26-179_EEB8169B6CAF4FD7A0A6525AE23BB92D.manifest.temp). Once all the information has been written to the manifest file, the .temp extension is removed.
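Because a manifest loses its .temp extension only after it is fully written, a downstream job can treat the presence of a .manifest file as the signal that a snapshot is complete. The following is a minimal Python sketch, assuming the extracts root is passed in as a path and the manifest content is UTF-8 JSON. completed_manifests and load_manifest are hypothetical helper names, and no manifest field names are assumed beyond what is described above.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def completed_manifests(extracts_root: Path) -> List[Path]:
    """List finished manifests; in-progress *.manifest.temp files do not match the pattern."""
    manifest_dir = extracts_root / "manifest"
    return sorted(path for path in manifest_dir.glob("*.manifest") if path.is_file())


def load_manifest(path: Path) -> Dict[str, Any]:
    """Parse a manifest file (assumed to be UTF-8 JSON)."""
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```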
File naming convention
The file naming convention for the manifest, JSON, and delimited files is:
<View Name>_<Timestamp>_<Event ID>_<Index>.{manifest/json/csv}
where:
- <View Name> is the name of the Data Bridge view that triggered the snapshot export.
- <Timestamp> uses the format "YYYY-MM-dd-HH-mm-ss-SSS" (for example, 2021-08-03-14-17-28-072). The timestamp is localized to the server's time zone.
- <Event ID> is a unique identifier for the snapshot. It is also shown in the Event ID column of the History tab in the Data Bridge UI.
- <Index> applies to File (Delimited) and File (JSON) snapshots when the meta.bridge.isChunked output file parameter in your environment file is true. It is the ordinal index of the generated file when the snapshot produces multiple files.
The following table contains example file names:
File type | Example file name |
---|---|
Manifest file | Companies_2021-08-03-17-05-26-179_EEB8169B6CAF4FD7A0A6525AE23BB92D.manifest |
Delimited file (single file) | Companies_2021-08-03-17-05-26-179_EEB8169B6CAF4FD7A0A6525AE23BB92D.csv |
JSON file (one of many files when meta.bridge.isChunked=true) | Companies_2021-08-03-17-05-26-179_EEB8169B6CAF4FD7A0A6525AE23BB92D_99.json |
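A downstream job can split these names back into their parts. The following Python sketch is illustrative only and rests on assumptions drawn from the examples: Event IDs are 32 uppercase hexadecimal characters, and the delimited extension is csv (the File Extension setting can change it). FILE_NAME, ExtractFileName, and parse_file_name are hypothetical names. The parsed timestamp is left naive because it is in the server's time zone.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Assumed pattern: <View Name>_<Timestamp>_<Event ID>[_<Index>].<extension>
FILE_NAME = re.compile(
    r"^(?P<view>.+)_"
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}-\d{2}-\d{2}-\d{2}-\d{3})_"
    r"(?P<event_id>[0-9A-F]{32})"
    r"(?:_(?P<index>\d+))?"
    r"\.(?P<extension>manifest|json|csv)$"
)


@dataclass
class ExtractFileName:
    view: str
    timestamp: datetime  # naive: server's local time zone
    event_id: str
    index: Optional[int]  # None unless the snapshot was chunked
    extension: str


def parse_file_name(name: str) -> Optional[ExtractFileName]:
    """Return the parsed parts, or None if the name doesn't follow the convention."""
    match = FILE_NAME.match(name)
    if match is None:
        return None
    index = match.group("index")
    return ExtractFileName(
        view=match.group("view"),
        timestamp=datetime.strptime(match.group("timestamp"), "%Y-%m-%d-%H-%M-%S-%f"),
        event_id=match.group("event_id"),
        index=int(index) if index is not None else None,
        extension=match.group("extension"),
    )
```

In-progress names ending in .manifest.temp do not match, so parse_file_name returns None for them.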