In MemSQL Ops, when you view the metrics for a MemSQL Pipeline, the “Committed” and “Database Writes” metrics count the writes that MemSQL makes to its internal storage system, which is not necessarily the same as the number of rows loaded. These counts can therefore look higher than expected when the destination table has at least one unique key. For example, if “Error Handling Behavior” is set to “Replace” and a row from the stream duplicates a row already in the table, MemSQL reports two writes for that row: one to remove the existing row and one to insert the new row from the stream.
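The following is a minimal Python model of that counting behavior, not MemSQL's storage engine. It assumes a table represented as a dict keyed on the unique key, and the function name `load_with_replace` is hypothetical. A row with a new key costs one write, while a duplicate-key row under “Replace” costs two.

```python
# Hypothetical model of write counting under "Replace" error handling.
# Illustration only; this is not how MemSQL stores or counts data internally.


def load_with_replace(table: dict, batch: list) -> int:
    """Upsert each (key, value) row from `batch` into `table`.

    Returns the number of storage writes performed: one write for a row
    with a new key (insert), two writes for a row whose key already
    exists (delete the existing row, then insert the streamed row).
    """
    writes = 0
    for key, value in batch:
        if key in table:
            del table[key]  # remove the offending row
            writes += 1
        table[key] = value  # insert the row from the stream
        writes += 1
    return writes


table = {1: "a", 2: "b"}
batch = [(2, "b2"), (3, "c")]
print(load_with_replace(table, batch))  # 3 writes for 2 streamed rows
```

Two rows arrive from the stream, but the duplicate on key 2 accounts for two writes, so the write metric reads 3 rather than 2.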
If Ops reports that one of your pipelines is experiencing negative lag, the latest offset in a partition of the pipeline's Kafka source is lower than the latest offset the pipeline has loaded from that partition. This usually means that the Kafka configuration has changed since the pipeline was created. Setting the pipeline's offsets to latest can resolve the issue; however, doing so will cause data loss.
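As a rough sketch, assuming lag is computed per partition as the latest source offset minus the latest loaded offset, negative lag is simply a negative difference. The offsets below are plain integers chosen for illustration rather than values read from Kafka or MemSQL, and the helper names are hypothetical.

```python
# Hypothetical per-partition lag check.
# Assumes lag = latest source offset - latest loaded offset.


def partition_lag(latest_source: dict, latest_loaded: dict) -> dict:
    """Return {partition: latest source offset - latest loaded offset}.

    Only partitions present on both sides are compared.
    """
    shared = latest_source.keys() & latest_loaded.keys()
    return {p: latest_source[p] - latest_loaded[p] for p in sorted(shared)}


def negative_lag_partitions(latest_source: dict, latest_loaded: dict) -> list:
    """List partitions whose latest source offset is below the loaded offset."""
    lag = partition_lag(latest_source, latest_loaded)
    return [p for p, value in lag.items() if value < 0]


# Hypothetical: partition 1's offsets were reset after a Kafka
# configuration change, so its latest offset fell below the loaded offset.
source = {0: 1200, 1: 40}
loaded = {0: 1150, 1: 900}
print(partition_lag(source, loaded))            # {0: 50, 1: -860}
print(negative_lag_partitions(source, loaded))  # [1]
```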
Pipeline Load Data Options
Under the hood, MemSQL Pipelines use parts of the LOAD DATA engine to ingest data after it has been extracted and transformed. Because of this, part of a pipeline's configuration is its LOAD DATA statement, which controls how the data passing through the stream is parsed.
Pipeline Data Format
In the MemSQL Ops Create Pipeline wizard, you must choose the format of your data as it is emitted by Kafka (or by an optional transform). Two common formats, CSV and TSV, are provided. If your data uses a different format, select “Custom” in the dropdown and configure the format as needed.
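To illustrate what the format choice controls, the sketch below uses Python's standard `csv` module to split records using a field delimiter and a quote character, roughly the role played by the `FIELDS TERMINATED BY` and `ENCLOSED BY` options of a LOAD DATA statement. The `FORMATS` presets, the pipe-delimited “custom” format, and the `parse_batch` helper are all hypothetical; they are not the settings the wizard generates, and this is not MemSQL's parser.

```python
import csv
import io

# Hypothetical presets standing in for the wizard's CSV and TSV choices.
FORMATS = {
    "CSV": {"delimiter": ",", "quotechar": '"'},
    "TSV": {"delimiter": "\t", "quotechar": '"'},
}


def parse_batch(data: str, fmt: dict) -> list:
    """Split a batch of newline-terminated records into lists of fields."""
    reader = csv.reader(
        io.StringIO(data),
        delimiter=fmt["delimiter"],  # field separator
        quotechar=fmt["quotechar"],  # character that encloses fields
    )
    return [row for row in reader]


csv_rows = parse_batch('1,"Smith, Jane"\n2,Lee\n', FORMATS["CSV"])
tsv_rows = parse_batch("1\tSmith, Jane\n2\tLee\n", FORMATS["TSV"])

# A hypothetical "Custom" format: pipe-delimited fields.
custom_rows = parse_batch("1|Smith, Jane\n2|Lee\n", {"delimiter": "|", "quotechar": '"'})

print(csv_rows)
print(tsv_rows)
print(custom_rows)
```

All three inputs parse to the same two records. The comma inside “Smith, Jane” is kept in the CSV case only because the field is enclosed in quotes, which is why the enclosing character matters as much as the delimiter when choosing or customizing a format.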