Pipelines System Variables

MemSQL Pipelines uses a few system variables that are either specific to an extractor or generic to the feature itself. You can see these variables and their default settings by executing the SHOW VARIABLES LIKE '%pipeline%'; statement:

memsql> SHOW VARIABLES LIKE '%pipeline%';
| Variable_name                               | Value    |
| pipelines_batches_metadata_to_keep          | 1000     |
| pipelines_deskew_batch_partitions_threshold | 0.750000 |
| pipelines_extractor_debug_logging           | OFF      |
| pipelines_extractor_get_offsets_timeout_ms  | 20000    |
| pipelines_extractor_idle_timeout_ms         | 120000   |
| pipelines_kafka_version                     |  |
| pipelines_max_concurrent                    | 50       |
| pipelines_max_concurrent_batch_partitions   | 0        |
| pipelines_max_errors_per_partition          | 1000     |
| pipelines_max_offsets_per_batch_partition   | 1000000  |
| pipelines_max_retries_per_batch_partition   | 4        |
| pipelines_stderr_bufsize                    | 65535    |
| pipelines_stop_on_error                     | ON       |
| pipelines_stored_proc_exactly_once          | ON       |
14 rows in set (0.00 sec)

You cannot set a variable for a specific pipeline – each variable setting applies to all pipelines in the cluster.

The following sections describe each of these variables, grouped into general configuration and extractor-specific configuration:

To learn how to set pipelines variables in your cluster, see the How to Set Pipelines System Variables section below.

General Pipelines System Variables

pipelines_batches_metadata_to_keep
Default value: 1000
The number of batch metadata entries to persist before they are overwritten by incoming batches.
As data is extracted from a source, it's written in batches to a destination table on a leaf node. Metadata about these batches is temporarily persisted in the master aggregator's information_schema.PIPELINES_BATCHES table. As new batches are loaded into the database, the oldest batch metadata entries are removed from the information_schema.PIPELINES_BATCHES table. See the information_schema.PIPELINES_BATCHES Table section for more information about this metadata.
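For example, the retained batch metadata for a pipeline can be inspected with a query along these lines (a sketch; the database name my_db and pipeline name my_pipeline are placeholders, and column names may vary by version):

```sql
-- Inspect the most recent batch metadata entries; only the newest
-- pipelines_batches_metadata_to_keep entries are retained.
SELECT BATCH_ID, BATCH_STATE, BATCH_TIME
FROM information_schema.PIPELINES_BATCHES
WHERE DATABASE_NAME = 'my_db' AND PIPELINE_NAME = 'my_pipeline'
ORDER BY BATCH_ID DESC
LIMIT 10;
```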
pipelines_extractor_debug_logging
Default value: OFF
Specifies whether to enable extractor debug logging for Kafka pipelines. This variable currently does not apply to S3 pipelines.

pipelines_extractor_get_offsets_timeout_ms
Default value: 10000
The maximum time, in milliseconds, to wait for offset data to be returned from the data source before returning an error. Increase this value if you experience timeout errors, such as ERROR 1970 (HY000): Subprocess timed out.
pipelines_max_retries_per_batch_partition
Default value: 4
The number of retry attempts for writing batch partition data to the destination table.
If pipelines_stop_on_error is set to OFF and the specified retry number is reached without success, the batch partition will be skipped and will not appear in the destination table. If a batch partition is skipped, data loss can occur.
If pipelines_stop_on_error is set to ON and the specified retry number is reached without success, the pipeline will stop. No batch partition data will be skipped.
This variable applies to the entire batch transaction, which includes extraction from a data source, optional transformation, and loading of the data into the destination table. If the batch transaction fails at any point during extraction, transformation, or loading, it is retried up to the specified number of times.
pipelines_stop_on_error
Default value: ON
Specifies whether each pipeline in the cluster should stop when an error occurs.
If set to OFF, batches are retried up to the number specified in the pipelines_max_retries_per_batch_partition variable. After all retries have failed, the batch is skipped. When a batch is skipped, data loss can occur.
If set to ON, the batch transaction that caused the error is retried up to the number specified in the pipelines_max_retries_per_batch_partition variable. After all retries have failed, the pipeline enters a Stopped state and must be manually started.
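A pipeline that has stopped on error can be restarted manually. A minimal sketch, where my_pipeline is a hypothetical pipeline name:

```sql
-- List pipelines and their current states (e.g. Running, Stopped).
SHOW PIPELINES;

-- Restart a pipeline that entered the Stopped state.
START PIPELINE my_pipeline;
```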
pipelines_max_errors_per_partition
Default value: 1000
The maximum number of error event rows per leaf node partition to persist before they are deleted.
Once the number of rows in the information_schema.PIPELINES_ERRORS table reaches this limit, the database eventually removes the oldest rows from the table. The removal of older error data is based on heuristics: the most recent errors, up to the specified number, are guaranteed to be retained, but older entries may not be removed immediately.
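Recent error rows can be examined before they age out with a query like the following (a sketch; column names follow the PIPELINES_ERRORS schema and may vary by version):

```sql
-- View the most recent pipeline errors across the cluster.
SELECT DATABASE_NAME, PIPELINE_NAME, BATCH_ID, ERROR_MESSAGE
FROM information_schema.PIPELINES_ERRORS
ORDER BY ERROR_ID DESC
LIMIT 20;
```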
pipelines_stderr_bufsize
Default value: 65535
The buffer size, in bytes, for standard error output. Error messages that exceed this size are truncated when written to the information_schema.PIPELINES_ERRORS table. However, the complete standard error text can be viewed by taking the BATCH_ID and querying the information_schema.PIPELINES_BATCHES table.
pipelines_max_offsets_per_batch_partition
Default value: 1000000
The maximum number of data source partition offsets to extract in a single batch transaction. If the data source's partition contains fewer than the specified number of offsets, all of the partition's offsets are batched into the destination table.

Kafka Extractor System Variables

pipelines_kafka_version
Default value: (empty)
The Kafka version used by the Kafka extractor. Versions newer than the default can also be specified.

How to Set Pipelines System Variables

The system variables associated with pipelines can be set and updated in the same manner as other MemSQL system variables. For more information on setting pipeline system variables, see How to Update System Variables.
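For instance, a pipelines variable can be changed cluster-wide with SET GLOBAL, assuming sufficient privileges on the aggregator (the value shown here is illustrative, not a recommendation):

```sql
-- Change a pipelines variable for all pipelines in the cluster.
SET GLOBAL pipelines_stop_on_error = OFF;

-- Verify the new setting.
SHOW VARIABLES LIKE '%pipeline%';
```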
