Workload Management

Starting in version 5.5, MemSQL automatically manages cluster workloads by limiting execution of queries that require fully distributed execution, to ensure that they are matched with available system resources. Workload management estimates the number of connections and threads needed to execute queries that require reshuffle and broadcast operations, and admits the query only if it can assign the necessary resources. Queries that are not immediately executed are put in a queue and are executed when system resources become available. Workload management improves overall query execution efficiency and prevents workload surges from overwhelming the system.

Configuration and System Variables

Workload management can be configured using system variables, and the default settings are sufficient for most users regardless of cluster size. However, these system variables can be tuned to fully utilize system resources for high concurrency workloads.

Before changing any of these values, ensure that you understand each of these variables and have evaluated their impact on your workload.

All of the variables associated with workload management can be changed by setting them on the MemSQL aggregator nodes (the variables have no effect when set on leaf nodes). See How to Update System Variables.

workload_management: Specifies whether to enable workload management for the cluster. The default value is ON. If set to ON, the feature is enabled and the other workload management system variables will affect the way a query is executed. If set to OFF, the feature is disabled and no queueing or system resource optimization will occur.

workload_management_expected_aggregators: The expected number of aggregators that will be used to run a high volume of client queries which require fully distributed execution. The default value is 0, which is equivalent to setting it to the total number of aggregators in the cluster. It may be useful to set this to a non-default value if you only run your workload on a subset of aggregator nodes (for example, if you have 1 master aggregator and 2 child aggregators, and only run your query workload against the child aggregators, you can set this variable to 2).

workload_management_max_connections_per_leaf: The maximum number of connections to use per leaf node in the cluster. The default value is 10000.

workload_management_max_queue_depth: The maximum depth of the query queue, which is the maximum number of queries that can be queued. The default value is 100. If this number is reached, additional queries will not execute, and a ER_TOO_MANY_QUEUED_QUERIES error will appear.

workload_management_max_threads_per_leaf: The maximum number of threads to use per leaf. The default value is 8192. This number correlates with the max_connection_threads global variable, and they should generally be set to the same value.

workload_management_queue_time_warning_ratio: Specifies when a warning will appear based on the ratio of time spent by a query in the queue versus the actual execution time of the query. For example, if a query waits in the queue for one second and takes two seconds to execute, the ratio is 0.5. The default value is 0.500000. Once the specified ratio is reached for a query, a ER_QUERY_QUEUED_WARNING warning will appear.

workload_management_queue_timeout: The time duration in seconds after which a query times out and is removed from the queue without being executed. The default value is 3600.

Errors and Warnings

Queries return configurable errors if too many queries are queued or a query has been queued for too long. Queries also return configurable warnings when they were queued for a significant time relative to their actual execution time. This allows users to identify when their cluster resources are insufficient to meet the load of incoming queries.

These errors and warnings are:

ER_TOO_MANY_QUEUED_QUERIES: When the workload_management_max_queue_depth value is reached, you will be unable to execute additional queries against the cluster. If you encounter this error in a scenario that requires temporary periods of heavy load, the queue depth may need to be increased to accommodate the load. However, if you are consistently encountering this error for normal workloads, this error indicates that your cluster’s resources need to be increased or the load of incoming queries needs to be reduced.

ER_QUERY_QUEUE_TIMEOUT: When a queued query reaches the workload_management_queue_timeout value, it will fail to execute and will be removed from the queue. This error indicates that your cluster’s resources or the load of incoming queries need to be adjusted to successfully process incoming queries.

ER_QUERY_QUEUED_WARNING: When the ratio of time spent by a query in the queue versus the actual execution time of the query exceeds the workload_management_queue_time_warning_ratio value, this warning will appear. This warning indicates that your cluster’s resources may not be sufficient to process incoming queries in a timely manner.

Was this article useful?