MemSQL automatically manages cluster workloads by limiting execution of queries that require fully distributed execution, to ensure that they are matched with available system resources. Workload management estimates the number of connections and threads needed to execute queries that require reshuffle and broadcast operations, and admits the query only if it can assign the necessary resources.
Workload management also estimates the amount of memory required to execute queries and only runs queries if sufficient memory is expected to be available. Queries that are not immediately executed are put in a queue and are executed when system resources become available. Workload management improves overall query execution efficiency and prevents workload surges from overwhelming the system. It allows queries to run successfully when the system is low on connections, threads, or memory, rather than failing.
Related to workload management is the concept of resource governance. Resource governance allows you to restrict or limit resource usage for users/queries, such as limiting their query execution memory or CPU usage. For more information, see Setting Resource Limits.
Configuration and System Variables
Workload management can be configured using system variables, and the default settings are sufficient for most users regardless of cluster size. However, these system variables can be tuned to fully utilize system resources for high concurrency workloads.
Before changing any of these values, ensure that you understand each of these variables and have evaluated their impact on your workload.
All of the variables associated with workload management can be changed by setting them on the MemSQL aggregator nodes (the variables have no effect when set on leaf nodes). See How to Update System Variables.
workload_management: Specifies whether to enable workload management for the cluster. The default value is
ON. If set to
ON, the feature is enabled and the other workload management system variables will affect the way a query is executed. If set to
OFF, the feature is disabled and no queueing or system resource optimization will occur.
workload_management_expected_aggregators: The expected number of aggregators that will be used to run a high volume of client queries which require fully distributed execution. The default value is
0, which is equivalent to setting it to the total number of aggregators in the cluster. It may be useful to set this to a non-default value if you only run your workload on a subset of aggregator nodes (for example, if you have 1 master aggregator and 2 child aggregators, and only run your query workload against the child aggregators, you can set this variable to
workload_management_max_connections_per_leaf: The maximum number of connections to use per leaf node in the cluster. The default value is
workload_management_max_queue_depth: The maximum depth of the query queue, which is the maximum number of queries that can be queued. The default value is
100. If this number is reached, additional queries will not execute, and a
ER_TOO_MANY_QUEUED_QUERIES error will appear.
workload_management_max_threads_per_leaf: The maximum number of threads to use per leaf. The default value is
8192. This number correlates with the
max_connection_threads global variable, and they should generally be set to the same value.
workload_management_memory_queue_threshold: The percentage of memory a query can use before it will be queued.
If an individual query is projected to use more than
workload_management_memory_queue_threshold * (leaf_maximum_memory - leaf_current_table_memory) / workload_management_expected_aggregators, then it will be queued.
workload_management_memory_queuing: Specifies whether workload management for memory is enabled. Valid values are
OFF. The default is
workload_management_queue_time_warning_ratio: Specifies when a warning will appear based on the ratio of time spent by a query in the queue versus the actual execution time of the query. For example, if a query waits in the queue for one second and it takes four seconds to execute, the ratio is 1:4, or 0.25. The default value is
0.500000. Once the specified ratio is reached for a query, a
ER_QUERY_QUEUED_WARNING warning will appear.
workload_management_queue_timeout: The time duration in seconds after which a query times out and is removed from the queue without being executed. The default value is
Individual aggregators normally decide whether to queue a query based on local information; however, if an individual query uses more than
50% * workload_management_max_connections_per_leaf / workload_management_expected_aggregators, the queueing decision will be made by consulting the master aggregator.
In addition, if more than
50% * workload_management_max_connections_per_leaf / workload_management_expected_aggregators are currently used for queries running on an aggregator, then queries arriving there will be queued.
0, then the total number of aggregators is used in the above formulas.
At the beginning of query execution, the system estimates how much memory it will take based on what it has learned from previous runs of the query.
If a query is estimated to take a small amount of memory, it will be run immediately without queuing.
If a query is estimated to take a moderate amount of memory, and there is insufficient memory, it will be queued locally, with the decision whether to queue made on the local aggregator.
If a query is estimated to take a large amount of memory, a decision will be made in consultation with the master aggregator about whether to queue the query.
Errors and Warnings
Queries return configurable errors if too many queries are queued or a query has been queued for too long. Queries also return configurable warnings when they were queued for a significant time relative to their actual execution time. This allows users to identify when their cluster resources are insufficient to meet the load of incoming queries.
These errors and warnings are:
ER_TOO_MANY_QUEUED_QUERIES: When the
workload_management_max_queue_depth value is reached, MemSQL will return this error instead of adding a new query to the queue. If you encounter this error, the maximum queue depth may need to be increased to accommodate the load. This error may also indicate that your cluster’s resources need to be increased or the load of incoming queries needs to be reduced.
ER_QUERY_QUEUE_TIMEOUT: When a query has been queued longer than the
workload_management_queue_timeout value, it will return this error and will be removed from the queue. This error indicates that your cluster’s resources or the load of incoming queries need to be adjusted to successfully process incoming queries.
ER_QUERY_QUEUED_WARNING: When the ratio of time spent by a query in the queue versus the actual execution time of the query exceeds the
workload_management_queue_time_warning_ratio value, the query will return this warning. The warning indicates the amount of time spent queued and the amount of time spent executing, to help you understand the sources of query latency. Encountering this warning is normal in many workloads, but if query latency is too high, your cluster’s resources may not be sufficient to process incoming queries in a timely manner.
Observing the Status of Workload Management
To see the current state of the workload management system, including the number of running and queued queries, and resource usage and thresholds, use the
SHOW WORKLOAD MANAGEMENT STATUS command. The following example shows sample output from a small, lightly-loaded system:
SHOW WORKLOAD MANAGEMENT STATUS; +-----------------------------------------+-------+ | Stat | Value | +-----------------------------------------+-------+ | QueuedQueries | 0 | | RunningQueries (queued locally) | 0 | | RunningQueries (queued globally) | 0 | | runningMemoryOnLeaves | 0 | | runningThreadsPerLeaf | 0 | | runningConnectionsPerLeaf | 0 | | Memory Threshold to Queue Locally | 77 | | Memory Threshold to Queue Globally | 7741 | | Connections Threshold to Queue Globally | 5000 | | Threads Threshold to Queue Globally | 4096 | +-----------------------------------------+-------+
The units for memory thresholds shown above are in megabytes.