ISSUE: Inconsistent Query Run Times
If you receive very different response times when you run the same query (or groups of queries on the same leaf) repeatedly, Transparent Huge Pages (THP) may be the cause.
Linux organizes RAM into pages that are ordinarily 4KB. With THP, Linux can instead use 2MB pages or larger. But THP transparently re-organizes memory used by a process inside the kernel, merging some small pages to huge pages, and splitting some huge pages to small pages. This background process holds locks on the memory manager, and can prevent your process from accessing memory.
You can check for THP by running
cat /sys/kernel/mm/*transparent_hugepage/enabled. If the [brackets] are around [always], you should disable THP on all nodes in your cluster. Fully remedying the issue may require restarting MemSQL, or fully restarting all of the servers. Cluster operators should disable THP before launching in production.
Note: Having THP enabled may also be the cause of high system CPU (sometimes called red CPU).
How to Disable THP
MemSQL recommends disabling transparent huge pages (THP) at boot time on all nodes (master aggregator, child aggregators, and leaves) in the cluster. Not doing so may result in inconsistent query run times.
To disable THP, add the following lines to the end of
rc.local before the exit line (if present), and reboot the host machines.
For Debian-based distributions, add the following to
echo never > /sys/kernel/mm/transparent_hugepage/enabled echo never > /sys/kernel/mm/transparent_hugepage/defrag echo 0 > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag echo no > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
For RedHat/CentOS distributions, add the following to
echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag echo 0 > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag echo no > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag
khugepaged/defrag option will be
0 on newer Linux versions (e.g. CentOS 7+), but
no on older versions (e.g. CentOS 6). You can check which your system uses by running
cat /sys/kernel/mm/*transparent_hugepage/khugepaged/defrag and keep only the matching setting. For example, if you see
0 update the line with
echo 0. If you see
no, update the line with
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
Updates and Deletes in MemSQL lock the row they are updating, and only that row. If the update is a singleton update (or upsert), then this error means that two processes are trying to update the same row. This itself isn’t the end of the world, we have a setting
lock_wait_timeout which defaults to 60s. If a query is waiting on a row for more than 60s it fails, which should be plenty of time. So the first query updates, then the second. If you’re seeing this error there is serious contention. This happens most often when updating on a field that is not indexed, since MemSQL takes a very short lock on every row while scanning through the entire table to see if it matches the where clause on the update.
- If you’re experiencing contention with singleton transactions the solution is to implement indexes.
For example, if the following query generates a lock wait timeout exceeded error:
DELETE FROM bucket WHERE meta_uid_id = ?
To address the error, add an index on meta_uid_id.
If you’re experiencing contention with transactions having multiple statements, you can turn off
show variables like '%multistatement_transactions%';
Shows whether multistatement transactions are on, and
SET GLOBAL multistatement_transactions = off;
turns it off.
By turning off multi-statement transactions, write queries that touch multiple nodes will commit locally when the processing finishes on the node. For example, if node 1 hits a duplicate key error, node 2 will still commit its transaction.