ISSUE: Inconsistent Query Run Times
If you receive very different response times when you run the same query (or groups of queries on the same leaf) repeatedly, Transparent Huge Pages (THP) may be the cause.
Linux organizes RAM into pages that are ordinarily 4KB. With THP, Linux can instead use 2MB pages or larger. But THP transparently re-organizes memory used by a process inside the kernel, merging some small pages to huge pages, and splitting some huge pages to small pages. This background process holds locks on the memory manager, and can prevent your process from accessing memory.
You can check for THP by running
cat /sys/kernel/mm/*transparent_hugepage/enabled. If the [brackets] are around [always], you should disable THP on all nodes in your cluster. Fully remedying the issue may require restarting MemSQL, or fully restarting all of the servers. Cluster operators should disable THP before launching in production.
Note: Having THP enabled may also be the cause of high system CPU (sometimes called red CPU).
How to Disable THP
MemSQL recommends disabling transparent huge pages (THP) at boot time on all nodes (master aggregator, child aggregators, and leaves) in the cluster. Not doing so may result in inconsistent query run times.
To disable THP, add the following lines to the end of
rc.local before the exit line (if present), and reboot the host machines.
For Debian-based distributions, add the following to
echo never > /sys/kernel/mm/transparent_hugepage/enabled echo never > /sys/kernel/mm/transparent_hugepage/defrag echo 0 > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag echo no > /sys/kernel/mm/transparent_hugepage/khugepaged/defrag
For RedHat/CentOS distributions, add the following to
echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag echo 0 > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag echo no > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag
khugepaged/defrag option will be
0 on newer Linux versions (e.g. CentOS 7+), but
no on older versions (e.g. CentOS 6). You can check which your system uses by running
cat /sys/kernel/mm/*transparent_hugepage/khugepaged/defrag and keep only the matching setting. For example, if you see
0 update the line with
echo 0. If you see
no, update the line with
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
Updates and Deletes in MemSQL lock the row being updated. If the update is a singleton update (or upsert), then this error means that two processes are trying to update the same row. The setting
lock_wait_timeout defaults to 60s, which implies that if a query is waiting on a row for more than 60s then it fails. So the first query updates, then the second. If you are seeing this error there is a serious contention. This happens most often when updating on a field that is not indexed, since MemSQL takes a very short lock on every row while scanning through the entire table to see if it matches the where clause on the update.
- If you’re experiencing contention with singleton transactions the solution is to implement indexes.
For example, if the following query generates a lock wait timeout exceeded error:
DELETE FROM bucket WHERE meta_uid_id = ?
To address the error, add an index on meta_uid_id.
- If you’re experiencing contention with transactions having multiple statements, you can turn off
show variables like '%multistatement_transactions%';
Shows whether multistatement transactions are on, and
SET GLOBAL multistatement_transactions = off;
turns it off.
memsql-admin update-config --key multistatement_transactions --value off --set-global --memsql-id <aggregator_id>
memsql-ops memsql-update-config --key multistatement_transactions --value off --set-global <aggregator_id>
By turning off multi-statement transactions, write queries that touch multiple nodes will commit locally when the processing finishes on the node. For example, if node 1 hits a duplicate key error, node 2 will still commit its transaction.
ERROR 1706 (HY000): Feature ‘Multi-table UPDATE/DELETE with a reference table as target table’ is not supported by MemSQL
Updating a reference table based on a join with a sharded table returns an error. This is because reference tables are only updatable on the master aggregator while joins against sharded tables must run on the leaves, so the query cannot run like normal updates.
Use the same table types that use both the target as well as joined table as sharded tables or as reference tables.
UPSERTwith a reference table that has a primary key. For example, if
t1is a reference table with a primary key, then the following query runs successfully:
INSERT INTO t1 SELECT t1.* FROM t1 JOIN t2 ON t1.b = t2.b ON DUPLICATE KEY UPDATE t1.b.
For more details, refer to the UPDATE topic.
- Turning off multi-statement transactions makes per-partition components of a distributed transaction commit independently without waiting for other partitions to finish and potentially trigger rollback. This removes the common row lock deadlock (transaction A waiting for transaction B on partition 1, but B waiting for A on partition 2) at the cost of potentially letting failed queries commit partial results.
- If the values inserted are dependent on multiple statements executing concurrently, then inconsistent data may get inserted.
- In case of failure, it is not possible to rollback the entire multi-upsert inside one transaction.