Product Solutions Support
Try MemSQL

PROFILE

Provides detailed resources usage metrics about a query.

PROFILE select_statement

When a PROFILE statement is executed, the associated SELECT statement is also executed. During execution of both queries, resource usage metrics are collected about the SELECT statement. After the queries have executed, the collected metrics can be displayed by executing the SHOW PROFILE statement.

Like the EXPLAIN statement, PROFILE displays query execution operations as a tree, where the operator at the top is the last executed before returning the result to the client, and operators below are executed before their parents. However, PROFILE also appends resource usage metrics to the end of each line in the execution tree.

The following metrics are gathered about each when executing a statement using PROFILE:

Not all of the above metrics will be gathered for queries where execution performance may be hindered by gathering such metrics.

Profile Example

The PROFILE statement is particularly helpful when evaluating distributed query performance, such as a distributed join. Distributed joins that require broadcasts or repartitions are expensive, and the PROFILE statement can help you understand how such queries are executed so that they can be optimized.

In the following example, a distributed join is executed against a four node cluster to return data about customers and their orders. Before executing a PROFILE statement, it’s best to first execute an EXPLAIN statement to understand the execution path for a query. EXPLAIN may also inform you that you should first run an ANALYZE statement to ensure the database has the most up-to-date statistics on its data.

memsql> EXPLAIN SELECT COUNT(*) FROM orders o INNER JOIN customer c ON o.custkey = c.custkey;
+--------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN                                                                                                                              |
+--------------------------------------------------------------------------------------------------------------------------------------+
| Project [CAST(COALESCE($0,0) AS SIGNED) AS `count(*)`]                                                                               |
| Aggregate [SUM(remote_0.`count(*)`) AS $0]                                                                                           |
| Gather partitions:all est_rows:1 alias:remote_0                                                                                      |
| Project [`count(*)`] est_rows:1 est_select_cost:3308565                                                                              |
| Aggregate [COUNT(*) AS `count(*)`]                                                                                                   |
| NestedLoopJoin                                                                                                                       |
| |---IndexSeek memsql_demo.customer AS c, PRIMARY KEY (custkey) scan:[custkey = r0.custkey] est_table_rows:150000 est_filtered:150000 |
| TableScan r0 storage:list stream:no                                                                                                  |
| Repartition [o.custkey] AS r0 shard_key:[custkey] est_rows:1102855                                                                   |
| TableScan memsql_demo.orders AS o, PRIMARY KEY (orderkey) est_table_rows:1102855 est_filtered:1102855                                |
+--------------------------------------------------------------------------------------------------------------------------------------+
10 rows in set (0.00 sec)

From this EXPLAIN statement, it’s clear that a repartition is required, and that a nested loop join will be performed. Now you can run the PROFILE statement with the same SELECT query to gather resource usage metrics.

memsql> PROFILE SELECT COUNT(*) FROM orders o INNER JOIN customer c ON o.custkey = c.custkey;
+----------+
| count(*) |
+----------+
| 1102855  |
+----------+
1 row in set (0.17 sec)

The PROFILE statement will output the same results as the inner SELECT statement, but it has also gathered resource usage metrics. To see the metrics, you must run the SHOW PROFILE statement.

memsql> SHOW PROFILE;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
| PROFILE                                                                                                                                                   |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
| Project [CAST(COALESCE($0,0) AS SIGNED) AS `count(*)`] actual_rows: 1 network_traffic: 0.008000 KB                                                        |
| Aggregate [SUM(remote_0.`count(*)`) AS $0] actual_rows: 32                                                                                                |
| Gather partitions:all est_rows:1 alias:remote_0 actual_rows: 32 exec_time: 71ms                                                                           |
| Project [`count(*)`] est_rows:1 est_select_cost:3308565 actual_rows: 32 network_traffic: 0.192000 KB                                                      |
| Aggregate [COUNT(*) AS `count(*)`] actual_rows: 1102855                                                                                                   |
| NestedLoopJoin actual_rows: 1102855                                                                                                                       |
| |---IndexSeek memsql_demo.customer AS c, PRIMARY KEY (custkey) scan:[custkey = r0.custkey] est_table_rows:150000 est_filtered:150000 actual_rows: 1102855 |
| TableScan r0 storage:list stream:no actual_rows: 1102855 exec_time: 70ms                                                                                  |
| Repartition [o.custkey] AS r0 shard_key:[custkey] est_rows:1102855 actual_rows: 1102855 network_traffic: 6904.042969 KB                                   |
| TableScan memsql_demo.orders AS o, PRIMARY KEY (orderkey) est_table_rows:1102855 est_filtered:1102855 actual_rows: 1102855 exec_time: 57ms                |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------+
10 rows in set (0.00 sec)

After the SHOW PROFILE statement has been executed, you can see from the example above that actual rows, network traffic, and execution time have been appended to the end of the execution tree lines. The Repartition operation was a particularly expensive operation, requiring nearly 7MB of network traffic.

Was this article useful?