APPROX_COUNT_DISTINCT

An aggregate function. Returns an estimate of the number of distinct values in a collection.

APPROX_COUNT_DISTINCT uses the HyperLogLog algorithm with fourteen registers. Compared to SELECT COUNT(DISTINCT <column>), which calculates the exact number of distinct values in a column of table, APPROX_COUNT_DISTINCT can run much faster and consume significantly less memory.

Syntax

APPROX_COUNT_DISTINCT ( expr )

Arguments

expr

An expression evaluating to a scalar value.

expr is typically the name of a column. In this case, APPROX_COUNT_DISTINCT returns an estimate of the number of distinct values in the column.

Examples

The following example populates a column with one-hundred thousand unique integers and runs SELECT COUNT(DISTINCT <column>) and APPROX_COUNT_DISTINCT on the column so you can compare the results.

DROP DATABASE IF EXISTS memsql_docs_example;
CREATE DATABASE memsql_docs_example;
USE memsql_docs_example;

DELIMITER //
CREATE PROCEDURE populate_table() AS
BEGIN
  CREATE TABLE t(a INT);
  FOR i IN 1..100000 LOOP
    INSERT INTO t VALUES(i);
  END LOOP;
END
//
DELIMITER ;

CALL populate_table();

Get the exact count of distinct values in the column a:

SELECT COUNT(DISTINCT a) FROM t;

Output:

+--------------------+
| COUNT(DISTINCT a)  |
+--------------------+
|             100000 |
+--------------------+

Get an estimate of the number of distinct values in the column a:

SELECT APPROX_COUNT_DISTINCT(a) FROM t;

Output:

+--------------------------+
| APPROX_COUNT_DISTINCT(a) |
+--------------------------+
|                   100498 |
+--------------------------+
Was this article useful?