PERCENTILE_CONT and Median

In MemSQL, percentile functions are only available as window functions. There are two percentile functions: PERCENTILE_CONT and PERCENTILE_DISC. PERCENTILE_CONT is an inverse distribution function. It assumes a continuous distribution between values of the expression in the sort specification. Then, it interpolates the value of that expression at the given percentile, performing a linear interpolation. For an overview of window functions in MemSQL, see Window Functions Guide.

Syntax

PERCENTILE_CONT(percentile) 
WITHIN GROUP (ORDER BY (col | expr)) 
OVER (
    [PARTITION BY (col | expr), ...]
    [frame_clause]
)

or equivalently

PERCENTILE_CONT(percentile) 
OVER (
        ORDER BY (col | expr)
        [PARTITION BY {col | expr}, ...]
        [frame_clause]
)

Arguments

percentile

A numeric value between 0 and 1.

col

A column of a numeric data type.

expr

An expression that evaluates to a numeric data type.

Remarks

The OVER() clause defines how to partition the input set. If no PARTITION BY clause is specified, the entire input set is treated as a single partition.

Only one column or expression is allowed in the ORDER BY clause.

Example

The following example creates a table containing SAT scores. The resulting window frame shows the hypothetical 90th percentile SAT score given the existing scores.

memsql> CREATE TABLE SAT (math INT, reading INT, writing INT);

memsql> INSERT INTO SAT VALUES (800,800,800),(650,300,500),(700,500,520);

memsql> SELECT math, reading, writing,
    -> math + reading + writing AS total,
    -> PERCENTILE_CONT(.9)
    -> WITHIN GROUP (ORDER BY math + reading + writing)
    -> OVER () AS `90th Percentile` FROM SAT;
+------+---------+---------+-------+-----------------+
| math | reading | writing | total | 90th Percentile |
+------+---------+---------+-------+-----------------+
|  650 |     300 |     500 |  1450 |       2264.0000 |
|  700 |     500 |     520 |  1720 |       2264.0000 |
|  800 |     800 |     800 |  2400 |       2264.0000 |
+------+---------+---------+-------+-----------------+

Median

Compute the median using percentile = 0.5.

PERCENTILE_CONT( 0.5 ) ...

Example:
memsql> create table t (a int);
memsql> insert into t values (1),(2),(3),(166),(1024);

memsql> select a, avg(a) over () as mean, percentile_cont(.5) within group (order by a) over () as median from t;
+------+----------+--------+
| a    | mean     | median |
+------+----------+--------+
|    1 | 239.2000 | 3.0000 |
|    2 | 239.2000 | 3.0000 |
|    3 | 239.2000 | 3.0000 |
|  166 | 239.2000 | 3.0000 |
| 1024 | 239.2000 | 3.0000 |
+------+----------+--------+
Was this article useful?