SQL - SQL Server - Cumulative Distribution Functions¶
Source: sql-snippets/Cumulative distribution functions.md at main · count/sql-snippets (github.com)
Explore this snippet with some demo data here.
Description¶
Cumulative Distribution functions (CDFs) are a method for analyzing the statistical distribution of a quantity, similar to histograms.
They show, for each value of a quantity, what fraction of rows are smaller or greater.
One method for calculating a CDF is as follows:
/*
Use a row_number window function to get the position of this row
and CAST to convert row_number and count from integers to decimal
*/
SELECT CAST(row_number() OVER (ORDER BY <quantity> ASC) AS DECIMAL) / CAST((SELECT COUNT(*) FROM <table> ) AS DECIMAL ) AS cdf
FROM <table>
where:
quantity
- the column containing the metric of interesttable
- the table name
Note: CAST
is used to convert the numerator and denominator of the fraction into decimals before they are divided.
Example¶
Using some student test scores as an example data source, let’s identify:
table
- this is calledExample_data
quantity
- this is thecolumn score
then the query becomes:
SELECT student_id,
score,
CAST(ROW_NUMBER() OVER (ORDER BY score ASC) AS DECIMAL) / CAST(( SELECT COUNT(*) FROM Example_data) AS DECIMAL) AS frac
FROM Example_data
Output:
student_id | score | frac |
---|---|---|
3 | 76 | 0.2 |
5 | 77 | 0.2 |
4 | 82 | 0.6 |
… | … | … |
1 | 97 | 1 |
Appendix: Links¶
Backlinks:
list from [[SQL - SQL Server - Cumulative Distribution Functions]] AND -"Changelog"