Group by a range of values with PostgreSQL

Published on Mar. 27, 2017 by Gabriel Bordeaux

Introduction

Here is a simple solution on how to group data within a range. As a basic example, you may have a list of team members with their birth year and you may want to know the average salary grouped by decade:

 r_from | r_to | team_members | salary_avg | salary_min | salary_max 
--------+------+--------------+------------+------------+------------
   1960 | 1969 |            2 |   48150.00 |      39400 |      56900
   1970 | 1979 |            4 |   36575.00 |      26800 |      44000
   1980 | 1989 |            4 |   43125.00 |      23000 |      87000
   1990 | 1999 |            3 |   45533.33 |      40000 |      55000
I'm an illustration

Grouped veggies

Try it yourself

If you'd like to try the above example for yourself, just create this table:

CREATE TEMP TABLE team (
   id serial,
   name text,
   birth_year integer,
   salary integer
);

INSERT INTO team (name, birth_year, salary)
VALUES ('Gabriel', 1970, 44000),
       ('Tom', 1972, 36000),
       ('Bill', 1978, 39500),
       ('Bob', 1980, 29000),
       ('Roger', 1976, 26800),
       ('Lucas', 1965, 56900),
       ('Jerome', 1984, 33500),
       ('Andrew', 1992, 41600),
       ('John', 1991, 40000),
       ('Paul', 1964, 39400),
       ('Richard', 1986, 23000),
       ('Joseph', 1988, 87000),
       ('Jason', 1990, 55000);

Create a range with generate_series

A simple solution would be to create a range with generate series:

gab@test> WITH series AS (
                SELECT generate_series(1950, 2010, 10) AS r_from -- 1950 = min, 2010 = max, 10 = 10 year interval
), range AS (
                SELECT r_from, (r_from + 9) AS r_to FROM series -- 9 = interval (10 years) minus 1
)
SELECT r_from, r_to
FROM range;

 r_from | r_to 
--------+------
   1950 | 1959
   1960 | 1969
   1970 | 1979
   1980 | 1989
   1990 | 1999
   2000 | 2009
   2010 | 2019
(7 rows)

Group data by a range of values

You can use the example above to group data within a range:

gab@test> WITH series AS (
                SELECT generate_series(1950, 2010, 10) AS r_from -- 1950 = min, 2010 = max, 10 = 10 year interval
), range AS (
                SELECT r_from, (r_from + 9) AS r_to FROM series -- 9 = interval (10 years) minus 1
)
SELECT r_from, r_to,
       (SELECT count(*) FROM team WHERE birth_year BETWEEN r_from AND r_to) as team_members,
       round((SELECT AVG(salary) FROM team WHERE birth_year BETWEEN r_from AND r_to), 2) as salary_avg,
       (SELECT MIN(salary) FROM team WHERE birth_year BETWEEN r_from AND r_to) as salary_min,
       (SELECT MAX(salary) FROM team WHERE birth_year BETWEEN r_from AND r_to) as salary_max
FROM range;

 r_from | r_to | team_members | salary_avg | salary_min | salary_max 
--------+------+--------------+------------+------------+------------
   1950 | 1959 |            0 |     {NULL} |     {NULL} |     {NULL}
   1960 | 1969 |            2 |   48150.00 |      39400 |      56900
   1970 | 1979 |            4 |   36575.00 |      26800 |      44000
   1980 | 1989 |            4 |   43125.00 |      23000 |      87000
   1990 | 1999 |            3 |   45533.33 |      40000 |      55000
   2000 | 2009 |            0 |     {NULL} |     {NULL} |     {NULL}
   2010 | 2019 |            0 |     {NULL} |     {NULL} |     {NULL}
(7 rows)