PostgreSQL foreign data wrapper: Aggregate Pushdown

Published on May. 17, 2018 by Gabriel Bordeaux

Foreign data wrapper Aggregate Pushdown

Introduction

PostgreSQL 10 implemented push aggregate functions to the remote server to improve foreign data wrapper performances (see release notes)

Without aggregate pushdown, when an aggregation function was used (COUNT, SUM, AVG...), all rows were forwarded to the foreign data wrapper which would aggregate them accordingly remotely.

PostgreSQL 10 added features to forward data already aggregated to the foreign data wrapper which dramatically improves performances.

Example

If you need more information on how to setup a foreign data wrapper, please read my dedicated article.

Sample table

We will create then access the following table with 2 foreign data wrappers, one with PostgreSQL 9.6 and the second with PostgreSQL 10:

-- Create table
CREATE TABLE remote_table (
    id serial primary key,
    num int,
    date timestamp with time zone
);

-- Add indexes
CREATE INDEX ON remote_table (num);
CREATE INDEX ON remote_table (date);

-- Populate with some data
INSERT INTO remote_table (num, date)
SELECT floor(random() * 10 + 1)::int, generate_series(NOW() - INTERVAL '1000 days', NOW(), '1 day'::interval);

-- Grant SELECT privileges to "remote_user"
GRANT SELECT ON ALL TABLES IN SCHEMA public TO remote_user;

PostgreSQL 9.6: no aggregate pushdown

postgres=# SELECT num, count(*)
FROM fdw.remote_table
GROUP BY num
ORDER BY count(*) DESC
LIMIT 5;
 num | count 
-----+-------
   5 |   117
   1 |   113
   7 |   111
   4 |   109
  10 |   101
(5 rows)

Time: 14.683 ms

Query explained:

postgres=# explain (verbose, costs off) SELECT num, count(*) FROM fdw.remote_table GROUP BY num ORDER BY count(*) DESC LIMIT 5;
                             QUERY PLAN                              
---------------------------------------------------------------------
 Limit
   Output: num, (count(*))
   ->  Sort
         Output: num, (count(*))
         Sort Key: (count(*)) DESC
         ->  HashAggregate
               Output: num, count(*)
               Group Key: remote_table.num
               ->  Foreign Scan on fdw.remote_table
                     Output: id, num, date
                     Remote SQL: SELECT num FROM public.remote_table
(11 rows)

Time: 1.507 ms

PostgreSQL 10: with aggregate pushdown

postgres=# SELECT num, count(*)
FROM fdw.remote_table
GROUP BY num
ORDER BY count(*) DESC
LIMIT 5;
 num | count 
-----+-------
   1 |   107
   7 |   105
  10 |   104
   5 |   103
   2 |   101
(5 rows)

Time: 12.051 ms

Query explained:

postgres=# explain (verbose, costs off) SELECT num, count(*)
FROM fdw.remote_table
GROUP BY num
ORDER BY count(*) DESC
LIMIT 5;
                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Limit
   Output: num, (count(*))
   ->  Sort
         Output: num, (count(*))
         Sort Key: (count(*)) DESC
         ->  Foreign Scan
               Output: num, (count(*))
               Relations: Aggregate on (fdw.remote_table)
               Remote SQL: SELECT num, count(*) FROM public.remote_table GROUP BY 1
(9 rows)

Time: 0.790 ms

Understanding where the aggregation happens

With explain (verbose, costs off), the query plan will show the Remote SQL query.

With PostgreSQL 9.6, the query executed on the remote server is:

SELECT num FROM public.remote_table
-- 1001 rows returned

and with PostgreSQL 10, it is:

SELECT num, count(*) FROM public.remote_table GROUP BY 1
-- 10 rows returned

It means that all rows matching the WHERE condition (all rows from the table in this case) are forwarded to the foreign data wrapper with PostgreSQL 9.6 which will then aggregate them locally. On PostgreSQL 10, rows are aggregated on the remote server and are forwarded pre-aggregated to the foreign data wrapper.

When the aggregation happens and when it does not

Pushing aggregate functions to the remote server will improve in the future and does not cover all queries.

As shown below, aggregation does not happen with a date function (like NOW()). This is because dates are relative to the foreign data wrapper and are not immutables.

Aggregation does not happen with a date function:

SELECT num, count(*) FROM fdw.remote_table WHERE date > NOW() - INTERVAL '1 year' GROUP BY num ORDER BY count(*) DESC LIMIT 1;

-- Remote SQL:
SELECT num, date FROM public.remote_table

Even if the time zone is specified:

SELECT num, count(*) FROM fdw.remote_table WHERE date > (NOW() - INTERVAL '1 year') AT TIME ZONE 'US/Eastern' GROUP BY num ORDER BY count(*) DESC LIMIT 1;

-- Remote SQL:
SELECT num, date FROM public.remote_table

But it works with a string date:

SELECT num, count(*) FROM fdw.remote_table WHERE date > '2017-05-07' GROUP BY num ORDER BY count(*) DESC LIMIT 1;

-- Remote SQL:
SELECT num, count(*) FROM public.remote_table WHERE ((date > '2017-05-07 00:00:00+00'::timestamp with time zone)) GROUP BY 1

Since multiple elements are taken in consideration by the foreign data wrapper when deciding wether or not to use pushdown aggregation, I strongly recommend to use explain (verbose, costs off) with all queries to understand the query plan and the remote query.