PostgreSQL: querying BigQuery using the bigquery_fdw foreign data wrapper

Published on May 24, 2018 by Gabriel Bordeaux

Introduction

This article is a step-by-step how-to for querying BigQuery from a PostgreSQL instance using the bigquery_fdw foreign data wrapper.

Prerequisites:

  • PostgreSQL >= 9.5 installed
  • A Debian-based server

The examples below are based on a Debian 9.4 instance with PostgreSQL 10.4.

Install dependencies

# Install required packages
apt-get update
apt-get install --yes postgresql-server-dev-10 python3-setuptools python3-dev make gcc git

# Install pip3
easy_install3 pip

For PostgreSQL 9.X, install postgresql-server-dev-9.X instead of postgresql-server-dev-10.

Install Multicorn and bigquery_fdw

# Install `setuptools` if necessary
pip3 install --upgrade setuptools

# Install Multicorn
git clone https://github.com/Kozea/Multicorn.git && cd Multicorn
export PYTHON_OVERRIDE=python3
make && make install

# Install bigquery_fdw
pip3 install bigquery-fdw

Create a BigQuery service account

  • Go to IAM & admin > Service accounts in the Google Cloud Console
  • Click on "CREATE SERVICE ACCOUNT"
  • Choose a "Service account name"
  • In "Roles", select "BigQuery > BigQuery User" and "BigQuery > BigQuery Data Viewer"
  • Click "Furnish a new private key" and ensure that JSON is selected
  • Click "CREATE"

The private key is then downloaded as a JSON file.
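Alternatively, the same service account can be created from the command line with the gcloud CLI. The sketch below assumes you are authenticated with gcloud; the project ID (my-project) and account name (bigquery-fdw) are placeholders to replace with your own values.

```shell
# Create the service account (names below are placeholders)
gcloud iam service-accounts create bigquery-fdw \
    --display-name "bigquery_fdw" --project my-project

# Grant the same two BigQuery roles as in the console steps above
gcloud projects add-iam-policy-binding my-project \
    --member "serviceAccount:bigquery-fdw@my-project.iam.gserviceaccount.com" \
    --role "roles/bigquery.user"
gcloud projects add-iam-policy-binding my-project \
    --member "serviceAccount:bigquery-fdw@my-project.iam.gserviceaccount.com" \
    --role "roles/bigquery.dataViewer"

# Download a JSON private key for the account
gcloud iam service-accounts keys create key.json \
    --iam-account bigquery-fdw@my-project.iam.gserviceaccount.com
```

These commands require an authenticated gcloud session with permission to manage IAM on the project; they produce the same key.json used in the next step.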

Store the private key

# Create directory
mkdir -p /opt/bigquery_fdw

# Copy key in directory
cp /current/path/to/key.json /opt/bigquery_fdw/key.json

# Set correct permissions
chown -R postgres:postgres /opt/bigquery_fdw
chmod -R 700 /opt/bigquery_fdw
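To sanity-check the permission scheme, the same chmod semantics can be demonstrated in a scratch directory (a sketch; the paths here are illustrative and do not touch the real key):

```shell
# Illustrative check of the permission bits, run in a scratch directory
dir=$(mktemp -d)
echo '{"type": "service_account"}' > "$dir/key.json"
chmod -R 700 "$dir"                      # same mode as applied to /opt/bigquery_fdw
mode=$(stat -c '%a' "$dir/key.json")
echo "$mode"                             # owner-only access
rm -rf "$dir"
```

On the server itself, running `sudo -u postgres cat /opt/bigquery_fdw/key.json > /dev/null` is a quick way to confirm that the postgres user can actually read the key.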

Test the connectivity

Before creating a foreign table, you can use the bq_client_test tool to test connectivity to BigQuery from your server with the JSON key.

You need to replace my_dataset.my_table with an existing dataset and table name.

bq_client_test --key /opt/bigquery_fdw/key.json \
               --query "SELECT count(*) FROM my_dataset.my_table"

 * BigQuery client instance:
<google.cloud.bigquery.client.Client object at 0x7fa8f30dd278>
 * Query instance:
<google.cloud.bigquery.job.QueryJob object at 0x7fa8f30dd128>
 * Query results:
Row((155756,), {'f0_': 0})

Create the extension and server in PostgreSQL

Login to PostgreSQL:

su postgres
psql

Create the extension and the server:

CREATE EXTENSION multicorn;

CREATE SERVER bigquery_srv FOREIGN DATA WRAPPER multicorn
OPTIONS (
    wrapper 'bigquery_fdw.fdw.ConstantForeignDataWrapper'
);
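To confirm that both objects were created, you can query PostgreSQL's standard system catalogs (nothing here is specific to bigquery_fdw):

```sql
-- Check that the extension is installed and the foreign server exists
SELECT extname FROM pg_extension WHERE extname = 'multicorn';
SELECT srvname, srvoptions FROM pg_foreign_server;
```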

Create a foreign table matching the BigQuery table

Create a foreign table whose columns map BigQuery datatypes to their PostgreSQL equivalents (see the conversion table in the bigquery_fdw documentation).

CREATE FOREIGN TABLE my_bigquery_table (
    column1 text,
    column2 bigint
) SERVER bigquery_srv
OPTIONS (
    fdw_dataset 'my_dataset',
    fdw_table 'my_table',
    fdw_key '/opt/bigquery_fdw/key.json'
);
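As a sketch of how other BigQuery datatypes map onto PostgreSQL types, a wider foreign table might look like the following (the column names are hypothetical; the project's conversion table is the authoritative reference):

```sql
CREATE FOREIGN TABLE my_bigquery_table_wide (
    user_id    bigint,           -- BigQuery INTEGER / INT64
    user_name  text,             -- BigQuery STRING
    score      double precision, -- BigQuery FLOAT / FLOAT64
    is_active  boolean,          -- BigQuery BOOLEAN / BOOL
    created_at timestamp,        -- BigQuery TIMESTAMP
    birth_date date              -- BigQuery DATE
) SERVER bigquery_srv
OPTIONS (
    fdw_dataset 'my_dataset',
    fdw_table 'my_table',
    fdw_key '/opt/bigquery_fdw/key.json'
);
```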

You can now run queries from PostgreSQL to your BigQuery table:

postgres=# SELECT count(*) FROM my_bigquery_table;
 count  
--------
 155756
(1 row)

postgres=# SELECT count(*) FROM my_bigquery_table WHERE some_column = 'some_value';
 count 
-------
 22560
(1 row)
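Other read-only queries work the same way, with filtering and aggregation expressed in ordinary PostgreSQL SQL (the column name below is hypothetical):

```sql
-- Top 10 most frequent values of a column, fetched through the FDW
SELECT some_column, count(*) AS occurrences
FROM my_bigquery_table
GROUP BY some_column
ORDER BY occurrences DESC
LIMIT 10;
```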

Try it yourself

Try it yourself with this how-to and the documentation available at https://github.com/gabfl/bigquery_fdw.