pg_clickhouse Reference Documentation

Description

pg_clickhouse is a PostgreSQL extension that enables remote query execution on ClickHouse databases, including a foreign data wrapper. It supports PostgreSQL 13 and higher and ClickHouse 23 and higher.

Getting Started

The simplest way to try pg_clickhouse is the Docker image, which contains the standard PostgreSQL Docker image with the pg_clickhouse extension:

docker run --name pg_clickhouse -e POSTGRES_PASSWORD=my_pass \
       -d ghcr.io/clickhouse/pg_clickhouse:18
docker exec -it pg_clickhouse psql -U postgres

See the tutorial to get started importing ClickHouse tables and pushing down queries.

Usage

CREATE EXTENSION pg_clickhouse;
CREATE SERVER taxi_srv FOREIGN DATA WRAPPER clickhouse_fdw
       OPTIONS(driver 'binary', host 'localhost', dbname 'taxi');
CREATE USER MAPPING FOR CURRENT_USER SERVER taxi_srv
       OPTIONS (user 'default');
CREATE SCHEMA taxi;
IMPORT FOREIGN SCHEMA taxi FROM SERVER taxi_srv INTO taxi;

Versioning Policy

pg_clickhouse adheres to Semantic Versioning for its public releases.

The major version increments for API changes
The minor version increments for backward compatible SQL changes
The patch version increments for binary-only changes

Once installed, PostgreSQL tracks two variations of the version:

The library version (defined by PG_MODULE_MAGIC on PostgreSQL 18 and higher) includes the full semantic version, visible in the output of the pg_get_loaded_modules() function.
The extension version (defined in the control file) includes only the major and minor versions, visible in the pg_catalog.pg_extension table, the output of the pg_available_extension_versions() function, and \dx pg_clickhouse.

In practice this means that a release that increments the patch version, e.g. from v0.1.0 to v0.1.1, benefits all databases that have loaded v0.1 and don't need to run ALTER EXTENSION to benefit from the upgrade.

A release that increments the minor or major versions, on the other hand, will be accompanied by SQL upgrade scripts, and all existing database that contain the extension must run ALTER EXTENSION pg_clickhouse UPDATE to benefit from the upgrade.

DDL SQL Reference

The following SQL DDL expressions use pg_clickhouse.

CREATE EXTENSION

Use CREATE EXTENSION to add pg_clickhouse to a database:

CREATE EXTENSION pg_clickhouse;

Use WITH SCHEMA to install it into a specific schema (recommended):

CREATE SCHEMA ch;
CREATE EXTENSION pg_clickhouse WITH SCHEMA ch;

ALTER EXTENSION

Use ALTER EXTENSION to change pg_clickhouse. Examples:

After installing a new release of pg_clickhouse, use the UPDATE clause:
```
ALTER EXTENSION pg_clickhouse UPDATE;
```

Use SET SCHEMA to move the extension to a new schema:

CREATE SCHEMA ch;
ALTER EXTENSION pg_clickhouse SET SCHEMA ch;

DROP EXTENSION

Use DROP EXTENSION to remove pg_clickhouse from a database:

DROP EXTENSION pg_clickhouse;

This command fails if there are any objects that depend on pg_clickhouse. Use the CASCADE clause to drop them, too:

DROP EXTENSION pg_clickhouse CASCADE;

CREATE SERVER

Use CREATE SERVER to create a foreign server that connects to a ClickHouse server. Example:

CREATE SERVER taxi_srv FOREIGN DATA WRAPPER clickhouse_fdw
       OPTIONS(driver 'binary', host 'localhost', dbname 'taxi');

The supported options are:

driver: The ClickHouse connection driver to use, either "binary" or "http". Required.
dbname: The ClickHouse database to use upon connecting. Defaults to "default".
host: The host name of the ClickHouse server. Defaults to "localhost";
port: The port to connect to on the ClickHouse server. Defaults as follows:
- 9440 if driver is "binary" and host is a ClickHouse Cloud host
- 9004 if driver is "binary" and host isn't a ClickHouse Cloud host
- 8443 if driver is "http" and host is a ClickHouse Cloud host
- 8123 if driver is "http" and host isn't a ClickHouse Cloud host

ALTER SERVER

Use ALTER SERVER to change a foreign server. Example:

ALTER SERVER taxi_srv OPTIONS (SET driver 'http');

The options are the same as for CREATE SERVER.

DROP SERVER

Use DROP SERVER to remove a foreign server:

DROP SERVER taxi_srv;

This command fails if any other objects depend on the server. Use CASCADE to also drop those dependencies:

DROP SERVER taxi_srv CASCADE;

CREATE USER MAPPING

Use CREATE USER MAPPING to map a PostgreSQL user to a ClickHouse user. For example, to map the current PostgreSQL user to the remote ClickHouse user when connecting with the taxi_srv foreign server:

CREATE USER MAPPING FOR CURRENT_USER SERVER taxi_srv
       OPTIONS (user 'demo');

The The supported options are:

user: The name of the ClickHouse user. Defaults to "default".
password: The password of the ClickHouse user.

ALTER USER MAPPING

Use ALTER USER MAPPING to change the definition of a user mapping:

ALTER USER MAPPING FOR CURRENT_USER SERVER taxi_srv
       OPTIONS (SET user 'default');

The options are the same as for CREATE USER MAPPING.

DROP USER MAPPING

Use DROP USER MAPPING to remove a user mapping:

DROP USER MAPPING FOR CURRENT_USER SERVER taxi_srv;

IMPORT FOREIGN SCHEMA

Use IMPORT FOREIGN SCHEMA to import all the tables defines in a ClickHouse database as foreign tables into a PostgreSQL schema:

CREATE SCHEMA taxi;
IMPORT FOREIGN SCHEMA demo FROM SERVER taxi_srv INTO taxi;

Use LIMIT TO to limit the import to specific tables:

IMPORT FOREIGN SCHEMA demo LIMIT TO (trips) FROM SERVER taxi_srv INTO taxi;

Use EXCEPT to exclude tables:

IMPORT FOREIGN SCHEMA demo EXCEPT (users) FROM SERVER taxi_srv INTO taxi;

pg_clickhouse will fetch a list of all the tables in the specified ClickHouse database ("demo" in the above examples), fetch column definitions for each, and execute CREATE FOREIGN TABLE commands to create the foreign tables. Columns will be defined using the supported data types and, were detectable, the options supported by CREATE FOREIGN TABLE.

Imported Identifier Case Preservation

IMPORT FOREIGN SCHEMA runs quote_identifier() on the table and column names it imports, which double-quotes identifiers with uppercase characters or blank spaces. Such table and column names thus must be double-quoted in PostgreSQL queries. Names with all lowercase and no blank space characters don't need to be quoted.

For example, given this ClickHouse table:

CREATE OR REPLACE TABLE test
(
    id UInt64,
    Name TEXT,
    updatedAt DateTime DEFAULT now()
)
ENGINE = MergeTree
ORDER BY id;

IMPORT FOREIGN SCHEMA creates this foreign table:

CREATE TABLE test
(
    id          BIGINT      NOT NULL,
    "Name"      TEXT        NOT NULL,
    "updatedAt" TIMESTAMPTZ NOT NULL
);

Queries therefore must quote appropriately, e.g.,

SELECT id, "Name", "updatedAt" FROM test;

To create objects with different names or all lowercase (and therefore case-insensitive) names, use CREATE FOREIGN TABLE.

CREATE FOREIGN TABLE

Use CREATE FOREIGN TABLE to create a foreign table that can query data from a ClickHouse database:

CREATE FOREIGN TABLE uact (
    user_id    bigint NOT NULL,
    page_views int,
    duration   smallint,
    sign       smallint
) SERVER taxi_srv OPTIONS(
    table_name 'uact'
    engine 'CollapsingMergeTree'
);

The supported table options are:

database: The name of the remote database. Defaults to the database defined for the foreign server.
table_name: The name of the remote table. Default to the name specified for the foreign table.
engine: The table engine used by the ClickHouse table. For CollapsingMergeTree() and AggregatingMergeTree(), pg_clickhouse automatically applies the parameters to function expressions executed on the table.

Use the data type appropriate for the remote ClickHouse data type of each column. For AggregateFunction Type and SimpleAggregateFunction Type columns, map the data type to the ClickHouse type passed to the function and specify the name of the aggregate function via the appropriate column option:

AggregateFunction: The name of the aggregate function applied to an AggregateFunction Type column
SimpleAggregateFunction: The name of the aggregate function applied to an SimpleAggregateFunction Type column

Example:

(aggregatefunction 'sum')

CREATE FOREIGN TABLE test (
    column1 bigint  OPTIONS(AggregateFunction 'uniq'),
    column2 integer OPTIONS(AggregateFunction 'anyIf'),
    column3 bigint  OPTIONS(AggregateFunction 'quantiles(0.5, 0.9)')
) SERVER clickhouse_srv;

For columns with the AggregateFunction function, pg_clickhouse will automatically append Merge to an aggregate function evaluating the column.

ALTER FOREIGN TABLE

Use ALTER FOREIGN TABLE to change the definition of a foreign table:

ALTER TABLE table ALTER COLUMN b OPTIONS (SET AggregateFunction 'count');

The supported table and column options are the same as for CREATE FOREIGN TABLE.

DROP FOREIGN TABLE

Use DROP FOREIGN TABLE to remove a foreign table:

DROP FOREIGN TABLE uact;

This command fails if there are any objects that depend on the foreign table. Use the CASCADE clause to drop them, too:

DROP FOREIGN TABLE uact CASCADE;

DML SQL Reference

The SQL DML expressions below may use pg_clickhouse. Examples depend on these ClickHouse tables:

CREATE TABLE logs (
    req_id    Int64 NOT NULL,
    start_at   DateTime64(6, 'UTC') NOT NULL,
    duration  Int32 NOT NULL,
    resource  Text  NOT NULL,
    method    Enum8('GET' = 1, 'HEAD', 'POST', 'PUT', 'DELETE', 'CONNECT', 'OPTIONS', 'TRACE', 'PATCH', 'QUERY') NOT NULL,
    node_id   Int64 NOT NULL,
    response  Int32 NOT NULL
) ENGINE = MergeTree
  ORDER BY start_at;

CREATE TABLE nodes (
    node_id Int64 NOT NULL,
    name    Text  NOT NULL,
    region  Text  NOT NULL,
    arch    Text  NOT NULL,
    os      Text  NOT NULL
) ENGINE = MergeTree
  PRIMARY KEY node_id;

EXPLAIN

The EXPLAIN command works as expected, but the VERBOSE option triggers the ClickHouse "Remote SQL" query to be emitted:

try=# EXPLAIN (VERBOSE)
       SELECT resource, avg(duration) AS average_duration
         FROM logs
        GROUP BY resource;
                                     QUERY PLAN
------------------------------------------------------------------------------------
 Foreign Scan  (cost=1.00..5.10 rows=1000 width=64)
   Output: resource, (avg(duration))
   Relations: Aggregate on (logs)
   Remote SQL: SELECT resource, avg(duration) FROM "default".logs GROUP BY resource
(4 rows)

This query pushes down to ClickHouse via a "Foreign Scan" plan node, the remote SQL.

SELECT

Use the SELECT statement to execute queries on pg_clickhouse tables just like any other tables:

try=# SELECT start_at, duration, resource FROM logs WHERE req_id = 4117909262;
          start_at          | duration |    resource
----------------------------+----------+----------------
 2025-12-05 15:07:32.944188 |      175 | /widgets/totam
(1 row)

pg_clickhouse works to push query execution down to ClickHouse as much as possible, including aggregate functions. Use EXPLAIN to determine the pushdown extent. For the above query, for example, all execution is pushed down to ClickHouse

try=# EXPLAIN (VERBOSE, COSTS OFF)
       SELECT start_at, duration, resource FROM logs WHERE req_id = 4117909262;
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Foreign Scan on public.logs
   Output: start_at, duration, resource
   Remote SQL: SELECT start_at, duration, resource FROM "default".logs WHERE ((req_id = 4117909262))
(3 rows)

pg_clickhouse also pushes down JOINs to tables that are from the same remote server:

try=# EXPLAIN (ANALYZE, VERBOSE)
       SELECT name, count(*), round(avg(duration))
         FROM logs
         LEFT JOIN nodes on logs.node_id = nodes.node_id
        GROUP BY name;
                                                                                  QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan  (cost=1.00..5.10 rows=1000 width=72) (actual time=3.201..3.221 rows=8.00 loops=1)
   Output: nodes.name, (count(*)), (round(avg(logs.duration), 0))
   Relations: Aggregate on ((logs) LEFT JOIN (nodes))
   Remote SQL: SELECT r2.name, count(*), round(avg(r1.duration), 0) FROM  "default".logs r1 ALL LEFT JOIN "default".nodes r2 ON (((r1.node_id = r2.node_id))) GROUP BY r2.name
   FDW Time: 0.086 ms
 Planning Time: 0.335 ms
 Execution Time: 3.261 ms
(7 rows)

Joining with a local table will generate less efficient queries without careful tuning. In this example, we make a local copy of the nodes table and join to it instead of the remote table:

try=# CREATE TABLE local_nodes AS SELECT * FROM nodes;
SELECT 8

try=# EXPLAIN (ANALYZE, VERBOSE)
       SELECT name, count(*), round(avg(duration))
         FROM logs
         LEFT JOIN local_nodes on logs.node_id = local_nodes.node_id
        GROUP BY name;
                                                             QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=147.65..150.65 rows=200 width=72) (actual time=6.215..6.235 rows=8.00 loops=1)
   Output: local_nodes.name, count(*), round(avg(logs.duration), 0)
   Group Key: local_nodes.name
   Batches: 1  Memory Usage: 32kB
   Buffers: shared hit=1
   ->  Hash Left Join  (cost=31.02..129.28 rows=2450 width=36) (actual time=2.202..5.125 rows=1000.00 loops=1)
         Output: local_nodes.name, logs.duration
         Hash Cond: (logs.node_id = local_nodes.node_id)
         Buffers: shared hit=1
         ->  Foreign Scan on public.logs  (cost=10.00..20.00 rows=1000 width=12) (actual time=2.089..3.779 rows=1000.00 loops=1)
               Output: logs.req_id, logs.start_at, logs.duration, logs.resource, logs.method, logs.node_id, logs.response
               Remote SQL: SELECT duration, node_id FROM "default".logs
               FDW Time: 1.447 ms
         ->  Hash  (cost=14.90..14.90 rows=490 width=40) (actual time=0.090..0.091 rows=8.00 loops=1)
               Output: local_nodes.name, local_nodes.node_id
               Buckets: 1024  Batches: 1  Memory Usage: 9kB
               Buffers: shared hit=1
               ->  Seq Scan on public.local_nodes  (cost=0.00..14.90 rows=490 width=40) (actual time=0.069..0.073 rows=8.00 loops=1)
                     Output: local_nodes.name, local_nodes.node_id
                     Buffers: shared hit=1
 Planning:
   Buffers: shared hit=14
 Planning Time: 0.551 ms
 Execution Time: 6.589 ms

In this case, we can push more of the aggregation down to ClickHouse by grouping on node_id instead of the local column, and then join to the lookup table later:

try=# EXPLAIN (ANALYZE, VERBOSE)
       WITH remote AS (
           SELECT node_id, count(*), round(avg(duration))
             FROM logs
            GROUP BY node_id
       )
       SELECT name, remote.count, remote.round
         FROM remote
         JOIN local_nodes
           ON remote.node_id = local_nodes.node_id
        ORDER BY name;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=65.68..66.91 rows=490 width=72) (actual time=4.480..4.484 rows=8.00 loops=1)
   Output: local_nodes.name, remote.count, remote.round
   Sort Key: local_nodes.name
   Sort Method: quicksort  Memory: 25kB
   Buffers: shared hit=4
   ->  Hash Join  (cost=27.60..43.79 rows=490 width=72) (actual time=4.406..4.422 rows=8.00 loops=1)
         Output: local_nodes.name, remote.count, remote.round
         Inner Unique: true
         Hash Cond: (local_nodes.node_id = remote.node_id)
         Buffers: shared hit=1
         ->  Seq Scan on public.local_nodes  (cost=0.00..14.90 rows=490 width=40) (actual time=0.010..0.016 rows=8.00 loops=1)
               Output: local_nodes.node_id, local_nodes.name, local_nodes.region, local_nodes.arch, local_nodes.os
               Buffers: shared hit=1
         ->  Hash  (cost=15.10..15.10 rows=1000 width=48) (actual time=4.379..4.381 rows=8.00 loops=1)
               Output: remote.count, remote.round, remote.node_id
               Buckets: 1024  Batches: 1  Memory Usage: 9kB
               ->  Subquery Scan on remote  (cost=1.00..15.10 rows=1000 width=48) (actual time=4.337..4.360 rows=8.00 loops=1)
                     Output: remote.count, remote.round, remote.node_id
                     ->  Foreign Scan  (cost=1.00..5.10 rows=1000 width=48) (actual time=4.330..4.349 rows=8.00 loops=1)
                           Output: logs.node_id, (count(*)), (round(avg(logs.duration), 0))
                           Relations: Aggregate on (logs)
                           Remote SQL: SELECT node_id, count(*), round(avg(duration), 0) FROM "default".logs GROUP BY node_id
                           FDW Time: 0.055 ms
 Planning:
   Buffers: shared hit=5
 Planning Time: 0.319 ms
 Execution Time: 4.562 ms

The "Foreign Scan" node now pushes down aggregation by node_id, reducing the number of rows that must be pulled back into Postgres from 1000 (all of them) to just 8, one for each node.

PREPARE, EXECUTE, DEALLOCATE

As of v0.1.2, pg_clickhouse supports parameterized queries, mainly created by the PREPARE command:

try=# PREPARE avg_durations_between_dates(date, date) AS
       SELECT date(start_at), round(avg(duration)) AS average_duration
         FROM logs
        WHERE date(start_at) BETWEEN $1 AND $2
        GROUP BY date(start_at)
        ORDER BY date(start_at);
PREPARE

Use EXECUTE as usual to execute a prepared statement:

try=# EXECUTE avg_durations_between_dates('2025-12-09', '2025-12-13');
    date    | average_duration
------------+------------------
 2025-12-09 |              190
 2025-12-10 |              194
 2025-12-11 |              197
 2025-12-12 |              190
 2025-12-13 |              195
(5 rows)

Note

Parameterized execution prevents the http driver from properly converting DateTime time zones on ClickHouse versions prior to 25.8, when the underlying bug was fixed. Note that sometimes PostgreSQL will use a parameterized query plan even without using PREPARE. For any queries on that require accurate time zone conversion, and where upgrading to 25.8 or later is not an option, use the binary driver, instead.

pg_clickhouse pushes down the aggregations, as usual, as seen in the EXPLAIN verbose output:

try=# EXPLAIN (VERBOSE) EXECUTE avg_durations_between_dates('2025-12-09', '2025-12-13');
                                                                                                            QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan  (cost=1.00..5.10 rows=1000 width=36)
   Output: (date(start_at)), (round(avg(duration), 0))
   Relations: Aggregate on (logs)
   Remote SQL: SELECT date(start_at), round(avg(duration), 0) FROM "default".logs WHERE ((date(start_at) >= '2025-12-09')) AND ((date(start_at) <= '2025-12-13')) GROUP BY (date(start_at)) ORDER BY date(start_at) ASC NULLS LAST
(4 rows)

Note that it has sent the full date values, not the parameter placeholders. This holds for the first five requests, as described in the PostgreSQL PREPARE notes. On the sixth execution, it sends ClickHouse {param:type}-style query parameters: parameters:

                                                                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan  (cost=1.00..5.10 rows=1000 width=36)
   Output: (date(start_at)), (round(avg(duration), 0))
   Relations: Aggregate on (logs)
   Remote SQL: SELECT date(start_at), round(avg(duration), 0) FROM "default".logs WHERE ((date(start_at) >= {p1:Date})) AND ((date(start_at) <= {p2:Date})) GROUP BY (date(start_at)) ORDER BY date(start_at) ASC NULLS LAST
(4 rows)

Use DEALLOCATE to deallocate a prepared statement:

try=# DEALLOCATE avg_durations_between_dates;
DEALLOCATE

INSERT

Use the INSERT command to insert values into a remote ClickHouse table:

try=# INSERT INTO nodes(node_id, name, region, arch, os)
VALUES (9,  'Augustin Gamarra', 'us-west-2', 'amd64', 'Linux')
     , (10, 'Cerisier', 'us-east-2', 'amd64', 'Linux')
     , (11, 'Dewalt', 'use-central-1', 'arm64', 'macOS')
;
INSERT 0 3

COPY

Use the COPY command to insert a batch of rows into a remote ClickHouse table:

try=# COPY logs FROM stdin CSV;
4285871863,2025-12-05 11:13:58.360760,206,/widgets,POST,8,401
4020882978,2025-12-05 11:33:48.248450,199,/users/1321945,HEAD,3,200
3231273177,2025-12-05 12:20:42.158575,220,/search,GET,2,201
\.
>> COPY 3

⚠️ Batch API Limitations

pg_clickhouse hasn't yet implemented support for the PostgreSQL FDW batch insert API. Thus COPY currently uses INSERT statements to insert records. This will be improved in a future release.

LOAD

Use LOAD to load the pg_clickhouse shared library:

try=# LOAD 'pg_clickhouse';
LOAD

It's not normally necessary to use LOAD, as Postgres will automatically load pg_clickhouse the first time any of of its features (functions, foreign tables, etc.) are used.

The one time it may be useful to LOAD pg_clickhouse is to SET pg_clickhouse parameters before executing queries that depend on them.

SET

Use SET to set the the pg_clickhouse.session_settings runtime parameter. This parameter configures ClickHouse settings to be set on subsequent queries. Example:

SET pg_clickhouse.session_settings = 'join_use_nulls 1, final 1';

The default is join_use_nulls 1. Set it to an empty string to fall back on the ClickHouse server's settings.

SET pg_clickhouse.session_settings = '';

The syntax is a comma-delimited list of key/value pairs separated by one or more spaces. Keys must correspond to ClickHouse settings. Escape spaces, commas, and backslashes in values with a backslash:

SET pg_clickhouse.session_settings = 'join_algorithm grace_hash\,hash';

Or use single quoted values to avoid escaping spaces and commas; consider using dollar quoting to avoid the need to double-quote:

SET pg_clickhouse.session_settings = $$join_algorithm 'grace_hash,hash'$$;

If you care about legibility and need to set many settings, use multiple lines, for example:

SET pg_clickhouse.session_settings TO $$
    connect_timeout 2,
    count_distinct_implementation uniq,
    final 1,
    group_by_use_nulls 1,
    join_algorithm 'prefer_partial_merge',
    join_use_nulls 1,
    log_queries_min_type QUERY_FINISH,
    max_block_size 32768,
    max_execution_time 45,
    max_result_rows 1024,
    metrics_perf_events_list 'this,that',
    network_compression_method ZSTD,
    poll_interval 5,
    totals_mode after_having_auto
$$;

Some settings will be ignored in cases where they would interfere with the operation of pg_clickhouse itself. These include:

date_time_output_format: the http driver requires it to be "iso"
format_tsv_null_representation: the http driver requires the default
output_format_tsv_crlf_end_of_line the http driver requires the default

Otherwise, pg_clickhouse does not validate the settings, but passes them on to ClickHouse for every query. It thus supports all settings for each ClickHouse version.

Note that pg_clickhouse must be loaded before setting pg_clickhouse.session_settings; either use [shared library preloading] or simply use one of the objects in the extension to ensure it loads.

ALTER ROLE

Use ALTER ROLE's SET command to preload pg_clickhouse and/or SET its parameters for specific roles:

try=# ALTER ROLE CURRENT_USER SET session_preload_libraries = pg_clickhouse;
ALTER ROLE

try=# ALTER ROLE CURRENT_USER SET pg_clickhouse.session_settings = 'final 1';
ALTER ROLE

Use the ALTER ROLE's RESET command to reset pg_clickhouse preloading and/or parameters:

try=# ALTER ROLE CURRENT_USER RESET session_preload_libraries;
ALTER ROLE

try=# ALTER ROLE CURRENT_USER RESET pg_clickhouse.session_settings;
ALTER ROLE

Preloading

If every or nearly every Postgres connection needs to use pg_clickhouse, consider using [shared library preloading] to automatically load it:

`session_preload_libraries`

Loads the shared library for every new connection to PostgreSQL:

session_preload_libraries = pg_clickhouse

Useful to take advantage of updates without restarting the server: just reconnect. May also be set for specific users or roles via ALTER ROLE.

`shared_preload_libraries`

Loads the shared library into the PostgreSQL parent process at startup time:

shared_preload_libraries = pg_clickhouse

Useful to save memory and load overhead for every session, but requires the cluster to be restart when the library is updated.

Data Types

pg_clickhouse maps the following ClickHouse data types to PostgreSQL data types. IMPORT FOREIGN SCHEMA use the first type in the PostgreSQL column when importing columns; additional types may be used in CREATE FOREIGN TABLE statements:

ClickHouse	PostgreSQL	Notes
Bool	boolean
Date	date
Date32	date
DateTime	timestamptz
Decimal	numeric
Float32	real
Float64	double precision
IPv4	inet
IPv6	inet
Int16	smallint
Int32	integer
Int64	bigint
Int8	smallint
JSON	jsonb	HTTP engine only
String	text, bytea
UInt16	integer
UInt32	bigint
UInt64	bigint	Errors on values > BIGINT max
UInt8	smallint
UUID	uuid

Additional notes and details follow.

BYTEA

ClickHouse does not provide the equivalent of the PostgreSQL BYTEA type, but allows any bytes to be stored in String type. In general ClickHouse strings should be mapped to the PostgreSQL TEXT, but when using binary data, map it to BYTEA. Example:

-- Create clickHouse table with String columns.
SELECT clickhouse_raw_query($$
    CREATE TABLE bytes (
        c1 Int8, c2 String, c3 String
    ) ENGINE = MergeTree ORDER BY (c1);
$$);

-- Create foreign table with BYTEA columns.
CREATE FOREIGN TABLE bytes (
    c1 int,
    c2 BYTEA,
    c3 BYTEA
) SERVER ch_srv OPTIONS( table_name 'bytes' );

-- Insert binary data into the foreign table.
INSERT INTO bytes
SELECT n, sha224(bytea('val'||n)), decode(md5('int'||n), 'hex')
  FROM generate_series(1, 4) n;

-- View the results.
SELECT * FROM bytes;

That final SELECT query will output:

 c1 |                             c2                             |                 c3
----+------------------------------------------------------------+------------------------------------
  1 | \x1bf7f0cc821d31178616a55a8e0c52677735397cdde6f4153a9fd3d7 | \xae3b28cde02542f81acce8783245430d
  2 | \x5f6e9e12cd8592712e638016f4b1a2e73230ee40db498c0f0b1dc841 | \x23e7c6cacb8383f878ad093b0027d72b
  3 | \x53ac2c1fa83c8f64603fe9568d883331007d6281de330a4b5e728f9e | \x7e969132fc656148b97b6a2ee8bc83c1
  4 | \x4e3c2e4cb7542a45173a8dac939ddc4bc75202e342ebc769b0f5da2f | \x8ef30f44c65480d12b650ab6b2b04245
(4 rows)

Note that if there are any nul bytes in the ClickHouse columns, a foreign table using TEXT columns will not output the proper values:

-- Create foreign table with TEXT columns.
CREATE FOREIGN TABLE texts (
    c1 int,
    c2 TEXT,
    c3 TEXT
) SERVER ch_srv OPTIONS( table_name 'bytes' );

-- Encode binary data as hex.
SELECT c1, encode(c2::bytea, 'hex'), encode(c3::bytea, 'hex') FROM texts ORDER BY c1;

Will output:

 c1 |                          encode                          |              encode
----+----------------------------------------------------------+----------------------------------
  1 | 1bf7f0cc821d31178616a55a8e0c52677735397cdde6f4153a9fd3d7 | ae3b28cde02542f81acce8783245430d
  2 | 5f6e9e12cd8592712e638016f4b1a2e73230ee40db498c0f0b1dc841 | 23e7c6cacb8383f878ad093b
  3 | 53ac2c1fa83c8f64603fe9568d883331                         | 7e969132fc656148b97b6a2ee8bc83c1
  4 | 4e3c2e4cb7542a45173a8dac939ddc4bc75202e342ebc769b0f5da2f | 8ef30f44c65480d12b650ab6b2b04245
(4 rows)

Note that rows two and three contain truncated values. This is because PostgreSQL relies on nul-terminated strings and does not support nuls in its strings.

Attempting to insert binary values into TEXT columns will succeed and work as expected:

-- Insert via text columns:
TRUNCATE texts;
INSERT INTO texts
SELECT n, sha224(bytea('val'||n)), decode(md5('int'||n), 'hex')
  FROM generate_series(1, 4) n;

-- View the data.
SELECT c1, encode(c2::bytea, 'hex'), encode(c3::bytea, 'hex') FROM texts ORDER BY c1;

The text columns will be correct:


 c1 |                          encode                          |              encode
----+----------------------------------------------------------+----------------------------------
  1 | 1bf7f0cc821d31178616a55a8e0c52677735397cdde6f4153a9fd3d7 | ae3b28cde02542f81acce8783245430d
  2 | 5f6e9e12cd8592712e638016f4b1a2e73230ee40db498c0f0b1dc841 | 23e7c6cacb8383f878ad093b0027d72b
  3 | 53ac2c1fa83c8f64603fe9568d883331007d6281de330a4b5e728f9e | 7e969132fc656148b97b6a2ee8bc83c1
  4 | 4e3c2e4cb7542a45173a8dac939ddc4bc75202e342ebc769b0f5da2f | 8ef30f44c65480d12b650ab6b2b04245
(4 rows)

But reading them as BYTEA will not:

# SELECT * FROM bytes;
 c1 |                                                           c2                                                           |                                   c3
----+------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------
  1 | \x5c783162663766306363383231643331313738363136613535613865306335323637373733353339376364646536663431353361396664336437 | \x5c786165336232386364653032353432663831616363653837383332343534333064
  2 | \x5c783566366539653132636438353932373132653633383031366634623161326537333233306565343064623439386330663062316463383431 | \x5c783233653763366361636238333833663837386164303933623030323764373262
  3 | \x5c783533616332633166613833633866363436303366653935363864383833333331303037643632383164653333306134623565373238663965 | \x5c783765393639313332666336353631343862393762366132656538626338336331
  4 | \x5c783465336332653463623735343261343531373361386461633933396464633462633735323032653334326562633736396230663564613266 | \x5c783865663330663434633635343830643132623635306162366232623034323435
(4 rows)

Tip

As a rule, only use TEXT columns for encoded strings and use BYTEA columns only for binary data, and never switch between them.

Function and Operator Reference

Functions

These functions provide the interface to query a ClickHouse database.

`clickhouse_raw_query`

SELECT clickhouse_raw_query(
    'CREATE TABLE t1 (x String) ENGINE = Memory',
    'host=localhost port=8123'
);

Connect to a ClickHouse service via its HTTP interface, execute a single query, and disconnect. The optional second argument specifies a connection string that defaults to host=localhost port=8123. The supported connection parameters are:

host: The host to connect to; required.
port: The HTTP port to connect to; defaults to 8123 unless host is a ClickHouse Cloud host, in which case it defaults to 8443
dbname: The name of the database to connect to.
username: The username to connect as; defaults to default
password: The password to use to authenticate; defaults to no password

Useful for queries that return no records, but queries that do return values will be returned as a single text value:

SELECT clickhouse_raw_query(
    'SELECT schema_name, schema_owner from information_schema.schemata',
    'host=localhost port=8123'
);

      clickhouse_raw_query
---------------------------------
 INFORMATION_SCHEMA      default+
 default default                +
 git     default                +
 information_schema      default+
 system  default                +

(1 row)

Pushdown Functions

All PostgreSQL builtin functions used in conditionals (HAVING and WHERE clauses) to query ClickHouse foreign tables automatically push down to ClickHouse with the same names and signatures. However, some have different names or signatures and must be mapped to their equivalents. pg_clickhouse maps the following functions:

date_part:
- date_part('day'): toDayOfMonth
- date_part('doy'): toDayOfYear
- date_part('dow'): toDayOfWeek
- date_part('year'): toYear
- date_part('month'): toMonth
- date_part('hour'): toHour
- date_part('minute'): toMinute
- date_part('second'): toSecond
- date_part('quarter'): toQuarter
- date_part('isoyear'): toISOYear
- date_part('week'): toISOYear
- date_part('epoch'): toISOYear
date_trunc:
- date_trunc('week'): toMonday
- date_trunc('second'): toStartOfSecond
- date_trunc('minute'): toStartOfMinute
- date_trunc('hour'): toStartOfHour
- date_trunc('day'): toStartOfDay
- date_trunc('month'): toStartOfMonth
- date_trunc('quarter'): toStartOfQuarter
- date_trunc('year'): toStartOfYear
array_position: indexOf
btrim: trimBoth
strpos: position
regexp_like: match
md5: MD5

Custom Functions

These custom functions created by pg_clickhouse provide foreign query pushdown for select ClickHouse functions with no PostgreSQL equivalents. If any of these functions can't be pushed down they will raise an exception.

dictGet

Pushdown Casts

pg_clickhouse pushes down casts such as CAST(x AS bigint) for compatible data types. For incompatible types the pushdown will fail; if x in this example is a ClickHouse UInt64, ClickHouse will refuse to cast the value.

In order to push down casts to incompatible data types, pg_clickhouse provides the following functions. They raise an exception in PostgreSQL if they're not pushed down.

Pushdown Aggregates

These PostgreSQL aggregate functions pushdown to ClickHouse.

Custom Aggregates

These custom aggregate functions created by pg_clickhouse provide foreign query pushdown for select ClickHouse aggregate functions with no PostgreSQL equivalents. If any of these functions can't be pushed down they will raise an exception.

Pushdown Ordered Set Aggregates

These ordered-set aggregate functions map to ClickHouse Parametric aggregate functions by passing their direct argument as a parameter and their ORDER BY expressions as arguments. For example, this PostgreSQL query:

SELECT percentile_cont(0.25) WITHIN GROUP (ORDER BY a) FROM t1;

Maps to this ClickHouse query:

SELECT quantile(0.25)(a) FROM t1;

Note that the non-default ORDER BY suffixes DESC and NULLS FIRST aren't supported and will raise an error.

percentile_cont(double): quantile
quantile(double): quantile
quantileExact(double): quantileExact

Authors

David E. Wheeler

Description​

Getting Started​

Usage​

Versioning Policy​

DDL SQL Reference​

CREATE EXTENSION​

ALTER EXTENSION​

DROP EXTENSION​

CREATE SERVER​

ALTER SERVER​

DROP SERVER​

CREATE USER MAPPING​

ALTER USER MAPPING​

DROP USER MAPPING​

IMPORT FOREIGN SCHEMA​

CREATE FOREIGN TABLE​

ALTER FOREIGN TABLE​

DROP FOREIGN TABLE​

DML SQL Reference​

EXPLAIN​

SELECT​

PREPARE, EXECUTE, DEALLOCATE​

INSERT​

COPY​

LOAD​

SET​

ALTER ROLE​

Preloading​

session_preload_libraries​

shared_preload_libraries​

Data Types​

BYTEA​

Function and Operator Reference​

Functions​

clickhouse_raw_query​

Pushdown Functions​

Custom Functions​

Pushdown Casts​

Pushdown Aggregates​

Custom Aggregates​

Pushdown Ordered Set Aggregates​

Authors​

Copyright​