What's New in pg_clickhouse - JSONB Support, SQL value functions, Streaming, and more

image 512x512 1
Apr 24, 2026 ยท 13 minutes read

We've been gratified by the community reception of pg_clickhouse, the extension to query ClickHouse databases from Postgres. Recent uptake generated a ton of feedback, which we've been diligently addressing in the last few releases. These changes follow our constant mantra for pg_clickhouse: pushdown, pushdown, pushdown! Let's take a quick tour.

Setup

If you'd like to follow along, the following examples use this ClickHouse table:

CREATE TABLE events (
    id    UInt32,
    event String,
    tags  Array(String),
    at    DateTime64,
    props JSON
) ENGINE = MergeTree ORDER BY (event, id);

INSERT INTO events VALUES
    (
        1, 'order', ['a', 'b', 'c'], '2025-12-28 10:42:35.342',
        '{"cid": "C100", "address": {"city": "Paris, France", "code": "75001"}}'
    ),
    (
        2, 'order', ['d', 'e', 'f'], '2026-02-22 05:26:26.982',
        '{"cid": "C200", "address": {"city": "London, UK", "code": "SW1A"}}'
    ),
    (
        3, 'return', ['๐Ÿ˜€', 'โšฝ๏ธ'], now64() - 86400 * 2,
        '{"cid": "C200", "address": {"city": "Manchester, UK", "code": "M2 1AB"}}'
    ),
    (
        4, 'order', ['x', 'y', 'z'], now64() - 86400,
        '{"cid": "C300", "address": {"city": "New York, USA", "code": "10030"}}'
    ),
    (
        5, 'deliver', [], now64(),
        '{"cid": "C500", "address": {"city": "Portland, USA", "code": "97212"}}'
    )
;

And this pg_clickhouse foreign table configuration in Postgres:

CREATE SERVER ch FOREIGN DATA WRAPPER clickhouse_fdw OPTIONS(driver 'http');
CREATE USER MAPPING FOR CURRENT_USER SERVER ch;
CREATE EXTENSION pg_clickhouse;
CREATE SCHEMA customer;
IMPORT FOREIGN SCHEMA "default" FROM SERVER ch INTO customer;

JSONB accessors

For Postgres JSONB columns mapped to the ClickHouse JSON type,1 pg_clickhouse v0.1.10 added JSONB accessor operator and function pushdown outside SELECT clauses (generally in WHERE, ORDER BY, and HAVING clauses). It does so by converting JSON property accessors to the ClickHouse sub-column syntax.

See, example, the Remote SQL output from this verbose EXPLAIN using ->> to compare a JSON object value to a string:

EXPLAIN (VERBOSE, COSTS OFF)
SELECT id, event, props 
FROM customer.events 
WHERE props ->> 'cid' = 'C200';
QUERY PLAN
------------------------------------------------------------------------------------------
 Foreign Scan on customer.events
   Output: id, event, props
   Remote SQL: SELECT id, event, props FROM "default".events WHERE ((props.cid = 'C200'))
(3 rows)

This allows ClickHouse to filter "C100" directly. The output is just what you'd expect:

SELECT id, event, props
FROM customer.events 
WHERE props ->> 'cid' = 'C200';
id | event  |                                  props
----+--------+--------------------------------------------------------------------------
  2 | order  | {"cid": "C200", "address": {"city": "London, UK", "code": "SW1A"}}
  3 | return | {"cid": "C200", "address": {"city": "Manchester, UK", "code": "M2 1AB"}}
(2 rows)

For the -> operator, which returns a JSONB value, pg_clickhouse has ClickHouse convert the value returned by the sub-column syntax to JSON in order to compare values as Postgres does:

EXPLAIN (VERBOSE, COSTS OFF)
SELECT id, event, props 
FROM customer.events 
WHERE props -> 'cid' = '"C300"'::jsonb;
QUERY PLAN
----------------------------------------------------------------------------------------------------------
 Foreign Scan on customer.events
   Output: id, event, props
   Remote SQL: SELECT id, event, props FROM "default".events WHERE ((toJSONString(props.cid) = '"C300"'))
(3 rows)

Executing the query returns the expected result:

SELECT id, event, props
FROM customer.events 
WHERE props -> 'cid' = '"C300"'::jsonb;
id | event |                                 props
----+-------+------------------------------------------------------------------------
  4 | order | {"cid": "C300", "address": {"city": "New York, USA", "code": "10030"}}
(1 row)

The same pattern applies to the JSONB functions jsonb_extract_path() and jsonb_extract_path_text() functions, which also allowing multiple paths to get to nested values, as visible in the Remote SQL for this plan:

EXPLAIN (VERBOSE, COSTS OFF)
SELECT id, event, props FROM customer.events
WHERE jsonb_extract_path_text(props, 'address', 'city') = 'Paris, France';
QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Foreign Scan on customer.events
   Output: id, event, props
   Remote SQL: SELECT id, event, props FROM "default".events WHERE ((props.address.city = 'Paris, France'))
(3 rows)

Of course execution returns the expected results:

SELECT id, event, props 
FROM customer.events
WHERE jsonb_extract_path_text(props, 'address', 'city') = 'Paris, France';
id | event |                                 props
----+-------+------------------------------------------------------------------------
  1 | order | {"cid": "C100", "address": {"city": "Paris, France", "code": "75001"}}
(1 row)

And the same goes for pushing down comparisons to JSON values using jsonb_extract_path():

EXPLAIN (VERBOSE, COSTS OFF)
SELECT id, event, props 
FROM customer.events
WHERE jsonb_extract_path(props, 'address', 'city') = '"New York, USA"';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------
 Foreign Scan on customer.events
   Output: id, event, props
   Remote SQL: SELECT id, event, props FROM "default".events WHERE ((toJSONString(props.address.city) = '"New York, USA"'))
(3 rows)
SELECT id, event, props 
FROM customer.events
WHERE jsonb_extract_path(props, 'address', 'city') = '"New York, USA"';
id | event |                                 props
----+-------+------------------------------------------------------------------------
  4 | order | {"cid": "C300", "address": {"city": "New York, USA", "code": "10030"}}
(1 row)

SQL value functions

One of our customers switched queries from Postgres to pg_clickhouse tables and ran into failures using certain date and time functions, like CURRENT_DATE and CURRENT_TIMESTAMP. pg_clickhouse did not push down those functions, which caused issues used in combination with functions like date_part() and date_trunc(), which do.

pg_clickhouse v0.2.0 improved the pushdown of all of the "current"-type date and time functions, such that they all push down and produce values more correctly relative to the local Postgres configuration than before.

For example, to look at records from before CURRENT_DATE, pg_clickhouse produces this plan:

EXPLAIN (VERBOSE, COSTS OFF)
SELECT id FROM customer.events WHERE AT < CURRENT_DATE;
QUERY PLAN
----------------------------------------------------------------------------------------------
 Foreign Scan on customer.events
   Output: id
   Remote SQL: SELECT id FROM "default".events WHERE ((at < toDate(now('America/New_York'))))
(3 rows)

It uses the time zone currently set in the Postgres session to ensure the date is relative to the expected time zone. It does the same for CURRENT_TIMESTAMP, also specifying precision 6, the default precision for Postgres timestamps:

EXPLAIN (VERBOSE, COSTS OFF)
SELECT id 
FROM customer.events 
WHERE AT < CURRENT_TIMESTAMP;
QUERY PLAN
-------------------------------------------------------------------------------------------
 Foreign Scan on customer.events
   Output: id
   Remote SQL: SELECT id FROM "default".events WHERE ((at < now64(6, 'America/New_York')))
(3 rows)

Naturally passes an explicit precision:

EXPLAIN (VERBOSE, COSTS OFF)
SELECT id 
FROM customer.events 
WHERE AT < CURRENT_TIMESTAMP(3);
QUERY PLAN
-------------------------------------------------------------------------------------------
 Foreign Scan on customer.events
   Output: id
   Remote SQL: SELECT id FROM "default".events WHERE ((at < now64(3, 'America/New_York')))
(3 rows)

In addition to these SQL-standard current date and time keywords, we've added pushdown for the Postgres-specific timestamps functions clock_timestamp(), statement_timestamp(), and transaction_timestamp(), which all push down to the closest ClickHouse equivalent, nowInBlock64:

EXPLAIN (VERBOSE, COSTS OFF)
SELECT id 
FROM customer.events 
WHERE AT < clock_timestamp();
QUERY PLAN
--------------------------------------------------------------------------------------------------
 Foreign Scan on customer.events
   Output: id
   Remote SQL: SELECT id FROM "default".events WHERE ((at < nowInBlock64(6, 'America/New_York')))
(3 rows)

These functions work properly with other pushdown functions like date_part:

EXPLAIN (VERBOSE, COSTS OFF)
SELECT id, at FROM customer.events
WHERE date_part('year', at) < date_part('year', CURRENT_DATE);
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan on customer.events
   Output: id, at
   Remote SQL: SELECT id, at FROM "default".events WHERE ((toYear(at) < toYear(cast(toDate(now('America/New_York')), 'Nullable(DateTime)'))))
(3 rows)
SELECT id, at 
FROM customer.events
WHERE date_part('year', at) < date_part('year', CURRENT_DATE);
id |             at
----+----------------------------
  1 | 2025-12-28 05:42:35.342-05
(1 row)

As well as date_trunc โ€” even with some interval date math thrown in:

EXPLAIN (VERBOSE, COSTS OFF)
SELECT id, at
FROM customer.events
WHERE date_trunc('day', at) >= date_trunc('day', CURRENT_DATE) - INTERVAL '1 day';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan on customer.events
   Output: id, at
   Remote SQL: SELECT id, at FROM "default".events WHERE ((toStartOfDay(at) >= (toStartOfDay(toDate(now('America/New_York'))) - 86400)))
(3 rows)
SELECT id, at
FROM customer.events
WHERE date_trunc('day', at) >= date_trunc('day', CURRENT_DATE) - INTERVAL '1 day';
id |             at
----+----------------------------
  5 | 2026-04-17 17:29:47.046-04
  4 | 2026-04-16 17:29:47.046-04
(2 rows)

Array functions

Following the http driver array parsing improvements in v0.1.4, pg_clickhouse v0.2.0 added pushdown support for a slew of array functions. For example, array_cat maps to arrayConcat:

EXPLAIN (VERBOSE, COSTS OFF)
SELECT id, tags FROM customer.events WHERE array_cat(tags, ARRAY['๐Ÿฅ']) = ARRAY['๐Ÿ˜€','โšฝ๏ธ','๐Ÿฅ'];
QUERY PLAN
------------------------------------------------------------------------------------------------------------
 Foreign Scan on customer.events
   Output: id, tags
   Remote SQL: SELECT id, tags FROM "default".events WHERE ((arrayConcat(tags, ['๐Ÿฅ']) = ['๐Ÿ˜€','โšฝ๏ธ','๐Ÿฅ']))
(3 rows)
SELECT id, tags
FROM customer.events
WHERE array_cat(tags, ARRAY['๐Ÿฅ']) = ARRAY['๐Ÿ˜€','โšฝ๏ธ','๐Ÿฅ'];
id |  tags
----+---------
  3 | {๐Ÿ˜€,โšฝ๏ธ}
(1 row)

array_to_string maps to arrayStringConcat:

EXPLAIN (VERBOSE, COSTS OFF)
SELECT id, tags
FROM customer.events
WHERE array_to_string(tags, '|') = 'a|b|c';
QUERY PLAN
------------------------------------------------------------------------------------------------------
 Foreign Scan on customer.events
   Output: id, tags
   Remote SQL: SELECT id, tags FROM "default".events WHERE ((arrayStringConcat(tags, '|') = 'a|b|c'))
(3 rows)
SELECT id, tags
FROM customer.events
WHERE array_to_string(tags, '|') = 'a|b|c';
id |  tags
----+---------
  1 | {a,b,c}
(1 row)

And string_to_array maps to splitByString, here used in combination with the aforementioned JSONB accessors:

EXPLAIN (VERBOSE, COSTS OFF)
SELECT id, event, jsonb_extract_path_text(props, 'address', 'code')
FROM customer.events
WHERE string_to_array(jsonb_extract_path_text(props, 'address', 'city'), ', ') = ARRAY['Portland', 'USA'];
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Foreign Scan on customer.events
   Output: id, event, tags, at, props
   Remote SQL: SELECT id, event, tags, at, props FROM "default".events WHERE ((splitByString(', ', props.address.city) = ['Portland','USA']))
(3 rows)
SELECT id, event, jsonb_extract_path_text(props, 'address', 'code')
FROM customer.events
WHERE string_to_array(jsonb_extract_path_text(props, 'address', 'city'), ', ') = ARRAY['Portland', 'USA'];
id |  event  | jsonb_extract_path_text
----+---------+-------------------------
  5 | deliver | 97212
(1 row)

We mapped so many more! See the full list for the range of possibilities.

HTTP result set streaming

Of course, we don't solely focus on pushdown; sometimes we need to address push back, as it were.

By default, when a Postgres foreign data wrapper executes a foreign query, it collects all of the results in memory before returning them to the caller. This works great for small result sets such as those returned by typical ClickHouse aggregate queries. But sometimes an app needs to process a substantial amount of the data itself, which can lead to memory pressure issues as Postgres pulls an entire data set into memory. Beware the OOM Killer!

In pg_clickhouse v0.1.10, we added query result streaming to the http driver, which buffers a limited batch of results in memory (ca. 50MB by default) and returns them before reusing the memory for the next batch. To see it in action, we loaded the NYC taxi data set into ClickHouse, then spun up a pre-streaming pg_clickhouse v0.6.1 OCI image and imported the trips_small table into the nyc_taxi schema using the http driver:

docker run --name pg_clickhouse -p 6432:5432 -e POSTGRES_HOST_AUTH_METHOD=trust -d ghcr.io/clickhouse/pg_clickhouse:18
docker exec -it pg_clickhouse bash -c 'apt-get update && apt-get install ca-certificates'
psql -U postgres -h localhost -p 6432 <<EOF
CREATE EXTENSION pg_clickhouse;
CREATE SERVER my_ch FOREIGN DATA WRAPPER clickhouse_fdw OPTIONS(
    driver 'http', host 'abcdefghij.us-east-1.aws.clickhouse.cloud', port '8443'
);
CREATE USER MAPPING FOR CURRENT_USER SERVER my_ch OPTIONS(user 'default', password 'xxxxxxxxxxxxx');
CREATE SCHEMA nyc_taxi;
IMPORT FOREIGN SCHEMA nyc_taxi FROM SERVER my_ch INTO nyc_taxi;
EOF

We started a process to continually measure the memory consumption of the OCI container:2

while true; do
    docker stats --no-stream --format "{{.MemUsage}}" pg_clickhouse | \
        cut -d '/' -f 1 | xargs printf "%s %s\n" "$(date +%T)" | sed -e 's/MiB//g'
done

Finally, we ran the query:

psql -U postgres -h localhost -p 6432 -c 'SELECT * FROM nyc_taxi.trips_small' > /dev/null

Then we repeated the steps with the streaming-enabled pg_clickhouse v0.2.0 OCI image and compared the results. This graph nicely summarizes the difference:

HTTP Memory Graph.png

The data, massaged for the timings to line up, makes the case as well:

Secondsv0.1.10v0.2.0
026.6549.7
226.6349.6
450.860.5
695.6101.4
8118.7101.4
10155.0101.4
12205.279.9
14257.279.9
16292.179.9
18333.679.9
20381.679.9
22420.479.9
24459.185.5
26499.685.5
28538.685.5
30573.885.5
32601.885.5
34601.837.7
36601.837.7
38601.837.7
4035.7837.7
4235.7837.7

While v0.6.0 spikes up to over 600MiB of memory consumption, v0.2.0, with streaming enabled, never exceeds 86 MiB. It's a little faster, too! Of course the bigger the result set the greater the memory savings. We plan to introduce streaming in the binary driver in a future release, as well.

What's next

We've got more in the works. Watch this space for more news about window function pushdown and regular expression compatibility. Until then, join us at PGConf.dev to hear about what we learned Building a Foreign Data Wrapper.

Footnotes

  1. Supported by the http driver only for now. โ†ฉ

  2. As Postgres core hacker Andres Freund explains, this type of brute force memory measurement produces inaccurate results, generally showing Postgres using far more memory that it does, in absolute terms. We deem it acceptable here, however, for a relative comparison. โ†ฉ

Try Postgres managed by ClickHouse

ClickHouse + Postgres has become the unified data stack for applications that scale. With Managed Postgres now available in ClickHouse Cloud, this stack is a day-1 decision.

Get access

Share this post

  • Y Combinator icon
  • X icon
  • Bluesky icon
  • Facebook icon
  • LinkedIn icon

Subscribe to our newsletter

Stay informed on feature releases, product roadmap, support, and cloud offerings!

Recent posts

The ClickStack Team ยท Jun 24, 2026
Tom Schreiber, Mark Needham and Lionel Palacin ยท Jun 23, 2026