Skip to main content
Skip to main content

Paimon table engine

This engine provides a read-only integration with existing Apache Paimon tables in Amazon S3, Azure, HDFS and locally stored tables. It supports snapshot reads, incremental reads, and basic partition pruning provided by the engine.

Create table

Note that the Paimon table must already exist in the storage, this command does not take DDL parameters to create a new table. Creating Paimon* tables is gated by allow_experimental_paimon_storage_engine (disabled by default), so enable it before running CREATE TABLE.

SET allow_experimental_paimon_storage_engine = 1;

CREATE TABLE paimon_table_s3
    ENGINE = PaimonS3(url,  [, access_key_id, secret_access_key] [,format] [,structure] [,compression])

CREATE TABLE paimon_table_azure
    ENGINE = PaimonAzure(connection_string|storage_account_url, container_name, blobpath, [,account_name], [,account_key] [,format] [,compression_method])

CREATE TABLE paimon_table_hdfs
    ENGINE = PaimonHDFS(path_to_table, [,format] [,compression_method])

CREATE TABLE paimon_table_local
    ENGINE = PaimonLocal(path_to_table, [,format] [,compression_method])

Engine arguments

Description of the arguments coincides with description of arguments in engines S3, AzureBlobStorage, HDFS and File correspondingly. format stands for the format of data files in the Paimon table.

Engine parameters can be specified using Named Collections

Example

CREATE TABLE paimon_table ENGINE=PaimonS3('http://test.s3.amazonaws.com/clickhouse-bucket/test_table', 'test', 'test')

Using named collections:

<clickhouse>
    <named_collections>
        <paimon_conf>
            <url>http://test.s3.amazonaws.com/clickhouse-bucket/</url>
            <access_key_id>test</access_key_id>
            <secret_access_key>test</secret_access_key>
        </paimon_conf>
    </named_collections>
</clickhouse>
CREATE TABLE paimon_table ENGINE=PaimonS3(paimon_conf, filename = 'test_table')

Capabilities

  • Snapshot reads from the latest table snapshot.
  • Incremental reads based on committed snapshot id when enabled.
  • Partition pruning when use_paimon_partition_pruning is enabled.
  • Optional background refresh of metadata when configured.
  • Stable table UUID when using Atomic/Replicated databases, enabling {uuid} macros in Keeper paths.

Settings

This engine uses the same settings as the corresponding object storage engines and adds Paimon-specific settings:

  • allow_experimental_paimon_storage_engine — enables creation of Paimon, PaimonS3, PaimonAzure, PaimonHDFS, and PaimonLocal table engines. Default: 0 (disabled).
  • paimon_incremental_read — enable incremental read mode.
  • paimon_metadata_refresh_interval_sec — background metadata refresh interval in seconds. When set to a value greater than 0, a background task periodically pulls the latest snapshot and schema from object storage. Default: 30.
  • paimon_keeper_path — Keeper path for incremental read state. Must be set and unique per table; supports macros such as {database}, {table}, {uuid}.
  • paimon_replica_name — Replica name for incremental read state. Must be set and unique per replica; supports macros such as {replica}.

Incremental read examples

Incremental read with Keeper state:

CREATE TABLE paimon_inc
ENGINE = PaimonS3(paimon_conf, filename = 'paimon_all_types')
SETTINGS
    paimon_incremental_read = 1,
    paimon_keeper_path = '/clickhouse/{database}/{uuid}',
    paimon_replica_name = '{replica}';

Targeted snapshot delta read (query-level):

SELECT count()
FROM paimon_inc
SETTINGS paimon_target_snapshot_id = 1;

Paimon to MergeTree via Refreshable Materialized View

You can build an end-to-end pipeline that continuously syncs data from a Paimon table into a MergeTree table using a refreshable Materialized View in APPEND mode. Each refresh cycle reads only new incremental data from Paimon and appends it to the destination table.

Step 1 — Create the Paimon source table with incremental read and metadata refresh enabled.

The example below uses PaimonLocal. Replace the engine with PaimonS3, PaimonAzure, PaimonHDFS, or the Paimon alias as appropriate for your storage backend:

SET allow_experimental_paimon_storage_engine = 1;

-- Local storage
CREATE TABLE paimon_mv_source
ENGINE = PaimonLocal('/path/to/paimon/table')
SETTINGS
    paimon_incremental_read = 1,
    paimon_keeper_path = '/clickhouse/tables/{uuid}',
    paimon_replica_name = '{replica}',
    paimon_metadata_refresh_interval_sec = 1;

-- S3 storage (Paimon is an alias for PaimonS3)
CREATE TABLE paimon_mv_source
ENGINE = Paimon('http://minio:9000/bucket/path/to/table', 'access_key', 'secret_key')
SETTINGS
    paimon_incremental_read = 1,
    paimon_keeper_path = '/clickhouse/tables/{uuid}',
    paimon_replica_name = '{replica}',
    paimon_metadata_refresh_interval_sec = 1;

paimon_metadata_refresh_interval_sec sets the background metadata refresh interval in seconds. When greater than 0, a background task periodically pulls the latest snapshot and schema from object storage, so that the MV refresh cycle can see newly committed data without waiting for a query to trigger the metadata update. Default is 30. Use cautiously on many tables to avoid excessive object storage and Keeper I/O.

Step 2 — Create the MergeTree destination table (schema cloned from the Paimon table):

CREATE TABLE paimon_mv_dest AS paimon_mv_source
ENGINE = MergeTree()
ORDER BY tuple();

Step 3 — Create the refreshable Materialized View:

CREATE MATERIALIZED VIEW paimon_mv
REFRESH EVERY 10 SECOND
APPEND
TO paimon_mv_dest
AS SELECT * FROM paimon_mv_source;

Every 10 seconds the MV fires a SELECT * FROM paimon_mv_source, which returns only the rows added since the last committed snapshot, and appends them to paimon_mv_dest.

Cleanup:

SYSTEM STOP VIEW paimon_mv;
DROP VIEW IF EXISTS paimon_mv SYNC;
DROP TABLE IF EXISTS paimon_mv_dest SYNC;
DROP TABLE IF EXISTS paimon_mv_source SYNC;
Note

Stop the MV before dropping it to prevent background refresh from blocking DDL operations.

Limitations

  • Incremental read requires Keeper (ZooKeeper) to be configured.
  • Incremental read requires paimon_keeper_path to be set and unique per table.
  • paimon_replica_name must be unique per replica within the same Keeper path.
  • The table engine is read-only; data modification is not supported.
  • Incremental read does not handle historical data deletions from the Paimon source. If upstream Paimon data is deleted or updated, the corresponding rows already written to a ClickHouse MergeTree destination table will not be automatically removed. You must manually issue ALTER TABLE ... DELETE on the MergeTree table to clean up stale data.

Aliases

Table engine Paimon is an alias to PaimonS3 now.

Virtual Columns

  • _path — Path to the file. Type: LowCardinality(String).
  • _file — Name of the file. Type: LowCardinality(String).
  • _size — Size of the file in bytes. Type: Nullable(UInt64). If the file size is unknown, the value is NULL.
  • _time — Last modified time of the file. Type: Nullable(DateTime). If the time is unknown, the value is NULL.
  • _etag — The etag of the file. Type: LowCardinality(String). If the etag is unknown, the value is NULL.

Data Types supported

Paimon Data TypeClickHouse Data Type
BOOLEANInt8
TINYINTInt8
SMALLINTInt16
INTEGERInt32
BIGINTInt64
FLOATFloat32
DOUBLEFloat64
STRING,VARCHAR,BYTES,VARBINARYString
DATEDate
TIME(p),TIMETime('UTC')
TIMESTAMP(p) WITH LOCAL TIME ZONEDateTime64
TIMESTAMP(p)DateTime64('UTC')
CHARFixedString(1)
BINARY(n)FixedString(n)
DECIMAL(P,S)Decimal(P,S)
ARRAYArray
MAPMap

Partition supported

Data types supported in Paimon partition keys:

  • CHAR
  • VARCHAR
  • BOOLEAN
  • DECIMAL
  • TINYINT
  • SMALLINT
  • INTEGER
  • DATE
  • TIME
  • TIMESTAMP
  • TIMESTAMP WITH LOCAL TIME ZONE
  • BIGINT
  • FLOAT
  • DOUBLE