system.predicate_statistics_log
The data in this system table is held locally on each node in ClickHouse Cloud. Obtaining a complete view of all data, therefore, requires the clusterAllReplicas function. See here for further details.
Description
Contains sampled selectivity statistics collected while reading from MergeTree tables. The table is populated only when predicate_statistics_sample_rate is greater than 0.
Use this table to inspect how selective user predicates are in real workloads and how many granules remain after primary-key or skip-index filtering. The data is intended as input for workload-driven index and projection recommendations.
Row shapes
A single query can produce two kinds of rows in system.predicate_statistics_log:
- Filter rows, emitted per prewhere/filter step in
MergeTreeSelectProcessor. They populatepredicate_expression,input_rows,passed_rows,filter_selectivity, and the whole-predicate columnstotal_input_rows,total_passed_rows,total_selectivity. Index-related columns are empty. - Index rows, emitted per read step in
ReadFromMergeTree. They populate theindex_names,index_types,total_granules,granules_after, andindex_selectivitiesarrays, one entry per index stage (primary key, partition, skip indexes). Predicate-related columns are empty.
Filter rows and index rows for the same query share the same query_id and table, so they can be joined when both are needed.
Sampling and overhead
Sampling is controlled by predicate_statistics_sample_rate:
0disables collection.1samples every query.N > 1samples approximately1 / Nof queries, hashed byquery_id.
Lower values produce more data but add CPU work on the read path and more writes to the system log. After enabling the setting, use SYSTEM FLUSH LOGS if you need rows to appear immediately.
Columns
hostname(LowCardinality(String)) — Hostname of the server executing the query.event_date(Date) — Event date.event_time(DateTime) — Timestamp when this log entry was written.database(LowCardinality(String)) — Database name of the target table.table(LowCardinality(String)) — Table name of the target table.query_id(String) — Query ID for linking back tosystem.query_log.predicate_expression(String) — Whole filter expression handled by this prewhere/filter step (ActionsDAGdump). Empty for index rows.input_rows(UInt64) — Rows entering this prewhere/filter step. Empty for index rows.passed_rows(UInt64) — Rows surviving this prewhere/filter step. Empty for index rows.filter_selectivity(Float64) — Selectivity of this step:passed_rows / input_rows. Empty for index rows.total_input_rows(UInt64) — Rows entering the first prewhere step (total rows read from granules). Empty for index rows.total_passed_rows(UInt64) — Rows surviving all prewhere steps (rows delivered to the query). Empty for index rows.total_selectivity(Float64) — Selectivity of the whole predicate:total_passed_rows / total_input_rows. Empty for index rows.index_names(Array(LowCardinality(String))) — Names of indexes applied, e.g.['PrimaryKey', 'idx_bf_status']. Populated on index rows only.index_types(Array(LowCardinality(String))) — Types of indexes applied:PrimaryKey,Skip,MinMax,Partition. Populated on index rows only.total_granules(Array(UInt64)) — Granules entering each index stage. Populated on index rows only.granules_after(Array(UInt64)) — Granules remaining after each index stage. Populated on index rows only.index_selectivities(Array(Float64)) — Per-index selectivity:granules_after / total_granules. Populated on index rows only.