본문으로 바로가기
본문으로 바로가기

system.predicate_statistics_log

Querying in ClickHouse Cloud

The data in this system table is held locally on each node in ClickHouse Cloud. Obtaining a complete view of all data, therefore, requires the clusterAllReplicas function. See here for further details.

Description

Contains sampled selectivity statistics collected while reading from MergeTree tables. The table is populated only when predicate_statistics_sample_rate is greater than 0.

Use this table to inspect how selective user predicates are in real workloads and how many granules remain after primary-key or skip-index filtering. The data is intended as input for workload-driven index and projection recommendations.

Row shapes

A single query can produce two kinds of rows in system.predicate_statistics_log:

  • Filter rows, emitted per prewhere/filter step in MergeTreeSelectProcessor. They populate predicate_expression, input_rows, passed_rows, filter_selectivity, and the whole-predicate columns total_input_rows, total_passed_rows, total_selectivity. Index-related columns are empty.
  • Index rows, emitted per read step in ReadFromMergeTree. They populate the index_names, index_types, total_granules, granules_after, and index_selectivities arrays, one entry per index stage (primary key, partition, skip indexes). Predicate-related columns are empty.

Filter rows and index rows for the same query share the same query_id and table, so they can be joined when both are needed.

Sampling and overhead

Sampling is controlled by predicate_statistics_sample_rate:

  • 0 disables collection.
  • 1 samples every query.
  • N > 1 samples approximately 1 / N of queries, hashed by query_id.

Lower values produce more data but add CPU work on the read path and more writes to the system log. After enabling the setting, use SYSTEM FLUSH LOGS if you need rows to appear immediately.

Columns

  • hostname (LowCardinality(String)) — Hostname of the server executing the query.
  • event_date (Date) — Event date.
  • event_time (DateTime) — Timestamp when this log entry was written.
  • database (LowCardinality(String)) — Database name of the target table.
  • table (LowCardinality(String)) — Table name of the target table.
  • query_id (String) — Query ID for linking back to system.query_log.
  • predicate_expression (String) — Whole filter expression handled by this prewhere/filter step (ActionsDAG dump). Empty for index rows.
  • input_rows (UInt64) — Rows entering this prewhere/filter step. Empty for index rows.
  • passed_rows (UInt64) — Rows surviving this prewhere/filter step. Empty for index rows.
  • filter_selectivity (Float64) — Selectivity of this step: passed_rows / input_rows. Empty for index rows.
  • total_input_rows (UInt64) — Rows entering the first prewhere step (total rows read from granules). Empty for index rows.
  • total_passed_rows (UInt64) — Rows surviving all prewhere steps (rows delivered to the query). Empty for index rows.
  • total_selectivity (Float64) — Selectivity of the whole predicate: total_passed_rows / total_input_rows. Empty for index rows.
  • index_names (Array(LowCardinality(String))) — Names of indexes applied, e.g. ['PrimaryKey', 'idx_bf_status']. Populated on index rows only.
  • index_types (Array(LowCardinality(String))) — Types of indexes applied: PrimaryKey, Skip, MinMax, Partition. Populated on index rows only.
  • total_granules (Array(UInt64)) — Granules entering each index stage. Populated on index rows only.
  • granules_after (Array(UInt64)) — Granules remaining after each index stage. Populated on index rows only.
  • index_selectivities (Array(Float64)) — Per-index selectivity: granules_after / total_granules. Populated on index rows only.

Example

SET predicate_statistics_sample_rate = 1;

SELECT *
FROM hits
WHERE URL LIKE '%/product/%' AND EventDate >= today() - 7
FORMAT Null;

SYSTEM FLUSH LOGS predicate_statistics_log;

SELECT
    query_id,
    predicate_expression,
    round(filter_selectivity, 3) AS step_selectivity,
    round(total_selectivity, 3) AS query_selectivity,
    index_names,
    index_selectivities
FROM system.predicate_statistics_log
WHERE table = 'hits'
ORDER BY event_time DESC
LIMIT 10;

See also