AggregatingMergeTree table engine

The engine inherits from MergeTree and changes the logic for merging data parts. Within a single data part, ClickHouse replaces all rows with the same primary key (or more accurately, with the same sorting key) with a single row that stores a combination of aggregate function states. You can use AggregatingMergeTree tables for incremental data aggregation, including for aggregated materialized views.

The engine processes all columns with the following types: AggregateFunction and SimpleAggregateFunction.

It is appropriate to use AggregatingMergeTree if it reduces the number of rows by orders of magnitude.
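One rough way to judge whether the engine is worthwhile is to compare the number of raw rows with the number of distinct sorting-key combinations that would remain after aggregation. The sketch below assumes a hypothetical raw table raw_hits aggregated by day and counter; the table and column names are illustrative only.

```sql
-- Hypothetical raw table, used only to illustrate the estimate.
CREATE TABLE IF NOT EXISTS raw_hits
(
    event_time DateTime,
    counter_id UInt64,
    user_id UInt64
)
ENGINE = MergeTree
ORDER BY (counter_id, event_time);

-- Compare raw row count with the number of (day, counter) groups
-- that an AggregatingMergeTree table keyed on those columns would hold.
SELECT
    count() AS raw_rows,
    uniqExact(toDate(event_time), counter_id) AS aggregated_rows,
    raw_rows / aggregated_rows AS reduction_factor
FROM raw_hits;
```

A reduction_factor in the hundreds or thousands suggests the aggregated table will be much smaller than the raw one; a factor close to 1 suggests little benefit.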

Creating a table

CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    name1 [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    name2 [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
) ENGINE = AggregatingMergeTree()
[PARTITION BY expr]
[ORDER BY expr]
[SAMPLE BY expr]
[TTL expr]
[SETTINGS name=value, ...]

For a description of request parameters, see request description.

Query clauses

When creating an AggregatingMergeTree table, the same clauses are required as when creating a MergeTree table.
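As a concrete instance of the syntax above, the following sketch creates a table with one SimpleAggregateFunction column and one AggregateFunction column. The table name daily_totals and its columns are hypothetical and chosen only for illustration.

```sql
-- Hypothetical table: per-day totals for each counter.
CREATE TABLE IF NOT EXISTS daily_totals
(
    day Date,
    counter_id UInt64,
    -- Stores a plain UInt64 that is summed when parts are merged.
    hits SimpleAggregateFunction(sum, UInt64),
    -- Stores an intermediate uniq state that is combined when parts are merged.
    users AggregateFunction(uniq, UInt64)
)
ENGINE = AggregatingMergeTree()
PARTITION BY toYYYYMM(day)
ORDER BY (day, counter_id);
```

Rows that share the same (day, counter_id) pair within a data part are collapsed into one row during merges.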

SELECT and INSERT

To insert data, use an INSERT SELECT query with aggregate functions that carry the -State suffix. When selecting data from an AggregatingMergeTree table, use a GROUP BY clause and the same aggregate functions as when inserting data, but with the -Merge suffix. In the results of a SELECT query, values of the AggregateFunction type have an implementation-specific binary representation in all ClickHouse output formats. For example, if you dump data into the TabSeparated format with a SELECT query, the dump can be loaded back using an INSERT query.
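The -State and -Merge pairing can be sketched as follows. The table page_stats, its columns, and the inline sample rows are hypothetical; only the suffix pattern is the point of the example.

```sql
-- Hypothetical target table holding aggregate states per page.
CREATE TABLE IF NOT EXISTS page_stats
(
    page String,
    views AggregateFunction(count),
    visitors AggregateFunction(uniq, UInt64)
)
ENGINE = AggregatingMergeTree()
ORDER BY page;

-- Insert: the -State variants produce intermediate states, not final values.
INSERT INTO page_stats
SELECT
    page,
    countState() AS views,
    uniqState(user_id) AS visitors
FROM
(
    SELECT 'home' AS page, toUInt64(1) AS user_id
    UNION ALL SELECT 'home', toUInt64(2)
    UNION ALL SELECT 'docs', toUInt64(1)
)
GROUP BY page;

-- Select: the matching -Merge variants combine states into final values.
SELECT
    page,
    countMerge(views) AS total_views,
    uniqMerge(visitors) AS unique_visitors
FROM page_stats
GROUP BY page
ORDER BY page;
```

The GROUP BY in the final query is needed because rows with the same key may still sit in different, not yet merged data parts.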

Example of an aggregated materialized view

The following example assumes that you have a database named test. Create it if it doesn’t already exist using the command below:

CREATE DATABASE test;

Now create the table test.visits that contains the raw data:

CREATE TABLE test.visits
(
    StartDate DateTime64 NOT NULL,
    CounterID UInt64,
    Sign Nullable(Int32),
    UserID Nullable(Int32)
) ENGINE = MergeTree ORDER BY (StartDate, CounterID);

Next, you need an AggregatingMergeTree table that stores AggregateFunction states tracking the total number of visits and the number of unique users. Create an AggregatingMergeTree table that uses the AggregateFunction type:

CREATE TABLE test.agg_visits (
    StartDate DateTime64 NOT NULL,
    CounterID UInt64,
    Visits AggregateFunction(sum, Nullable(Int32)),
    Users AggregateFunction(uniq, Nullable(Int32))
)
ENGINE = AggregatingMergeTree() ORDER BY (StartDate, CounterID);

Create a materialized view that populates test.agg_visits from test.visits:

CREATE MATERIALIZED VIEW test.visits_mv TO test.agg_visits
AS SELECT
    StartDate,
    CounterID,
    sumState(Sign) AS Visits,
    uniqState(UserID) AS Users
FROM test.visits
GROUP BY StartDate, CounterID;

Insert data into the test.visits table:

INSERT INTO test.visits (StartDate, CounterID, Sign, UserID)
VALUES (1667446031000, 1, 3, 4), (1667446031000, 1, 6, 3);

The data is inserted in both test.visits and test.agg_visits. To get the aggregated data, execute a query such as SELECT ... GROUP BY ... from the materialized view test.visits_mv:

SELECT
    StartDate,
    sumMerge(Visits) AS Visits,
    uniqMerge(Users) AS Users
FROM test.visits_mv
GROUP BY StartDate
ORDER BY StartDate;

┌───────────────StartDate─┬─Visits─┬─Users─┐
│ 2022-11-03 03:27:11.000 │      9 │     2 │
└─────────────────────────┴────────┴───────┘

Add another couple of records to test.visits, but this time try using a different timestamp for one of the records:

INSERT INTO test.visits (StartDate, CounterID, Sign, UserID)
VALUES (1669446031000, 2, 5, 10), (1667446031000, 3, 7, 5);

Run the SELECT query again, which will return the following output:

┌───────────────StartDate─┬─Visits─┬─Users─┐
│ 2022-11-03 03:27:11.000 │     16 │     3 │
│ 2022-11-26 07:00:31.000 │      5 │     1 │
└─────────────────────────┴────────┴───────┘
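Because the stored values are intermediate states, the same data can be merged at a different granularity than the one used above. As a sketch, the query below reads the target table test.agg_visits directly and additionally groups by CounterID; the output aliases TotalVisits and UniqueUsers are illustrative names.

```sql
-- Merge the stored states per (StartDate, CounterID) instead of per StartDate.
SELECT
    StartDate,
    CounterID,
    sumMerge(Visits) AS TotalVisits,
    uniqMerge(Users) AS UniqueUsers
FROM test.agg_visits
GROUP BY StartDate, CounterID
ORDER BY StartDate, CounterID;
```

With the four rows inserted above, this should return one row for each of the three distinct (StartDate, CounterID) pairs.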

In some cases, you might want to avoid pre-aggregating rows at insert time, shifting the cost of aggregation from insert time to merge time. Ordinarily, columns that are not part of the aggregation must be included in the GROUP BY clause of the materialized view definition to avoid an error. However, you can use the initializeAggregation function together with the setting optimize_on_insert = 0 (the setting is enabled by default) to achieve this. GROUP BY is no longer required in this case:

CREATE MATERIALIZED VIEW test.visits_mv TO test.agg_visits
AS SELECT
    StartDate,
    CounterID,
    initializeAggregation('sumState', Sign) AS Visits,
    initializeAggregation('uniqState', UserID) AS Users
FROM test.visits;

When using initializeAggregation, an aggregate state is created for each individual row without grouping. Each source row produces one row in the materialized view, and the actual aggregation happens later when the AggregatingMergeTree merges parts. This is only true if optimize_on_insert = 0.
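The deferred aggregation can be observed with a sketch like the one below. It assumes the initializeAggregation variant of test.visits_mv is the one in place (a view created earlier under the same name would first have to be removed with DROP VIEW test.visits_mv), that the session setting is applied to the insert performed through the view, and it uses the made-up CounterID 9 so that the counts are not mixed with earlier rows.

```sql
-- Assumption: the setting is changed in the same session that runs the INSERT.
SET optimize_on_insert = 0;

INSERT INTO test.visits (StartDate, CounterID, Sign, UserID)
VALUES (1670000000000, 9, 1, 100), (1670000000000, 9, 1, 101);

-- Stored rows for CounterID 9 before a merge: one per source row is expected.
SELECT count() FROM test.agg_visits WHERE CounterID = 9;

-- Force a merge so that rows with the same sorting key are combined.
OPTIMIZE TABLE test.agg_visits FINAL;

-- Stored rows for CounterID 9 after the merge.
SELECT count() FROM test.agg_visits WHERE CounterID = 9;
```

Under these assumptions, the first count should be higher than the second, since the two source rows share the same sorting key and are only combined at merge time.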

Tuple element aggregation

When the allow_tuple_element_aggregation setting is enabled, Tuple columns are recursively flattened so that each leaf element participates in aggregation independently. This means that AggregateFunction or SimpleAggregateFunction sub-columns inside a Tuple are aggregated according to their respective functions, just as if they were top-level columns. Sub-columns that belong to a Tuple in the sorting key are excluded from aggregation. Non-aggregate sub-columns are treated as ordinary columns (their first value is kept).

This setting is immutable and must be specified at table creation time.

CREATE TABLE agg_tuples
(
    key UInt32,
    metrics Tuple(
        total_visits SimpleAggregateFunction(sum, UInt64),
        unique_users SimpleAggregateFunction(max, UInt64)
    )
) ENGINE = AggregatingMergeTree()
ORDER BY key
SETTINGS allow_tuple_element_aggregation = 1;

INSERT INTO agg_tuples VALUES (1, (100, 5));
INSERT INTO agg_tuples VALUES (1, (200, 8));
INSERT INTO agg_tuples VALUES (2, (50, 3));

OPTIMIZE TABLE agg_tuples FINAL;

SELECT key, metrics.total_visits, metrics.unique_users FROM agg_tuples ORDER BY key;

┌─key─┬─metrics.total_visits─┬─metrics.unique_users─┐
│   1 │                  300 │                    8 │
│   2 │                   50 │                    3 │
└─────┴──────────────────────┴──────────────────────┘

total_visits is aggregated with sum (100 + 200 = 300), while unique_users is aggregated with max (max(5, 8) = 8).
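Merges run in the background at an unspecified time, so a query that must be correct without a prior OPTIMIZE can aggregate the tuple elements explicitly. The following sketch reuses the agg_tuples table from above and applies the same functions that the sub-columns declare.

```sql
-- Aggregate at query time so the result does not depend on merge state.
SELECT
    key,
    sum(metrics.total_visits) AS total_visits,
    max(metrics.unique_users) AS unique_users
FROM agg_tuples
GROUP BY key
ORDER BY key;
```

For the three rows inserted above, this produces the same values as the merged result: 300 and 8 for key 1, 50 and 3 for key 2.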

Related content

Blog: Using Aggregate Combinators in ClickHouse
