- 函数
- Aggregate functions
- Combinator examples
- avgMergeState
avgMergeState
描述
MergeState 组合器可以应用于 avg 函数,以合并类型为 AverageFunction(avg, T) 的部分聚合状态,并返回一个新的中间聚合状态。
示例用法
MergeState 组合器在多层级聚合场景中特别有用,在这些场景中,您希望组合预聚合状态并将其维护为状态(而不是最终状态)以进行进一步处理。为了说明这一点,我们将看一个示例,其中将单个服务器性能指标转换为跨多个级别的层次聚合:服务器级别 → 区域级别 → 数据中心级别。
首先,我们创建一个表来存储原始数据:
CREATE TABLE raw_server_metrics
(
timestamp DateTime DEFAULT now(),
server_id UInt32,
region String,
datacenter String,
response_time_ms UInt32
)
ENGINE = MergeTree()
ORDER BY (region, server_id, timestamp);
我们将创建一个服务器级别的聚合目标表,并定义一个增量物化视图作为其插入触发器:
CREATE TABLE server_performance
(
server_id UInt32,
region String,
datacenter String,
avg_response_time AggregateFunction(avg, UInt32)
)
ENGINE = AggregatingMergeTree()
ORDER BY (region, server_id);
CREATE MATERIALIZED VIEW server_performance_mv
TO server_performance
AS SELECT
server_id,
region,
datacenter,
avgState(response_time_ms) AS avg_response_time
FROM raw_server_metrics
GROUP BY server_id, region, datacenter;
我们将在区域和数据中心级别执行相同的操作:
CREATE TABLE region_performance
(
region String,
datacenter String,
avg_response_time AggregateFunction(avg, UInt32)
)
ENGINE = AggregatingMergeTree()
ORDER BY (datacenter, region);
CREATE MATERIALIZED VIEW region_performance_mv
TO region_performance
AS SELECT
region,
datacenter,
avgMergeState(avg_response_time) AS avg_response_time
FROM server_performance
GROUP BY region, datacenter;
-- datacenter level table and materialized view
CREATE TABLE datacenter_performance
(
datacenter String,
avg_response_time AggregateFunction(avg, UInt32)
)
ENGINE = AggregatingMergeTree()
ORDER BY datacenter;
CREATE MATERIALIZED VIEW datacenter_performance_mv
TO datacenter_performance
AS SELECT
datacenter,
avgMergeState(avg_response_time) AS avg_response_time
FROM region_performance
GROUP BY datacenter;
然后我们将向源表插入示例原始数据:
INSERT INTO raw_server_metrics (timestamp, server_id, region, datacenter, response_time_ms) VALUES
(now(), 101, 'us-east', 'dc1', 120),
(now(), 101, 'us-east', 'dc1', 130),
(now(), 102, 'us-east', 'dc1', 115),
(now(), 201, 'us-west', 'dc1', 95),
(now(), 202, 'us-west', 'dc1', 105),
(now(), 301, 'eu-central', 'dc2', 145),
(now(), 302, 'eu-central', 'dc2', 155);
我们将为每个级别编写三个查询:
- 服务级别
- 区域级别
- 数据中心级别
SELECT
server_id,
region,
avgMerge(avg_response_time) AS avg_response_ms
FROM server_performance
GROUP BY server_id, region
ORDER BY region, server_id;
┌─server_id─┬─region─────┬─avg_response_ms─┐
│ 301 │ eu-central │ 145 │
│ 302 │ eu-central │ 155 │
│ 101 │ us-east │ 125 │
│ 102 │ us-east │ 115 │
│ 201 │ us-west │ 95 │
│ 202 │ us-west │ 105 │
└───────────┴────────────┴─────────────────┘
SELECT
region,
datacenter,
avgMerge(avg_response_time) AS avg_response_ms
FROM region_performance
GROUP BY region, datacenter
ORDER BY datacenter, region;
┌─region─────┬─datacenter─┬────avg_response_ms─┐
│ us-east │ dc1 │ 121.66666666666667 │
│ us-west │ dc1 │ 100 │
│ eu-central │ dc2 │ 150 │
└────────────┴────────────┴────────────────────┘
SELECT
datacenter,
avgMerge(avg_response_time) AS avg_response_ms
FROM datacenter_performance
GROUP BY datacenter
ORDER BY datacenter;
┌─datacenter─┬─avg_response_ms─┐
│ dc1 │ 113 │
│ dc2 │ 150 │
└────────────┴─────────────────┘
我们可以插入更多数据:
INSERT INTO raw_server_metrics (timestamp, server_id, region, datacenter, response_time_ms) VALUES
(now(), 101, 'us-east', 'dc1', 140),
(now(), 201, 'us-west', 'dc1', 85),
(now(), 301, 'eu-central', 'dc2', 135);
让我们再次检查数据中心级别的性能。注意整个聚合链是如何自动更新的:
SELECT
datacenter,
avgMerge(avg_response_time) AS avg_response_ms
FROM datacenter_performance
GROUP BY datacenter
ORDER BY datacenter;
┌─datacenter─┬────avg_response_ms─┐
│ dc1 │ 112.85714285714286 │
│ dc2 │ 145 │
└────────────┴────────────────────┘