topK

Introduced in: v1.1.0

Returns an array of the approximately most frequent values in the specified column. The resulting array is sorted in descending order of approximate frequency of values (not by the values themselves).

Implements the Filtered Space-Saving algorithm for finding the top-K values, based on the reduce-and-combine algorithm from Parallel Space Saving.

This function does not provide a guaranteed result. In some situations errors can occur, and the function may return frequent values that are not actually the most frequent ones.
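To illustrate where these errors come from, here is a minimal Python sketch of the classic Space-Saving algorithm. It is not ClickHouse's implementation: the filtering step and the parallel reduce-and-combine merge are omitted, and the function name `space_saving` and its `capacity` argument are hypothetical. Each monitored value keeps a counter and an error bound. When all counters are in use, the value with the smallest counter is evicted and the newcomer inherits that counter, so its count can overestimate the true frequency.

```python
from typing import Hashable, Iterable


def space_saving(stream: Iterable[Hashable], capacity: int) -> list[tuple[Hashable, int, int]]:
    """Classic (unfiltered, single-threaded) Space-Saving with `capacity` counters.

    Returns (value, count, error) triples sorted by count, highest first.
    `count` may overestimate the true frequency by at most `error`.
    """
    counts: dict[Hashable, int] = {}
    errors: dict[Hashable, int] = {}
    for item in stream:
        if item in counts:
            # Already monitored: just increment its counter.
            counts[item] += 1
        elif len(counts) < capacity:
            # A free counter is available: start monitoring with an exact count.
            counts[item] = 1
            errors[item] = 0
        else:
            # All counters are in use: evict the value with the smallest counter.
            # The newcomer inherits that counter plus one, and the inherited part
            # is recorded as its maximum possible overestimate.
            victim = min(counts, key=counts.__getitem__)
            floor = counts.pop(victim)
            errors.pop(victim)
            counts[item] = floor + 1
            errors[item] = floor
    ranked = sorted(counts, key=counts.__getitem__, reverse=True)
    return [(value, counts[value], errors[value]) for value in ranked]
```

For example, `space_saving("aaaabbbccd", 3)` returns `[('a', 4, 0), ('b', 3, 0), ('d', 3, 2)]`: `d` is ranked third with count 3 and error 2, although it occurs only once and `c` (two occurrences) should be third. With four counters, enough for every distinct value, every count is exact and the error is 0.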


Syntax

topK(N)(column)
topK(N, load_factor)(column)
topK(N, load_factor, 'counts')(column)

Parameters

N — The number of elements to return. Default value: 10. Maximum value: 65536. UInt64
load_factor — Optional. Defines how many cells are reserved for values. If uniq(column) > N * load_factor, the result of the topK function is approximate. Default value: 3. UInt64
counts — Optional. Defines whether the result should contain an approximate count and an error value for each element. Bool
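The load_factor rule is plain arithmetic. The Python helpers below are hypothetical (they are not ClickHouse APIs) and only restate the documented rule: N * load_factor cells are reserved, and the result is approximate once the number of distinct values exceeds that. With the defaults (N = 10, load_factor = 3), 30 cells are reserved.

```python
MAX_N = 65536  # documented maximum value of N


def reserved_cells(n: int = 10, load_factor: int = 3) -> int:
    """Cells reserved for values under the documented N * load_factor rule."""
    if n > MAX_N:
        raise ValueError(f"N must not exceed {MAX_N}, got {n}")
    return n * load_factor


def is_approximate(distinct_values: int, n: int = 10, load_factor: int = 3) -> bool:
    """True when uniq(column) > N * load_factor, the case documented as approximate."""
    return distinct_values > reserved_cells(n, load_factor)
```

For instance, `topK(3)(AirlineID)` with the default load_factor reserves 9 cells, so a column with more than 9 distinct airline IDs yields an approximate result.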

Arguments

column — The name of the column for which to find the most frequent values. String

Returned value

Returns an array of the approximately most frequent values, sorted in descending order of approximate frequency. Array

Examples

Usage example

SELECT topK(3)(AirlineID) AS res
FROM ontime;

┌─res─────────────────┐
│ [19393,19790,19805] │
└─────────────────────┘
