uniqCombined
Calculates the approximate number of different argument values.
uniqCombined(HLL_precision)(x[, ...])
The uniqCombined function is a good choice for calculating the number of different values.
Arguments
The function takes a variable number of parameters. Parameters can be Tuple, Array, Date, DateTime, String, or numeric types.
HLL_precision is the base-2 logarithm of the number of cells in HyperLogLog. It is optional: you can call the function as uniqCombined(x[, ...]). The default value for HLL_precision is 17, which takes effectively 96 KiB of space (2^17 cells, 6 bits each).
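The 96 KiB figure follows directly from the cell count and the cell width. A minimal sketch of that arithmetic, assuming only the 6 bits per cell stated above (the helper name `hll_state_size_kib` is hypothetical and not part of any API):

```python
def hll_state_size_kib(hll_precision: int, bits_per_cell: int = 6) -> float:
    """Size in KiB of a dense HyperLogLog with 2**hll_precision cells."""
    cells = 2 ** hll_precision           # HLL_precision is log2 of the cell count
    total_bits = cells * bits_per_cell   # every cell has the same width
    return total_bits / 8 / 1024         # bits -> bytes -> KiB


# Compare a few precisions; 17 is the default described above.
for precision in (12, 17, 20):
    print(precision, hll_state_size_kib(precision), "KiB")
```

Each extra unit of HLL_precision doubles the memory of the HyperLogLog stage.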
Returned value
- A UInt64-type number.
Implementation details
The function:
- Calculates a hash (64-bit for String and 32-bit otherwise) for all parameters in the aggregate, then uses it in calculations.
- Uses a combination of three algorithms: array, hash table, and HyperLogLog with an error correction table.
  For a small number of distinct elements, an array is used. When the set size is larger, a hash table is used. For a still larger number of elements, HyperLogLog is used, which occupies a fixed amount of memory.
- Provides the result deterministically (it does not depend on the query processing order).
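The staged design described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual implementation: the class name `TieredDistinctCounter` and the thresholds `array_limit` and `set_limit` are hypothetical, every value is hashed to 64 bits with BLAKE2b for simplicity, and the error correction table is replaced by the plain linear-counting correction from the original HyperLogLog algorithm.

```python
import hashlib
import math


class TieredDistinctCounter:
    """Toy three-stage distinct counter: array -> hash table -> HyperLogLog."""

    def __init__(self, precision=12, array_limit=16, set_limit=1024):
        # All three parameters are illustrative assumptions.
        self.p = precision
        self.m = 1 << precision          # number of HyperLogLog cells
        self.array_limit = array_limit
        self.set_limit = set_limit
        self.small = []                  # stage 1: flat array of hashes
        self.table = None                # stage 2: hash table
        self.registers = None            # stage 3: HyperLogLog cells

    @staticmethod
    def _hash64(value) -> int:
        digest = hashlib.blake2b(repr(value).encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def _hll_add(self, h: int) -> None:
        width = 64 - self.p
        index = h >> width                      # top p bits select the cell
        rest = h & ((1 << width) - 1)           # remaining bits
        rank = width - rest.bit_length() + 1    # position of the leftmost 1-bit
        if rank > self.registers[index]:
            self.registers[index] = rank

    def add(self, value) -> None:
        h = self._hash64(value)
        if self.registers is not None:
            self._hll_add(h)
        elif self.table is not None:
            self.table.add(h)
            if len(self.table) > self.set_limit:
                # Too many distinct hashes: switch to fixed-size HyperLogLog.
                self.registers = [0] * self.m
                for stored in self.table:
                    self._hll_add(stored)
                self.table = None
        else:
            if h not in self.small:
                self.small.append(h)
            if len(self.small) > self.array_limit:
                # The array is full: switch to a hash table.
                self.table = set(self.small)
                self.small = None

    def count(self) -> int:
        if self.registers is None:
            exact = self.small if self.table is None else self.table
            return len(exact)
        alpha = 0.7213 / (1 + 1.079 / self.m)
        estimate = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)  # linear counting
        return round(estimate)


counter = TieredDistinctCounter()
for i in range(50_000):
    counter.add(i % 20_000)   # 20,000 distinct values, some repeated
print(counter.count())        # approximate, since the HyperLogLog stage is active
```

The first two stages return exact counts, and the final state depends only on the set of hashes seen, not on the order in which values arrive, which mirrors the determinism property listed above.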
Note
Since it uses a 32-bit hash for non-String types, the result will have a very high error for cardinalities significantly larger than UINT_MAX (the error rises quickly after a few tens of billions of distinct values). In this case, use uniqCombined64 instead.
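One way to see why a 32-bit hash limits accuracy is to look at how many distinct hash values remain after collisions. The sketch below is a back-of-the-envelope model, not the function's actual error formula: it assumes an ideal uniform hash and uses the standard approximation that n distinct inputs produce about space * (1 - exp(-n / space)) distinct hash values. The helper name is hypothetical.

```python
import math

HASH_SPACE = 2 ** 32  # number of distinct values a 32-bit hash can take


def expected_distinct_hashes(n: int, space: int = HASH_SPACE) -> float:
    """Approximate expected number of distinct hash values for n distinct inputs."""
    # -expm1(-x) equals 1 - exp(-x) and stays accurate for small x.
    return space * -math.expm1(-n / space)


for n in (10**8, 10**9, 2**32, 10**10, 5 * 10**10):
    seen = expected_distinct_hashes(n)
    print(f"n={n:>14,}  distinct hashes ~ {seen:>16,.0f}  "
          f"share of hash space used = {seen / HASH_SPACE:.4%}")
```

Once nearly the whole hash space is in use, inputs that differ by billions of distinct values produce almost the same set of hashes, so an estimator built on those hashes has very little left to tell them apart. A 64-bit hash pushes this limit far beyond practical cardinalities.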
Compared to the uniq function, uniqCombined:
- Consumes several times less memory.
- Calculates with several times higher accuracy.
- Usually has slightly lower performance. In some scenarios, uniqCombined can perform better than uniq, for example, with distributed queries that transmit a large number of aggregation states over the network.
See Also