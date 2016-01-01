uniqCombined64
Calculates the approximate number of different argument values. It is the same as uniqCombined, but uses a 64-bit hash for all data types rather than just for the String data type.
Parameters
HLL_precision: The base-2 logarithm of the number of cells in HyperLogLog. This parameter is optional: the function can also be called as uniqCombined64(x[, ...]). The default value for HLL_precision is 17, which is effectively 96 KiB of space (2^17 cells, 6 bits each).
X: A variable number of parameters. Parameters can be Tuple, Array, Date, DateTime, String, or numeric types.
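The 96 KiB figure for the default precision follows directly from the cell count and cell width quoted above. A minimal arithmetic check, where the helper name `hll_state_bytes` is hypothetical and only the 6-bits-per-cell figure comes from the text:

```python
def hll_state_bytes(precision: int, bits_per_cell: int = 6) -> int:
    """Size in bytes of a dense HyperLogLog state with 2**precision cells."""
    cells = 2 ** precision
    return cells * bits_per_cell // 8


size = hll_state_bytes(17)
print(size, "bytes =", size // 1024, "KiB")  # 98304 bytes = 96 KiB
```

Each increment of HLL_precision doubles the number of cells, so it also doubles this fixed memory cost.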
Returned value
- A UInt64-type number.
Implementation details
The uniqCombined64 function:
- Calculates a hash (64-bit hash for all data types) for all parameters in the aggregate, then uses it in calculations.
- Uses a combination of three algorithms: array, hash table, and HyperLogLog with an error correction table.
  - For a small number of distinct elements, an array is used.
  - When the set size is larger, a hash table is used.
  - For a larger number of elements, HyperLogLog is used, which will occupy a fixed amount of memory.
- Provides the result deterministically (it does not depend on the query processing order).
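The tiered design described above can be sketched in a few lines of Python. This is an illustrative toy, not the ClickHouse implementation: the class name `TieredCounter`, the thresholds `SMALL_LIMIT` and `HASH_SET_LIMIT`, the use of BLAKE2b as the 64-bit hash, and the reduced precision of 12 are all assumptions, and the error correction table is replaced by the textbook small-range correction.

```python
import hashlib
import math

SMALL_LIMIT = 16        # hypothetical: max hashes kept in the plain array
HASH_SET_LIMIT = 4096   # hypothetical: max hashes kept in the hash set


def hash64(value) -> int:
    """64-bit hash of any value via its repr (stand-in for the real hash)."""
    digest = hashlib.blake2b(repr(value).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


class TieredCounter:
    def __init__(self, precision: int = 12):
        self.p = precision
        self.small = []      # tier 1: array of hashes
        self.hashes = None   # tier 2: hash set
        self.cells = None    # tier 3: HyperLogLog cells (fixed size)

    def _hll_add(self, h: int) -> None:
        width = 64 - self.p
        idx = h >> width                     # top p bits select the cell
        rest = h & ((1 << width) - 1)        # remaining bits
        rank = width - rest.bit_length() + 1  # leading zeros + 1
        if rank > self.cells[idx]:
            self.cells[idx] = rank

    def add(self, value) -> None:
        h = hash64(value)
        if self.cells is not None:
            self._hll_add(h)
        elif self.hashes is not None:
            self.hashes.add(h)
            if len(self.hashes) > HASH_SET_LIMIT:
                # Too large for the hash set: move everything into HyperLogLog.
                self.cells = [0] * (1 << self.p)
                for x in self.hashes:
                    self._hll_add(x)
                self.hashes = None
        else:
            if h not in self.small:
                self.small.append(h)
            if len(self.small) > SMALL_LIMIT:
                # Too large for the array: switch to a hash set.
                self.hashes = set(self.small)
                self.small = []

    def count(self) -> int:
        if self.cells is None:
            # Tiers 1 and 2 store every distinct hash, so the count is exact.
            return len(self.small) if self.hashes is None else len(self.hashes)
        m = len(self.cells)
        alpha = 0.7213 / (1 + 1.079 / m)
        estimate = alpha * m * m / sum(2.0 ** -c for c in self.cells)
        zeros = self.cells.count(0)
        if estimate <= 2.5 * m and zeros:
            estimate = m * math.log(m / zeros)  # small-range correction
        return round(estimate)


counter = TieredCounter()
for i in range(50_000):
    counter.add(i)
print(counter.count())  # an approximation close to 50000
```

Because the array and hash set hold distinct hashes and each HyperLogLog cell keeps a maximum, the final state depends only on the set of values seen, not on their order, which is why the result is deterministic.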
Because it uses a 64-bit hash for all types, the result does not suffer from the very high error that uniqCombined shows for cardinalities significantly larger than UINT_MAX. uniqCombined uses a 32-bit hash for non-String types.
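The effect of the hash width can be estimated with a simple occupancy model: hashing n distinct inputs uniformly into 2^b possible values yields roughly 2^b * (1 - exp(-n / 2^b)) distinct hashes. The sketch below applies that approximation; the function name `expected_distinct_hashes` is hypothetical, and the model only counts hash collisions, it does not reproduce the estimate either function actually returns.

```python
import math


def expected_distinct_hashes(n: float, hash_bits: int) -> float:
    """Approximate number of distinct hash values produced when n distinct
    inputs are hashed uniformly into 2**hash_bits possible values."""
    m = 2.0 ** hash_bits
    # m * (1 - exp(-n / m)), written with expm1 to stay accurate when n << m
    return -m * math.expm1(-n / m)


n = 1e10
print(f"32-bit hash: {expected_distinct_hashes(n, 32):.3e} distinct hashes")
print(f"64-bit hash: {expected_distinct_hashes(n, 64):.3e} distinct hashes")
```

With a 32-bit hash the number of distinct hashes can never exceed 2^32, so most of the 1e10 inputs collide before any estimator sees them. With a 64-bit hash only a handful of collisions are expected at this cardinality.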
Compared to the uniq function, the uniqCombined64 function:
- Consumes several times less memory.
- Calculates with several times higher accuracy.
Example
In the example below, uniqCombined64 is run on 1e10 different numbers and returns a very close approximation of the number of different argument values.
Query:
    SELECT uniqCombined64(number) FROM numbers(1e10);
Result:
By comparison, the uniqCombined function returns a rather poor approximation for an input of this size.
Query:
    SELECT uniqCombined(number) FROM numbers(1e10);
Result:
