Date: 2016-01-01

hashed dictionary layout types

With the hashed layout, the dictionary is stored entirely in memory as a hash table. It can contain any number of elements with arbitrary identifiers. In practice, the number of keys can reach tens of millions.

The dictionary key has the UInt64 type.

All types of sources are supported. When updating, data (from a file or from a table) is read in its entirety.

Configuration example:

DDL:

    LAYOUT(HASHED())

Configuration file:

    <layout>
      <hashed />
    </layout>
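To show where the layout clause sits in a full definition, here is a minimal sketch of a dictionary that uses the hashed layout. The dictionary name `country_names`, the source table `countries`, and the attribute names are hypothetical, and the source is assumed to be a table on the local ClickHouse server.

```sql
-- Hypothetical names throughout: country_names, countries, id, name.
CREATE DICTIONARY country_names
(
    id UInt64,                 -- the key must be UInt64 for the hashed layout
    name String DEFAULT ''
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(TABLE 'countries'))
LIFETIME(MIN 0 MAX 3600)       -- on each update the source is read in its entirety
LAYOUT(HASHED());

-- Look up one attribute by key.
SELECT dictGet('country_names', 'name', toUInt64(42));
```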

Configuration example with settings:

DDL:

    LAYOUT(HASHED([SHARDS 1] [SHARD_LOAD_QUEUE_BACKLOG 10000] [MAX_LOAD_FACTOR 0.5]))

Configuration file:

    <layout>
      <hashed>
        <!-- If shards is greater than 1 (default is `1`), the dictionary loads
             data in parallel. This is useful if a single dictionary holds a
             huge number of elements. -->
        <shards>10</shards>

        <!-- Size of the backlog for blocks in the parallel queue.

             The bottleneck in parallel loading is rehashing, so some backlog
             is needed to avoid stalling while a thread is rehashing.

             10000 is a good balance between memory and speed. It can handle
             the load without starvation even for 10e10 elements. -->
        <shard_load_queue_backlog>10000</shard_load_queue_backlog>

        <!-- Maximum load factor of the hash table. With greater values, memory
             is utilized more efficiently (less memory is wasted), but read
             performance may deteriorate.

             Valid values: [0.5, 0.99]
             Default: 0.5 -->
        <max_load_factor>0.5</max_load_factor>
      </hashed>
    </layout>
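The settings above can be combined in a single layout clause. The following sketch assumes a hypothetical dictionary `user_profiles` over a hypothetical local table `users`; the values 16 and 0.9 are illustrative choices within the documented ranges, not recommendations. The final query reads the `system.dictionaries` table to check how the loaded dictionary turned out.

```sql
-- Hypothetical names: user_profiles, users, user_id, country.
CREATE DICTIONARY user_profiles
(
    user_id UInt64,
    country String DEFAULT ''
)
PRIMARY KEY user_id
SOURCE(CLICKHOUSE(TABLE 'users'))
LIFETIME(MIN 300 MAX 600)
-- 16 shards load in parallel; a higher load factor trades read speed for memory.
LAYOUT(HASHED(SHARDS 16 SHARD_LOAD_QUEUE_BACKLOG 10000 MAX_LOAD_FACTOR 0.9));

-- Force a load, then inspect size, load factor, and loading time.
SYSTEM RELOAD DICTIONARY user_profiles;

SELECT name, element_count, load_factor, bytes_allocated, loading_duration
FROM system.dictionaries
WHERE name = 'user_profiles';
```

Comparing `loading_duration` and `bytes_allocated` across a few settings is one way to choose values for a given data set.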

The sparse_hashed layout is similar to hashed, but uses less memory at the cost of more CPU usage.

The dictionary key has the UInt64 type.

Configuration example:

DDL:

    LAYOUT(SPARSE_HASHED([SHARDS 1] [SHARD_LOAD_QUEUE_BACKLOG 10000] [MAX_LOAD_FACTOR 0.5]))

Configuration file:

    <layout>
      <sparse_hashed>
        <!-- <shards>1</shards> -->
        <!-- <shard_load_queue_backlog>10000</shard_load_queue_backlog> -->
        <!-- <max_load_factor>0.5</max_load_factor> -->
      </sparse_hashed>
    </layout>

Shards can also be used with this type of dictionary. They matter more for sparse_hashed than for hashed, since sparse_hashed is slower.
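One way to see the memory and speed trade-off is to load the same data under both layouts and compare the results. This sketch assumes a hypothetical source table `id_values` and a hypothetical dictionary `ids_hashed` that was already created over the same table with `LAYOUT(HASHED())`; the shard count of 8 is illustrative.

```sql
-- Hypothetical names: ids_sparse, ids_hashed, id_values.
CREATE DICTIONARY ids_sparse
(
    id UInt64,
    value String DEFAULT ''
)
PRIMARY KEY id
SOURCE(CLICKHOUSE(TABLE 'id_values'))
LIFETIME(0)
LAYOUT(SPARSE_HASHED(SHARDS 8));

SYSTEM RELOAD DICTIONARY ids_sparse;

-- Compare allocated memory and loading time of the two layouts.
SELECT name, type, element_count, bytes_allocated, loading_duration
FROM system.dictionaries
WHERE name IN ('ids_hashed', 'ids_sparse');
```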

The complex_key_hashed layout is for use with composite keys. It is otherwise similar to hashed.

Configuration example:

DDL:

    LAYOUT(COMPLEX_KEY_HASHED([SHARDS 1] [SHARD_LOAD_QUEUE_BACKLOG 10000] [MAX_LOAD_FACTOR 0.5]))

Configuration file:

    <layout>
      <complex_key_hashed>
        <!-- <shards>1</shards> -->
        <!-- <shard_load_queue_backlog>10000</shard_load_queue_backlog> -->
        <!-- <max_load_factor>0.5</max_load_factor> -->
      </complex_key_hashed>
    </layout>
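A composite key is declared by listing several columns in `PRIMARY KEY` and is passed to `dictGet` as a tuple. The sketch below assumes a hypothetical dictionary `region_names` over a hypothetical local table `regions`; the column names and the lookup values are illustrative.

```sql
-- Hypothetical names: region_names, regions, country_code, region_id, region_name.
CREATE DICTIONARY region_names
(
    country_code String,
    region_id UInt32,
    region_name String DEFAULT ''
)
PRIMARY KEY country_code, region_id    -- composite key of two columns
SOURCE(CLICKHOUSE(TABLE 'regions'))
LIFETIME(MIN 0 MAX 3600)
LAYOUT(COMPLEX_KEY_HASHED());

-- The key is a tuple whose element types match the key columns.
SELECT dictGet('region_names', 'region_name', ('DE', toUInt32(5)));
```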

The complex_key_sparse_hashed layout is for use with composite keys. It is otherwise similar to sparse_hashed.

Configuration example: