TPC-DS (2012)

Like the Star Schema Benchmark (SSB), TPC-DS is based on TPC-H, but it took the opposite route: it increased the number of joins needed by storing the data in a complex snowflake schema (24 tables instead of 8). The data distribution is skewed (e.g. normal and Poisson distributions). The benchmark includes 99 reporting and ad-hoc queries with random substitutions.

First, check out the TPC-DS repository and compile the data generator.

Then generate the data. The -scale parameter specifies the scale factor.

Then generate the queries, using the same scale factor as for the data.
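The two generation steps can be sketched as a small Python helper that builds both command lines from a single scale factor, so the data and the queries cannot drift apart. This is an illustration under assumptions, not the toolkit's documented workflow: the binary names `./dsdgen` and `./dsqgen`, the template directory `../query_templates`, and the list file `templates.lst` are assumed to match your checkout and should be adjusted if they do not.

```python
from __future__ import annotations

import shlex


def dsdgen_command(scale: int) -> list[str]:
    """Build the data-generator command line for the given scale factor."""
    # Assumption: the compiled generator is ./dsdgen in the current directory.
    return ["./dsdgen", "-scale", str(scale)]


def dsqgen_command(
    scale: int,
    template_dir: str = "../query_templates",  # assumed location of the templates
    template_list: str = "../query_templates/templates.lst",  # assumed list file
) -> list[str]:
    """Build the query-generator command line for the same scale factor."""
    # Assumption: the compiled query generator is ./dsqgen in the current directory.
    return [
        "./dsqgen",
        "-DIRECTORY", template_dir,
        "-INPUT", template_list,
        "-SCALE", str(scale),
    ]


SCALE = 1  # one value shared by both steps

# Print the commands instead of running them, so they can be reviewed first.
for cmd in (dsdgen_command(SCALE), dsqgen_command(SCALE)):
    print(shlex.join(cmd))
```

To execute a command rather than print it, pass the list to `subprocess.run(cmd, check=True)` from the directory that contains the compiled tools.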

Now create the tables in ClickHouse. You can use either the original table definitions in tools/tpcds.sql or "tuned" table definitions with properly defined primary key indexes and LowCardinality column types where appropriate.

The data can then be imported into these tables.
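A minimal sketch of the import step, assuming the generator wrote one pipe-delimited file per table, that each file's name (without extension) equals the table name, and that the files end in `.dat`. All three are assumptions to verify against your generated output. The helper only plans the `clickhouse-client` invocations; `run_imports` then feeds each file to the client on standard input.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def import_command(table: str) -> list[str]:
    """Build a clickhouse-client call that reads '|'-delimited CSV from stdin."""
    return [
        "clickhouse-client",
        "--format_csv_delimiter", "|",
        "--query", f"INSERT INTO {table} FORMAT CSV",
    ]


def plan_imports(data_dir: str, extension: str = ".dat") -> list[tuple[Path, list[str]]]:
    """Pair every data file in data_dir with the command that imports it."""
    # Assumption: the file stem is the table name, e.g. call_center.dat -> call_center.
    files = sorted(Path(data_dir).glob(f"*{extension}"))
    return [(path, import_command(path.stem)) for path in files]


def run_imports(data_dir: str, extension: str = ".dat") -> None:
    """Run the planned imports one file at a time, stopping at the first failure."""
    for path, cmd in plan_imports(data_dir, extension):
        with path.open("rb") as data:
            subprocess.run(cmd, stdin=data, check=True)
```

Calling `run_imports(".")` from the directory holding the generated files would load every table in alphabetical order. The tables must already exist, since the sketch only issues `INSERT` statements.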

Then run the generated queries.
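If the query generator wrote all queries into a single SQL file, it can help to split that file into individual statements so each query can be run and timed separately. The sketch below assumes statements are terminated by `;` and that comments occupy whole lines starting with `--`. It is a naive splitter that would break on a semicolon inside a string literal.

```python
from __future__ import annotations


def split_queries(sql_text: str) -> list[str]:
    """Split a multi-statement SQL script into individual statements."""
    # Drop full-line '--' comments; inline comments are left untouched.
    lines = [
        line for line in sql_text.splitlines()
        if not line.strip().startswith("--")
    ]
    # Naive split on ';' (assumes no semicolons inside string literals).
    statements = "\n".join(lines).split(";")
    return [stmt.strip() for stmt in statements if stmt.strip()]
```

Each returned statement can then be passed to `clickhouse-client --query`, for example via `subprocess.run(["clickhouse-client", "--query", stmt], check=True)`.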