Build ClickHouse with DEFLATE_QPL

Make sure your target machine meets the QPL required prerequisites.

Pass the following flag to CMake when building ClickHouse, depending on the capabilities of your target machine:

cmake -DENABLE_AVX2=1 -DENABLE_QPL=1 ..



or

cmake -DENABLE_AVX512=1 -DENABLE_QPL=1 ..



For generic requirements, please refer to the ClickHouse generic build instructions.

Run Benchmark with DEFLATE_QPL

The folder benchmark_sample under qpl-cmake gives an example of how to run the benchmark with Python scripts:

client_scripts contains Python scripts for running a typical benchmark, for example:

client_stressing_test.py: The Python script for the query stress test with [1~4] server instances.

queries_ssb.sql: The file that lists all queries for the Star Schema Benchmark.

allin1_ssb.sh: The shell script that executes the whole benchmark workflow automatically.

database_files stores the database files for each codec (lz4/deflate/zstd).

$ cd ./benchmark_sample/client_scripts

$ sh run_ssb.sh



After it completes, check all the results in this folder: ./output/

If you run into a failure, run the benchmark manually as described in the sections below.

[CLICKHOUSE_EXE] means the path to the clickhouse executable.

CPU: Sapphire Rapids

OS Requirements: refer to System Requirements for QPL

IAA Setup: refer to Accelerator Configuration

Install python modules:

pip3 install clickhouse_driver numpy



[Self-check for IAA]

$ accel-config list | grep -P 'iax|state'



The expected output looks like this:

"dev":"iax1",

"state":"enabled",

"state":"enabled",



If you see no output, IAA is not ready to work. Please check the IAA setup again.
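The self-check above can also be scripted. The following is a minimal sketch, assuming the filtered accel-config listing uses the "dev" / "state" line format shown above; the function name iaa_ready is hypothetical and not part of the benchmark scripts.

```python
import re


def iaa_ready(listing: str) -> bool:
    """Return True if the listing names an iax device and reports an enabled state.

    `listing` is the text printed by: accel-config list | grep -P 'iax|state'
    The patterns tolerate optional whitespace around the colon.
    """
    has_iax = re.search(r'"dev"\s*:\s*"iax\d+"', listing) is not None
    has_enabled = re.search(r'"state"\s*:\s*"enabled"', listing) is not None
    return has_iax and has_enabled


# Sample text copied from the expected output in this document.
sample = '"dev":"iax1",\n"state":"enabled",\n"state":"enabled",\n'
print(iaa_ready(sample))  # listing with an enabled iax device
print(iaa_ready(""))      # empty listing: IAA is not ready
```

This check is deliberately loose: it does not verify that the enabled state belongs to the iax device, so a successful result is a hint rather than proof that IAA is configured correctly.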

$ cd ./benchmark_sample

$ mkdir rawdata_dir && cd rawdata_dir



Use dbgen to generate 100 million rows of data with the parameter: -s 20

Files like *.tbl are expected to be generated under ./benchmark_sample/rawdata_dir/ssb-dbgen.

Set up database with LZ4 codec

$ cd ./database_dir/lz4

$ [CLICKHOUSE_EXE] server -C config_lz4.xml >& /dev/null &

$ [CLICKHOUSE_EXE] client



You should see the message Connected to ClickHouse server on the console, which means the client has successfully connected to the server.

Complete the three steps below, as described in Star Schema Benchmark:

Creating tables in ClickHouse

Inserting data. Use ./benchmark_sample/rawdata_dir/ssb-dbgen/*.tbl as input data.

Converting “star schema” to de-normalized “flat schema”

Set up database with IAA Deflate codec

$ cd ./database_dir/deflate

$ [CLICKHOUSE_EXE] server -C config_deflate.xml >& /dev/null &

$ [CLICKHOUSE_EXE] client



Complete the same three steps as for lz4 above.

Set up database with ZSTD codec

$ cd ./database_dir/zstd

$ [CLICKHOUSE_EXE] server -C config_zstd.xml >& /dev/null &

$ [CLICKHOUSE_EXE] client



Complete the same three steps as for lz4 above.

[self-check] For each codec (lz4/zstd/deflate), execute the query below to make sure the databases were created successfully:

select count() from lineorder_flat



You should see the following output:

┌───count()─┐

│ 119994608 │

└───────────┘



[Self-check for IAA Deflate codec] The first time you execute an insertion or a query from the client, the clickhouse server console is expected to print this log:

Hardware-assisted DeflateQpl codec is ready!



If you never see this, but instead see the log below:

Initialization of hardware-assisted DeflateQpl codec failed



it means the IAA devices are not ready, and you need to check the IAA setup again.

Before starting the benchmark, disable C6 and set the CPU frequency governor to performance:

$ cpupower idle-set -d 3

$ cpupower frequency-set -g performance



To eliminate the impact of memory bound across sockets, we use numactl to bind the server to one socket and the client to another socket.

Single instance means a single server connected with a single client.

Now run benchmark for LZ4/Deflate/ZSTD respectively:

LZ4:

$ cd ./database_dir/lz4

$ numactl -m 0 -N 0 [CLICKHOUSE_EXE] server -C config_lz4.xml >& /dev/null &

$ cd ./client_scripts

$ numactl -m 1 -N 1 python3 client_stressing_test.py queries_ssb.sql 1 > lz4.log



IAA deflate:

$ cd ./database_dir/deflate

$ numactl -m 0 -N 0 [CLICKHOUSE_EXE] server -C config_deflate.xml >& /dev/null &

$ cd ./client_scripts

$ numactl -m 1 -N 1 python3 client_stressing_test.py queries_ssb.sql 1 > deflate.log



ZSTD:

$ cd ./database_dir/zstd

$ numactl -m 0 -N 0 [CLICKHOUSE_EXE] server -C config_zstd.xml >& /dev/null &

$ cd ./client_scripts

$ numactl -m 1 -N 1 python3 client_stressing_test.py queries_ssb.sql 1 > zstd.log



The following three logs should now exist:

lz4.log

deflate.log

zstd.log



How to check performance metrics:

We focus on QPS. Search for the keyword QPS_Final and collect the statistics.
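Collecting the QPS_Final figures from several logs can be automated. The sketch below rests on an assumption that this document does not confirm: that each line containing the keyword QPS_Final carries its numeric value somewhere after the keyword. The function name qps_final_values is hypothetical; check it against a real log before relying on it.

```python
import re
from typing import List


def qps_final_values(log_text: str) -> List[float]:
    """Collect the first number that follows 'QPS_Final' on each matching line.

    Assumption: the value appears after the keyword on the same line.
    Lines without the keyword, or without a number after it, are skipped.
    """
    values = []
    for line in log_text.splitlines():
        _, found, tail = line.partition("QPS_Final")
        if not found:
            continue
        match = re.search(r"\d+(?:\.\d+)?", tail)
        if match:
            values.append(float(match.group()))
    return values


# Made-up sample lines used only to exercise the parser.
sample_log = "warm-up done\nQPS_Final: 12.5\nQPS_Final = 3\n"
print(qps_final_values(sample_log))
```

For a real run, the text would come from a log such as lz4.log, for example `qps_final_values(open("lz4.log").read())`.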

To reduce the impact of memory bound with too many threads, we recommend running the benchmark with multiple instances.

Multi-instance means multiple (2 or 4) servers, each connected with its respective client.

The cores of one socket need to be divided equally and assigned to the servers respectively.

For multiple instances, you must create a new folder for each codec and insert the dataset by following steps similar to those for a single instance.

There are 2 differences:

On the client side, you need to launch clickhouse with the assigned port during table creation and data insertion.

On the server side, you need to launch clickhouse with the specific xml config file in which the port has been assigned. All customized xml config files for multiple instances are provided under ./server_config.

Here we assume there are 60 cores per socket and take 2 instances as an example.

[Launch server for first instance]

LZ4:

$ cd ./database_dir/lz4

$ numactl -C 0-29,120-149 [CLICKHOUSE_EXE] server -C config_lz4.xml >& /dev/null &



ZSTD:

$ cd ./database_dir/zstd

$ numactl -C 0-29,120-149 [CLICKHOUSE_EXE] server -C config_zstd.xml >& /dev/null &



IAA Deflate:

$ cd ./database_dir/deflate

$ numactl -C 0-29,120-149 [CLICKHOUSE_EXE] server -C config_deflate.xml >& /dev/null &



[Launch server for second instance]

LZ4:

$ cd ./database_dir && mkdir lz4_s2 && cd lz4_s2

$ cp ../../server_config/config_lz4_s2.xml ./

$ numactl -C 30-59,150-179 [CLICKHOUSE_EXE] server -C config_lz4_s2.xml >& /dev/null &



ZSTD:

$ cd ./database_dir && mkdir zstd_s2 && cd zstd_s2

$ cp ../../server_config/config_zstd_s2.xml ./

$ numactl -C 30-59,150-179 [CLICKHOUSE_EXE] server -C config_zstd_s2.xml >& /dev/null &



IAA Deflate:

$ cd ./database_dir && mkdir deflate_s2 && cd deflate_s2

$ cp ../../server_config/config_deflate_s2.xml ./

$ numactl -C 30-59,150-179 [CLICKHOUSE_EXE] server -C config_deflate_s2.xml >& /dev/null &
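The CPU lists passed to numactl -C above come from dividing the 60 cores of one socket equally between the instances. The sketch below reproduces that arithmetic. It assumes a two-socket machine whose hyper-thread siblings are numbered starting at sockets × cores_per_socket (120 here), which matches the ranges used in this document but may differ on other systems; the function name cpu_lists is hypothetical.

```python
from typing import List


def cpu_lists(cores_per_socket: int, instances: int, sockets: int = 2) -> List[str]:
    """Split the cores of socket 0 equally and return one numactl -C list per instance.

    Assumption: logical CPU n + sockets * cores_per_socket is the hyper-thread
    sibling of physical core n (true for the 60-core example in this document).
    """
    if cores_per_socket % instances != 0:
        raise ValueError("cores per socket must divide equally between instances")
    share = cores_per_socket // instances
    sibling_offset = sockets * cores_per_socket
    lists = []
    for index in range(instances):
        first = index * share
        last = first + share - 1
        lists.append(
            f"{first}-{last},{first + sibling_offset}-{last + sibling_offset}"
        )
    return lists


print(cpu_lists(60, 2))  # ['0-29,120-149', '30-59,150-179']
```

On a different machine, check the actual core numbering (for example with lscpu) before reusing these ranges.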



Creating tables && Inserting data for second instance

Creating tables:

$ [CLICKHOUSE_EXE] client -m --port=9001



Inserting data:

$ [CLICKHOUSE_EXE] client --query "INSERT INTO [TBL_FILE_NAME] FORMAT CSV" < [TBL_FILE_NAME].tbl --port=9001



[TBL_FILE_NAME] represents the name of a file matching the pattern *.tbl under ./benchmark_sample/rawdata_dir/ssb-dbgen.

--port=9001 stands for the port assigned to the server instance, which is also defined in config_lz4_s2.xml/config_zstd_s2.xml/config_deflate_s2.xml. For even more instances, replace it with 9002 or 9003, which stand for the s3 and s4 instances respectively. If you don't assign it, the port is 9000 by default, which is already used by the first instance.
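The port and file naming for each instance follows a simple pattern, sketched below. The ports 9000 to 9003 and the _s2 names come from this document; the _s3 and _s4 config and folder names are an assumption extrapolated from the _s2 naming, so confirm them against the files under ./server_config. The function name instance_settings is hypothetical.

```python
from typing import Dict, Union


def instance_settings(codec: str, instance: int) -> Dict[str, Union[int, str]]:
    """Return the port, config file name and folder name for one server instance.

    Instance 1 uses the default port 9000 and the unsuffixed names;
    instances 2 to 4 use ports 9001 to 9003 and an _s<N> suffix
    (the _s3/_s4 names are assumed by analogy with _s2).
    """
    if not 1 <= instance <= 4:
        raise ValueError("the benchmark script supports 1 to 4 instances")
    suffix = "" if instance == 1 else f"_s{instance}"
    return {
        "port": 9000 + instance - 1,
        "config": f"config_{codec}{suffix}.xml",
        "folder": f"{codec}{suffix}",
    }


print(instance_settings("lz4", 2))
```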

Benchmarking with 2 instances

LZ4:

$ cd ./database_dir/lz4

$ numactl -C 0-29,120-149 [CLICKHOUSE_EXE] server -C config_lz4.xml >& /dev/null &

$ cd ./database_dir/lz4_s2

$ numactl -C 30-59,150-179 [CLICKHOUSE_EXE] server -C config_lz4_s2.xml >& /dev/null &

$ cd ./client_scripts

$ numactl -m 1 -N 1 python3 client_stressing_test.py queries_ssb.sql 2 > lz4_2insts.log



ZSTD:

$ cd ./database_dir/zstd

$ numactl -C 0-29,120-149 [CLICKHOUSE_EXE] server -C config_zstd.xml >& /dev/null &

$ cd ./database_dir/zstd_s2

$ numactl -C 30-59,150-179 [CLICKHOUSE_EXE] server -C config_zstd_s2.xml >& /dev/null &

$ cd ./client_scripts

$ numactl -m 1 -N 1 python3 client_stressing_test.py queries_ssb.sql 2 > zstd_2insts.log



IAA deflate:

$ cd ./database_dir/deflate

$ numactl -C 0-29,120-149 [CLICKHOUSE_EXE] server -C config_deflate.xml >& /dev/null &

$ cd ./database_dir/deflate_s2

$ numactl -C 30-59,150-179 [CLICKHOUSE_EXE] server -C config_deflate_s2.xml >& /dev/null &

$ cd ./client_scripts

$ numactl -m 1 -N 1 python3 client_stressing_test.py queries_ssb.sql 2 > deflate_2insts.log



Here the last argument of client_stressing_test.py, 2, stands for the number of instances. For more instances, replace it with 3 or 4. This script supports up to 4 instances.

The following three logs should now exist:

lz4_2insts.log

deflate_2insts.log

zstd_2insts.log



How to check performance metrics:

We focus on QPS. Search for the keyword QPS_Final and collect the statistics.

The benchmark setup for 4 instances is similar to that for 2 instances above. We recommend using the 2-instance benchmark data as the final report for review.

Each time before launching a new clickhouse server, make sure no background clickhouse process is running. Check for old processes and kill them:

$ ps -aux | grep clickhouse

$ kill -9 [PID]
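Picking the PIDs out of the ps listing by eye is error-prone, because the grep command itself also matches. The sketch below extracts the PIDs of clickhouse processes from ps -aux text. It assumes the usual eleven-column layout (USER, PID, ..., COMMAND) with no spaces inside the first ten columns; the function name clickhouse_pids is hypothetical.

```python
from typing import List


def clickhouse_pids(ps_output: str) -> List[int]:
    """Return PIDs of processes whose command mentions clickhouse.

    Skips the header line, malformed lines, and the grep process itself.
    Assumption: columns follow the `ps -aux` layout, with the command
    starting at the eleventh whitespace-separated field.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue
        command = fields[10]
        if "clickhouse" in command and not command.startswith("grep"):
            pids.append(int(fields[1]))
    return pids


# Made-up listing used only to exercise the parser.
sample_ps = (
    "USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND\n"
    "root 4242 1.0 0.5 1000 500 pts/0 Sl 10:00 0:01 /usr/bin/clickhouse server -C config_lz4.xml\n"
    "root 4300 0.0 0.0 100 50 pts/0 S+ 10:01 0:00 grep clickhouse\n"
)
print(clickhouse_pids(sample_ps))
```

The returned PIDs are the ones to pass to kill as shown above; the sketch only reads text and does not terminate anything itself.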



By comparing the query list in ./client_scripts/queries_ssb.sql with the official Star Schema Benchmark, you will find that 3 queries are not included: Q1.2/Q1.3/Q3.4. This is because CPU utilization is very low (<10%) for these queries, so they cannot demonstrate performance differences.