
· 2 min read

Parameterized views are handy for slicing and dicing data on the fly, based on parameters supplied at query execution time.

See this basic example:

1) create a table

clickhouse-cloud :) CREATE TABLE raw_data (id UInt32, data String) ENGINE = MergeTree ORDER BY id

CREATE TABLE raw_data
(
    `id` UInt32,
    `data` String
)
ENGINE = MergeTree
ORDER BY id

Query id: aa21e614-1e10-4bba-88ce-4c7183a9148e

Ok.

0 rows in set. Elapsed: 0.332 sec.

2) insert some sample random data

clickhouse-cloud :) INSERT INTO raw_data SELECT * FROM generateRandom('`id` UInt32,
`data` String',1,1) LIMIT 1000000;

INSERT INTO raw_data SELECT *
FROM generateRandom('`id` UInt32,
`data` String', 1, 1)
LIMIT 1000000

Query id: c552a34a-b72f-45e1-bed0-778923e1b5c9

Ok.

0 rows in set. Elapsed: 0.438 sec. Processed 1.05 million rows, 10.99 MB (2.39 million rows/s., 25.11 MB/s.)

3) create the parameterized view:

clickhouse-cloud :) CREATE VIEW raw_data_parametrized AS SELECT * FROM raw_data WHERE id BETWEEN {id_from:UInt32} AND {id_to:UInt32}

CREATE VIEW raw_data_parametrized AS
SELECT *
FROM raw_data
WHERE (id >= {id_from:UInt32}) AND (id <= {id_to:UInt32})

Query id: 45fb83a6-aa55-4197-a7cd-9e1ad2c76d48

Ok.

0 rows in set. Elapsed: 0.102 sec.

4) query the parameterized view by feeding the expected parameters in your FROM clause:

clickhouse-cloud :) SELECT count() FROM raw_data_parametrized(id_from=0, id_to=50000);

SELECT count()
FROM raw_data_parametrized(id_from = 0, id_to = 50000)

Query id: 5731aae1-3e68-4e63-b57f-d50f29055744

┌─count()─┐
│  317019 │
└─────────┘

1 row in set. Elapsed: 0.004 sec. Processed 319.49 thousand rows, 319.49 KB (76.29 million rows/s., 76.29 MB/s.)
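
To double-check the definition that was stored, including the declared parameters, you can inspect the view with the standard SHOW CREATE statement (an optional quick check):

SHOW CREATE VIEW raw_data_parametrized;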

For more info, please refer to https://clickhouse.com/docs/en/sql-reference/statements/create/view#parameterized-view

· One min read

Question

When executing an INSERT...SELECT statement, I am getting a too many parts (TOO_MANY_PARTS) error.

How can I solve this?

Answer

Below are some of the settings to tune to avoid this error. This is expert-level ClickHouse tuning: these values should be set only after understanding the specifications of the ClickHouse Cloud service or on-prem cluster where they will be used, so do not take them as one-size-fits-all. (A combined example follows the list.)

max_insert_block_size = 100_000_000 (default 1_048_576)

Increasing this from ~1M to 100M rows allows larger blocks to form.

Note: this setting only applies when the server forms the blocks, i.e. for INSERTs via the HTTP interface; it does not apply to clickhouse-client.

min_insert_block_size_rows = 100_000_000 (default 1_048_576)

Increasing this from ~1M to 100M rows allows larger blocks to form.

min_insert_block_size_bytes = 500_000_000 (default 268_435_456)

Increasing this from ~268 MB to 500 MB allows larger blocks to form.

parts_to_delay_insert = 500 (default 150)

Increasing this prevents INSERTs from being artificially slowed down once the number of active parts in a single partition reaches the threshold.

parts_to_throw_insert = 1500 (default 300)

Increasing this can generally affect query performance on the table, but it is acceptable for a data migration.
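
As an illustration only (a sketch; dst and src are hypothetical table names, and the values must be adapted to your service): the first three settings are query-level and can be attached to the INSERT itself, while parts_to_delay_insert and parts_to_throw_insert are MergeTree table settings changed via ALTER TABLE ... MODIFY SETTING:

-- MergeTree table settings (applied once, on the target table):
ALTER TABLE dst MODIFY SETTING parts_to_delay_insert = 500, parts_to_throw_insert = 1500;

-- Query-level settings (applied per INSERT):
INSERT INTO dst SELECT * FROM src
SETTINGS min_insert_block_size_rows = 100000000, min_insert_block_size_bytes = 500000000;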

· One min read

Question

How do I create a ClickHouse dictionary with string keys and string values from a MergeTree table source?

Answer

  • Create the source table for the dictionary
CREATE TABLE db1.table1_dict_source
(
    id UInt32,
    email String,
    name String
)
ENGINE = MergeTree()
ORDER BY id;
  • Insert rows
INSERT INTO db1.table1_dict_source
(id, email, name)
VALUES
(1, '[email protected]', 'me'),
(2, '[email protected]', 'you');
  • Create dictionary with key/value both as String
CREATE DICTIONARY db1.table1_dict
(
    email String,
    name String
)
PRIMARY KEY email
SOURCE(CLICKHOUSE(
    DB 'db1'
    TABLE 'table1_dict_source'
    USER 'default'
    PASSWORD 'ClickHouse123!'
))
LAYOUT(COMPLEX_KEY_HASHED())
LIFETIME(MIN 0 MAX 1000);
  • Test the dictionary
clickhouse-cloud :) SELECT * from db1.table1_dict;

SELECT *
FROM db1.table1_dict

Query id: 098396ce-11dd-4c71-a0e1-40723dd67ddc

┌─email─────────────┬─name─┐
│ [email protected] │ me   │
│ [email protected] │ you  │
└───────────────────┴──────┘

2 rows in set. Elapsed: 0.001 sec.

You can also use the dictGet function to retrieve values from it, such as:

SELECT dictGet('db1.table1_dict', 'name', '[email protected]');

Response:

┌─dictGet('db1.table1_dict', 'name', '[email protected]')─┐
│ me                                                      │
└─────────────────────────────────────────────────────────┘
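
If a key might be missing from the dictionary, dictGetOrDefault lets you supply a fallback value. A small optional example (the lookup key here is hypothetical):

SELECT dictGetOrDefault('db1.table1_dict', 'name', 'nobody@example.com', 'unknown');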

More details - https://clickhouse.com/docs/en/sql-reference/functions/ext-dict-functions

· 2 min read

Question

I see other vendors providing their own builds of ClickHouse. What is the difference between official ClickHouse builds and these 3rd-party builds?

Answer

Here are some of the differences we have observed with other builds:

  • The string "official" is replaced with the name of the vendor
  • They appear after several months of delay and don't include recent bug fixes, which means these builds can contain vulnerabilities that have been fixed in the official versions
  • The builds are not bit-identical, and the addresses in the code are different. As a result, stack traces from these builds cannot be analyzed, and the ClickHouse team cannot answer questions about these builds
  • The builds are not auditable or reproducible - there is no publicly accessible CI system with the same build logs
  • The ClickHouse test suite is not run on these builds, so they are not verified to work by the test suite
  • They might not be available for all architectures (like ARM, etc.)
  • Sometimes they include patches targeted for one particular customer that can break compatibility and introduce extra risk

We recommend running the latest version of ClickHouse using the official builds following the install instructions in the documentation:

  • We release a stable version every month, and the three latest stable releases are supported in terms of diagnostics and backporting of bug fixes.
  • We also release a long-term support (LTS) version twice a year that is supported for a year after its initial release. It is really only meant for companies that do not allow frequent upgrades or that cannot use non-LTS software. (We are big fans of the monthly stable builds!)

You can find more details on stable vs. LTS releases in the docs.

· 7 min read

See this example using the clickhouse client with a ClickHouse Cloud service.

create a query_cache_test table

Using clickhouse client

clickhouse-cloud :) CREATE TABLE query_cache_test (name String, age UInt8) ENGINE = MergeTree ORDER BY name

CREATE TABLE query_cache_test
(
    `name` String,
    `age` UInt8
)
ENGINE = MergeTree
ORDER BY name

Query id: 81c54f09-7de4-48ec-916f-c7c304a46931

Ok.

0 rows in set. Elapsed: 0.343 sec.

fill the table with some data:

clickhouse-cloud :) INSERT INTO query_cache_test SELECT * FROM generateRandom('name String, age UInt8',1,1) LIMIT 100000;

INSERT INTO query_cache_test SELECT *
FROM generateRandom('name String, age UInt8', 1, 1)
LIMIT 100000

Query id: 90369105-bd67-494c-bdaf-d90dbfb6def9

Ok.

0 rows in set. Elapsed: 0.173 sec. Processed 327.05 thousand rows, 3.43 MB (1.89 million rows/s., 19.86 MB/s.)

enable trace logs:

clickhouse-cloud :) SET send_logs_level = 'trace'

SET send_logs_level = 'trace'

Query id: d65490b0-7960-4a85-a343-787e70e5e293

Ok.

0 rows in set. Elapsed: 0.134 sec.

run a query that asks to use the query cache (by appending SETTINGS use_query_cache=true to the query):

clickhouse-cloud :) SELECT name FROM query_cache_test WHERE age > 1000 FORMAT Null SETTINGS use_query_cache=true;

SELECT name
FROM query_cache_test
WHERE age > 1000
FORMAT `Null`
SETTINGS use_query_cache = 1

Query id: 3754a7fd-b786-47c1-a258-dfbc75e35a04

[c-red-qc-36-server-0] 2023.05.29 12:06:10.542408 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Debug> executeQuery: (from 151.53.3.113:50412, user: tony) SELECT name FROM query_cache_test WHERE age > 1000 FORMAT Null SETTINGS use_query_cache=true; (stage: Complete)
[c-red-qc-36-server-0] 2023.05.29 12:06:10.542744 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Debug> InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "age > 1000" moved to PREWHERE
[c-red-qc-36-server-0] 2023.05.29 12:06:10.542900 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Debug> InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "age > 1000" moved to PREWHERE
[c-red-qc-36-server-0] 2023.05.29 12:06:10.543020 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Trace> ContextAccess (tony): Access granted: SELECT(name, age) ON tony.test
[c-red-qc-36-server-0] 2023.05.29 12:06:10.543164 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Trace> ContextAccess (tony): Access granted: SELECT(name, age) ON tony.test
[c-red-qc-36-server-0] 2023.05.29 12:06:10.543226 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
[c-red-qc-36-server-0] 2023.05.29 12:06:10.543337 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Debug> tony.test (e61a107c-e7f8-4445-825f-88f85c72f7e9) (SelectExecutor): Key condition: unknown
[c-red-qc-36-server-0] 2023.05.29 12:06:10.543395 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Debug> tony.test (e61a107c-e7f8-4445-825f-88f85c72f7e9) (SelectExecutor): Selected 1/1 parts by partition key, 1 parts by primary key, 12/12 marks by primary key, 12 marks to read from 1 ranges
[c-red-qc-36-server-0] 2023.05.29 12:06:10.543412 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Trace> tony.test (e61a107c-e7f8-4445-825f-88f85c72f7e9) (SelectExecutor): Spreading mark ranges among streams (default reading)
[c-red-qc-36-server-0] 2023.05.29 12:06:10.543461 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Trace> MergeTreeBaseSelectProcessor: PREWHERE condition was split into 1 steps: "greater(age, 1000)"
[c-red-qc-36-server-0] 2023.05.29 12:06:10.543484 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Trace> MergeTreeInOrderSelectProcessor: Reading 1 ranges in order from part all_0_0_0, approx. 100000 rows starting from 0
[c-red-qc-36-server-0] 2023.05.29 12:06:10.543559 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Trace> QueryCache: No entry found for query SELECT name FROM query_cache_test WHERE age > 1000 FORMAT `Null` SETTINGS
[c-red-qc-36-server-0] 2023.05.29 12:06:10.547760 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Trace> QueryCache: Stored result of query SELECT name FROM query_cache_test WHERE age > 1000 FORMAT `Null` SETTINGS
[c-red-qc-36-server-0] 2023.05.29 12:06:10.547827 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Debug> executeQuery: Read 100000 rows, 97.66 KiB in 0.005508 sec., 18155410.31227306 rows/sec., 17.31 MiB/sec.
[c-red-qc-36-server-0] 2023.05.29 12:06:10.547913 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Debug> MemoryTracker: Peak memory usage (for query): 451.89 KiB.
[c-red-qc-36-server-0] 2023.05.29 12:06:10.547933 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Debug> TCPHandler: Processed in 0.005911032 sec.
Ok.

0 rows in set. Elapsed: 0.006 sec. Processed 100.00 thousand rows, 100.00 KB (17.56 million rows/s., 17.56 MB/s.)

run the same query again:

clickhouse-cloud :) SELECT name FROM query_cache_test WHERE age > 1000 FORMAT Null SETTINGS use_query_cache=true;

SELECT name
FROM query_cache_test
WHERE age > 1000
FORMAT `Null`
SETTINGS use_query_cache = 1

Query id: a047527c-9d55-4e6b-9747-0ccad8787515

[c-red-qc-36-server-0] 2023.05.29 12:06:17.931007 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Debug> executeQuery: (from 151.53.3.113:50412, user: tony) SELECT name FROM query_cache_test WHERE age > 1000 FORMAT Null SETTINGS use_query_cache=true; (stage: Complete)
[c-red-qc-36-server-0] 2023.05.29 12:06:17.931331 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Debug> InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "age > 1000" moved to PREWHERE
[c-red-qc-36-server-0] 2023.05.29 12:06:17.931468 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Debug> InterpreterSelectQuery: MergeTreeWhereOptimizer: condition "age > 1000" moved to PREWHERE
[c-red-qc-36-server-0] 2023.05.29 12:06:17.931585 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Trace> ContextAccess (tony): Access granted: SELECT(name, age) ON tony.test
[c-red-qc-36-server-0] 2023.05.29 12:06:17.931696 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Trace> ContextAccess (tony): Access granted: SELECT(name, age) ON tony.test
[c-red-qc-36-server-0] 2023.05.29 12:06:17.931749 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
[c-red-qc-36-server-0] 2023.05.29 12:06:17.931857 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Debug> tony.test (e61a107c-e7f8-4445-825f-88f85c72f7e9) (SelectExecutor): Key condition: unknown
[c-red-qc-36-server-0] 2023.05.29 12:06:17.931891 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Debug> tony.test (e61a107c-e7f8-4445-825f-88f85c72f7e9) (SelectExecutor): Selected 1/1 parts by partition key, 1 parts by primary key, 12/12 marks by primary key, 12 marks to read from 1 ranges
[c-red-qc-36-server-0] 2023.05.29 12:06:17.931913 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Trace> tony.test (e61a107c-e7f8-4445-825f-88f85c72f7e9) (SelectExecutor): Spreading mark ranges among streams (default reading)
[c-red-qc-36-server-0] 2023.05.29 12:06:17.931952 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Trace> MergeTreeBaseSelectProcessor: PREWHERE condition was split into 1 steps: "greater(age, 1000)"
[c-red-qc-36-server-0] 2023.05.29 12:06:17.931975 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Trace> MergeTreeInOrderSelectProcessor: Reading 1 ranges in order from part all_0_0_0, approx. 100000 rows starting from 0
[c-red-qc-36-server-0] 2023.05.29 12:06:17.932043 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Trace> QueryCache: Entry found for query SELECT name FROM query_cache_test WHERE age > 1000 FORMAT `Null` SETTINGS
[c-red-qc-36-server-0] 2023.05.29 12:06:17.932551 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Debug> MemoryTracker: Peak memory usage (for query): 5.19 KiB.
[c-red-qc-36-server-0] 2023.05.29 12:06:17.932581 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Debug> TCPHandler: Processed in 0.001961411 sec.
Ok.

0 rows in set. Elapsed: 0.002 sec.

Now observe the differences in the TRACE logs related to QueryCache between the two executions.

1st execution:

[c-red-qc-36-server-0] 2023.05.29 12:06:10.543559 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Trace> QueryCache: No entry found for query SELECT name FROM query_cache_test WHERE age > 1000 FORMAT `Null` SETTINGS
[c-red-qc-36-server-0] 2023.05.29 12:06:10.547760 [ 454 ] {3754a7fd-b786-47c1-a258-dfbc75e35a04} <Trace> QueryCache: Stored result of query SELECT name FROM query_cache_test WHERE age > 1000 FORMAT `Null` SETTINGS

2nd execution:

[c-red-qc-36-server-0] 2023.05.29 12:06:17.932043 [ 454 ] {a047527c-9d55-4e6b-9747-0ccad8787515} <Trace> QueryCache: Entry found for query SELECT name FROM query_cache_test WHERE age > 1000 FORMAT `Null` SETTINGS

In the 1st execution, no entry was found (No entry found for query SELECT...), so ClickHouse stored the result (Stored result of query SELECT...) for us.

In the 2nd execution, the query made use of the query cache, as it found the entry already stored (Entry found for query SELECT...).
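
As a side note, reads and writes can be toggled independently: recent ClickHouse releases provide the settings enable_writes_to_query_cache and enable_reads_from_query_cache (both enabled by default). A small sketch that populates the cache without ever reading from it:

SELECT name FROM query_cache_test WHERE age > 1000
FORMAT Null
SETTINGS use_query_cache = true, enable_reads_from_query_cache = false;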

Using just SQL

Without inspecting the clickhouse client trace logs, it is also possible to validate whether the query cache is being used just by issuing SQL commands and checking the relevant system tables:

clickhouse-cloud :) SELECT 1 SETTINGS use_query_cache=true;

SELECT 1
SETTINGS use_query_cache = 1

Query id: a5a078c7-61e5-4036-a6f0-4d602d5b72d2

┌─1─┐
│ 1 │
└───┘

1 row in set. Elapsed: 0.001 sec.

clickhouse-cloud :) SELECT 1 SETTINGS use_query_cache=true;

SELECT 1
SETTINGS use_query_cache = 1

Query id: 322ae001-b1ab-463f-ac8d-dc5ba346f3f9

┌─1─┐
│ 1 │
└───┘

1 row in set. Elapsed: 0.001 sec.

clickhouse-cloud :) SELECT * FROM clusterAllReplicas(default,system.query_cache);

SELECT *
FROM clusterAllReplicas(default, system.query_cache)

Query id: c9b57eac-ba64-430e-8d51-8f865a13cc25

┌─query──────────────┬─result_size─┬─stale─┬─shared─┬─compressed─┬──────────expires_at─┬─────────────key_hash─┐
│ SELECT 1 SETTINGS  │         136 │     0 │      1 │          1 │ 2023-08-02 15:08:23 │ 12188185624808016954 │
└────────────────────┴─────────────┴───────┴────────┴────────────┴─────────────────────┴──────────────────────┘

1 row in set. Elapsed: 0.005 sec.

clickhouse-cloud :) SELECT * FROM clusterAllReplicas(default,system.events) WHERE event LIKE 'QueryCache%'

SELECT *
FROM clusterAllReplicas(default, system.events)
WHERE event LIKE 'QueryCache%'

Query id: d536555e-b8ab-4cd4-9741-c04e95612bec

┌─event────────────┬─value─┬─description──────────────────────────────────────────────────────────────────────────────────────────┐
│ QueryCacheHits   │     1 │ Number of times a query result has been found in the query cache (and query computation was avoided).  │
│ QueryCacheMisses │     1 │ Number of times a query result has not been found in the query cache (and required query computation). │
└──────────────────┴───────┴──────────────────────────────────────────────────────────────────────────────────────────────────────┘

2 rows in set. Elapsed: 0.006 sec.

In the last result we see one QueryCacheMisses event for the first execution of the query SELECT 1 SETTINGS use_query_cache=true; and one QueryCacheHits event for its second execution.

Also keep in mind that the default maximum cache entry size is 1048576 bytes (= 1 MiB) and that, by default, results are stored in the cache for only 60 seconds (you can set query_cache_ttl=300 in SETTINGS, for example, to keep a query cache result for 5 minutes instead).
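
For example, a sketch reusing the table above: the TTL can be set per query, and the cache can also be cleared manually with SYSTEM DROP QUERY CACHE:

-- keep this result cached for 5 minutes instead of the default 60 seconds
SELECT name FROM query_cache_test WHERE age > 1000
FORMAT Null
SETTINGS use_query_cache = true, query_cache_ttl = 300;

-- drop all query cache entries
SYSTEM DROP QUERY CACHE;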

You can find more detailed info on the ClickHouse query cache at https://clickhouse.com/docs/en/operations/query-cache

· One min read

Question

My query is returning many rows but I'm only interested in the query processing time. How do I omit the query output and check for query processing time?

Answer

Append FORMAT Null to your query to configure the output format to Null. This prevents data from being transmitted to the client.

For example:

SELECT
    customer_id,
    count() AS total,
    any(review_headline)
FROM amazon_reviews
GROUP BY customer_id
ORDER BY total DESC
FORMAT Null

The response will return the number of rows processed and the elapsed time, but 0 rows will be returned:

0 rows in set. Elapsed: 25.288 sec. Processed 222.04 million rows, 13.50 GB (8.78 million rows/s., 533.77 MB/s.)
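
If you want to review the timings afterwards, each run is also recorded in system.query_log (assuming query logging is enabled). For example, to look up the most recent durations:

SELECT event_time, query_duration_ms, read_rows
FROM system.query_log
WHERE type = 'QueryFinish'
ORDER BY event_time DESC
LIMIT 5;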

· One min read

Question

How can a ClickHouse Cloud service be started, stopped and resumed using API endpoints?

Answer

  1. To wake up/resume a Cloud service from an idle state, you can ping the instance:
curl -X GET https://abc123.us-west-2.aws.clickhouse.cloud:8443/ping
  2. To stop a Cloud service, use the /state endpoint along with the stop command. The syntax looks like:
curl -X PATCH https://api.clickhouse.cloud/v1/organizations/<org_uuid>/services/<service_uuid>/state -u <key_id>:<key_secret> -H "Content-Type: application/json" -d '{"command": "<stop|start>"}'

For example, the following command stops the 2e2124ca-c5ac-459d-a6f2-abc123549d2a service:

curl -X PATCH https://api.clickhouse.cloud/v1/organizations/123abcd0-e9b5-4f55-9e42-0fb04392445c/services/2e2124ca-c5ac-459d-a6f2-abc123549d2a/state -u abc123:ABC123 -H "Content-Type: application/json" -d '{"command": "stop"}'

The output looks like:

{"result":{"id":"2e2124ca-c5ac-459d-a6f2-abc123549d2a","name":"mars-s3","provider":"aws","regionId":"us-west-2","state":"stopping","endpoints":[{"protocol":"nativesecure","host":"abc123.us-west-2.aws.clickhouse.cloud","port":9440},{"protocol":"https","host":"abc123ntrb.us-west-2.aws.clickhouse.cloud","port":8443}],"tier":"production","idleScaling":true,"idleTimeoutMinutes":5,"minTotalMemoryGb":24,"maxTotalMemoryGb":48,"ipAccessList":[{"source":"[0.0.0.0/0](http://0.0.0.0/0)","description":"Anywhere"}],"createdAt":"2022-10-21T18:46:31Z"},"status":200}%
  3. To start the service again, use the start command:
curl -X PATCH https://api.clickhouse.cloud/v1/organizations/123abcd0-e9b5-4f55-9e42-0fb04392445c/services/2e2124ca-c5ac-459d-a6f2-abc123549d2a/state -u abc123:ABC123 -H "Content-Type: application/json" -d '{"command": "start"}'
note

Here are the various states that a service can be in:

"state":"stopping"
"state":"stopped"
"state":"starting"
"state":"running"
"state":"idle"
note

A Cloud service that is "idle" is considered started, so a start command will not resume/wake it up. Use the ping endpoint shown in Step 1 to wake up a service.
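
To check the current state before issuing a command, you can query the service details endpoint of the same API (a sketch; replace the placeholders with your own organization ID, service ID, and API key):

curl -X GET https://api.clickhouse.cloud/v1/organizations/<org_uuid>/services/<service_uuid> -u <key_id>:<key_secret>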

· 2 min read

Question

When running ClickHouse in Docker, Docker is complaining about the lack of CAP_IPC_LOCK and CAP_SYS_NICE capabilities in the system. How can I resolve it?

Here is what the no CAP_IPC_LOCK and no CAP_SYS_NICE capability log messages look like:

docker run -d --name clickhouse-server \
--ulimit nofile=262144:262144 \
--network clickhouse-net \
-p 8123:8123 -p 9000:9000 -p 9009:9009 -p 9363:9363 \
clickhouse/clickhouse-server:23.2
2023.04.19 08:04:10.022720 [ 1 ] {} <Information> Application: It looks like the process has no CAP_IPC_LOCK capability, binary mlock will be disabled. It could happen due to incorrect ClickHouse package installation. You could resolve the problem manually with 'sudo setcap cap_ipc_lock=+ep /usr/bin/clickhouse'. Note that it will not work on 'nosuid' mounted filesystems.

2023.04.19 08:04:10.065860 [ 1 ] {} <Information> Application: It looks like the process has no CAP_SYS_NICE capability, the setting 'os_thread_priority' will have no effect. It could happen due to incorrect ClickHouse package installation. You could resolve the problem manually with 'sudo setcap cap_sys_nice=+ep /usr/bin/clickhouse'. Note that it will not work on 'nosuid' mounted filesystems.

Answer

  1. Add two --cap-add arguments to provide the container with the IPC_LOCK and SYS_NICE capabilities:
docker run -d --name clickhouse-server \
--cap-add=SYS_NICE \
--cap-add=IPC_LOCK \
--ulimit nofile=262144:262144 \
--network clickhouse-net \
-p 8123:8123 -p 9000:9000 -p 9009:9009 -p 9363:9363 \
clickhouse/clickhouse-server:23.2
  2. Check that the capabilities are visible in the container using the following command:
apt-get update > /dev/null && apt-get install -y libcap2-bin > /dev/null && capsh --print

The response is similar to:

debconf: delaying package configuration, since apt-utils is not installed
WARNING: libcap needs an update (cap=40 should have a name).
Current: = cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_ipc_lock,cap_sys_chroot,cap_sys_nice,cap_mknod,cap_audit_write,cap_setfcap+ep
Bounding set =cap_chown,cap_dac_override,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_net_bind_service,cap_net_raw,cap_ipc_lock,cap_sys_chroot,cap_sys_nice,cap_mknod,cap_audit_write,cap_setfcap
Ambient set =
Securebits: 00/0x0/1'b0
secure-noroot: no (unlocked)
secure-no-suid-fixup: no (unlocked)
secure-keep-caps: no (unlocked)
secure-no-ambient-raise: no (unlocked)
uid=0(root) euid=0(root)
gid=0(root)
groups=0(root)
Guessed mode: UNCERTAIN (0)
  3. Manually set both the capabilities for ClickHouse:
setcap "cap_ipc_lock=+ep cap_sys_nice=+ep" /usr/bin/clickhouse
  4. Check that the capabilities are applied:
getcap -v /usr/bin/clickhouse

You should see the following:

/usr/bin/clickhouse = cap_ipc_lock,cap_sys_nice+ep
  5. Restart the ClickHouse server; the log messages should no longer be shown.
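
You can also confirm from the host that the container was started with the extra capabilities, using the standard docker inspect command:

docker inspect --format '{{.HostConfig.CapAdd}}' clickhouse-server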

Check out this article on Linux capabilities for more details.

· 2 min read

The following useful query shows which of your executed queries used the most memory. A couple of comments about this query:

  • the results are computed from the past day (now() - toIntervalDay(1)), but you can easily modify the time interval
  • it assumes you have a cluster named default (the cluster name in ClickHouse Cloud); change default to the name of your cluster if it differs
  • if you do not have a cluster, see the query listed at the end of this article
SELECT
    count() AS nb_query,
    user,
    query,
    sum(memory_usage) AS memory,
    normalized_query_hash
FROM clusterAllReplicas(default, system.query_log)
WHERE (event_time >= (now() - toIntervalDay(1)))
    AND query_kind = 'Select'
    AND type = 'QueryFinish'
    AND user != 'monitoring-internal'
GROUP BY
    normalized_query_hash,
    query,
    user
ORDER BY memory DESC;

The response looks like:

┌─nb_query─┬─user────┬─query───────────────────────────────────────────────────────┬───memory─┬─normalized_query_hash─┐
│       11 │ default │ select version()                                             │ 46178924 │   7202516440347714159 │
│        2 │ default │ SELECT * FROM "system"."table_functions" LIMIT 31 OFFSET 0  │  8391544 │  12830067173062987695 │
└──────────┴─────────┴──────────────────────────────────────────────────────────────┴──────────┴───────────────────────┘
note

If you do not have a system.query_log table, then you likely do not have query logging enabled. See the query_log setting documentation for details on how to enable it.

If you do not have a cluster, you can just query your one system.query_log table directly:

SELECT
    count() AS nb_query,
    user,
    query,
    sum(memory_usage) AS memory,
    normalized_query_hash
FROM system.query_log
WHERE (event_time >= (now() - toIntervalDay(1)))
    AND query_kind = 'Select'
    AND type = 'QueryFinish'
    AND user != 'monitoring-internal'
GROUP BY
    normalized_query_hash,
    query,
    user
ORDER BY memory DESC;
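
Since memory_usage is reported in bytes, a small optional variant using the built-in formatReadableSize function makes the figures easier to read:

SELECT
    count() AS nb_query,
    user,
    query,
    formatReadableSize(sum(memory_usage)) AS memory
FROM system.query_log
WHERE (event_time >= (now() - toIntervalDay(1)))
    AND query_kind = 'Select'
    AND type = 'QueryFinish'
GROUP BY user, query
ORDER BY sum(memory_usage) DESC;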

· One min read

We often get asked about a good schema migration tool for ClickHouse, and about the best practice for managing database schemas in ClickHouse that might change over time. There is no standard schema migration tool for ClickHouse, but we have compiled the following list (in no particular order) of automatic schema migration tools with ClickHouse support that we know of: