The functions left and right were previously implemented in the parser and are now full-featured functions. Distributed queries with left or right functions without aliases may throw an exception if the cluster contains different versions of clickhouse-server. If you are upgrading your cluster and encounter this error, finish upgrading the cluster so that all nodes run the same version. You can also add aliases (AS something) to the columns in your queries to avoid this issue. #33407 (alexey-milovidov).
Resource usage by scalar subqueries is fully accounted for since this version. With this change, rows read in scalar subqueries are now reported in the query_log. If the scalar subquery is cached (repeated or called for several rows), the rows read are counted only once. This change allows KILLing queries and reporting progress while they are executing scalar subqueries. #32271 (Raúl Marín).
Implement data schema inference for input formats. Allow skipping the structure (or writing just auto) in table functions file, url, s3, hdfs and in parameters of clickhouse-local. Allow skipping the structure in the create query for table engines File, HDFS, S3, URL, Merge, Buffer, Distributed and ReplicatedMergeTree (if we add new replicas). #32455 (Kruglov Pavel).
Detect format by file extension in file/hdfs/s3/url table functions and HDFS/S3/URL table engines, and also for SELECT INTO OUTFILE and INSERT FROM INFILE. #33565 (Kruglov Pavel). Close #30918. #33443 (OnePiece).
Automatic cluster discovery via Zoo/Keeper. It allows adding replicas to the cluster without changing the configuration on every server. #31442 (vdimir).
Implement the Hive table engine to access Apache Hive from ClickHouse. This implements: #29245. #31104 (taiyang-li).
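As a rough illustration of what detection by file extension means, the following Python sketch maps a file name to a format name. The extension table, the set of compression suffixes and the helper name detect_format are assumptions made for this example; they are not the server's actual lookup table or logic.

```python
import os

# Assumed, illustrative subset of extension -> format name; not the server's real table.
FORMAT_BY_EXTENSION = {".csv": "CSV", ".tsv": "TSV", ".parquet": "Parquet"}
# Assumption: a trailing compression suffix is ignored before looking at the extension.
COMPRESSION_SUFFIXES = {".gz", ".zst", ".xz"}


def detect_format(path):
    """Return a format name guessed from the file extension, or None if unknown."""
    name = os.path.basename(path).lower()
    root, ext = os.path.splitext(name)
    if ext in COMPRESSION_SUFFIXES:
        # Look at the extension that precedes the compression suffix.
        root, ext = os.path.splitext(root)
    return FORMAT_BY_EXTENSION.get(ext)
```

With this sketch, detect_format("data/hits.tsv.gz") yields "TSV", while a name without a known extension yields None, in which case the format would still have to be given explicitly.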
Add aggregate functions cramersV, cramersVBiasCorrected, theilsU and contingency. These functions calculate dependency (a measure of association) between categorical values. All of them use a cross-tab (histogram on pairs) in their implementation. You can think of the result as a correlation coefficient, but for any discrete values (not necessarily numbers). #33366 (alexey-milovidov). Initial implementation by Vanyok-All-is-OK and antikvist.
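To make the cross-tab idea concrete, here is a minimal Python sketch of the textbook Cramér's V statistic computed from a histogram on pairs. It only illustrates the formula V = sqrt(chi2 / (n * min(r - 1, c - 1))); the function name cramers_v and the handling of degenerate input are assumptions of this example, not the server's implementation.

```python
import math
from collections import Counter


def cramers_v(pairs):
    """Textbook Cramér's V for an iterable of (x, y) categorical pairs."""
    pairs = list(pairs)
    n = len(pairs)
    cross_tab = Counter(pairs)                    # histogram on pairs
    row_totals = Counter(x for x, _ in pairs)     # marginal counts of x
    col_totals = Counter(y for _, y in pairs)     # marginal counts of y
    k = min(len(row_totals), len(col_totals)) - 1
    if n == 0 or k <= 0:
        return 0.0  # assumption: degenerate input is reported as no association
    chi2 = 0.0
    for x, row_count in row_totals.items():
        for y, col_count in col_totals.items():
            expected = row_count * col_count / n
            observed = cross_tab.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * k))
```

For perfectly associated values, such as five ("a", "x") pairs followed by five ("b", "y") pairs, the sketch returns 1.0; for a balanced table in which every combination occurs equally often, it returns 0.0.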
Added table function hdfsCluster which allows processing files from HDFS in parallel from many nodes in a specified cluster, similarly to s3Cluster. #32400 (Zhichang Yu).
Add support for disks backed by Azure Blob Storage, similar to the existing support for disks backed by AWS S3. #31505 (Jakub Kuklis).
Insignificant change. In extremely rare cases when a data part is lost on every replica, after merging of some data parts, subsequent queries may skip fewer partitions during partition pruning. This hardly affects anything. #32220 (Azat Khuzhin).
Improve clickhouse-keeper writing performance by optimizing the size calculation logic. #32366 (zhanglistar).
Parallel reading from multiple replicas within a shard during a distributed query without using a sample key. To enable this, set allow_experimental_parallel_reading_from_replicas = 1 and max_parallel_replicas to any number. This closes #26748. #29279 (Nikita Mikhaylov).
Implemented sparse serialization. It can reduce disk space usage and improve the performance of some queries for columns that contain a lot of default (zero) values. It can be enabled by the setting ratio_for_sparse_serialization. Sparse serialization is chosen dynamically for a column if the ratio of the number of default values to the number of all values is above that threshold. The serialization (default or sparse) is fixed for every column in a part, but may vary between parts. #22535 (Anton Popov).
Change ZooKeeper path for zero-copy marks for shared data. Note that "zero-copy replication" is a non-production feature (in early stages of development) that you shouldn't use anyway. If you have used it, keep this change in mind. #32061 (ianton-ru).
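The threshold rule can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the helper names are hypothetical, and the (size, offsets, values) representation is only one possible way to store non-default values, not the actual on-disk layout.

```python
def default_ratio(values, default=0):
    """Share of values in the column that are equal to the default value."""
    if not values:
        return 0.0
    return sum(1 for v in values if v == default) / len(values)


def choose_serialization(values, ratio_for_sparse, default=0):
    """Pick 'sparse' when the share of defaults is above the threshold."""
    if default_ratio(values, default) > ratio_for_sparse:
        return "sparse"
    return "default"


def to_sparse(values, default=0):
    """Assumed layout: total size, offsets of non-default values, those values."""
    offsets = [i for i, v in enumerate(values) if v != default]
    return len(values), offsets, [values[i] for i in offsets]


def from_sparse(size, offsets, non_defaults, default=0):
    """Rebuild the full column from the sparse representation."""
    column = [default] * size
    for offset, value in zip(offsets, non_defaults):
        column[offset] = value
    return column
```

A column such as [0, 0, 0, 7, 0, 0, 0, 0, 3, 0] has a default ratio of 0.8, so it would be written as sparse with a threshold of 0.5 and as default with a threshold of 0.9. Because the decision depends on the data in each part, two parts of the same table can end up with different serializations for the same column.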
Events clause support for WINDOW VIEW watch query. #32607 (vxider).
Fix ACL with explicit digest hash in clickhouse-keeper: now the behavior is consistent with ZooKeeper and the generated digest is always accepted. #33249 (小路). #33246.
Fix unexpected projection removal when detaching parts. #32067 (Amos Bird).
Date and time conversion functions that produce a time before 1970-01-01 00:00:00 now saturate the result to zero instead of overflowing. #29953 (Amos Bird). This also fixes a bug in index analysis when a date truncation function would yield a result before the Unix epoch.
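The difference between overflow and saturation can be shown with a small Python sketch. It assumes an unsigned 32-bit seconds counter purely for illustration; the function names are hypothetical, and the exact wraparound value seen before the fix may have differed.

```python
from datetime import datetime, timezone

UINT32_RANGE = 2 ** 32  # assumption: seconds are kept in an unsigned 32-bit counter


def wrapped_unix_time(dt):
    """Illustrative overflow: a negative second count wraps around to a huge value."""
    return int(dt.timestamp()) % UINT32_RANGE


def saturated_unix_time(dt):
    """Saturating conversion: anything before the epoch is clamped to zero."""
    seconds = int(dt.timestamp())
    return min(max(seconds, 0), UINT32_RANGE - 1)


one_second_before_epoch = datetime(1969, 12, 31, 23, 59, 59, tzinfo=timezone.utc)
```

For one_second_before_epoch, the wrapping variant yields 4294967295, a timestamp far in the future, while the saturating variant yields 0, that is 1970-01-01 00:00:00.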
Always display resource usage (total CPU usage, total RAM usage and max RAM usage per host) in client. #33271 (alexey-milovidov).
Improve Bool type serialization and deserialization, check the range of values. #32984 (Kruglov Pavel).
If an invalid setting is defined using the SET query or using the query parameters in the HTTP request, the error message will suggest existing settings whose names are similar to the invalid setting string (if any exist). #32946 (Antonio Andelic).
Support hints for mistyped setting names for clickhouse-client and clickhouse-local. Closes #32237. #32841 (凌涛).
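The idea behind such hints can be sketched with Python's standard difflib module. This is only an illustration of similarity-based suggestions: the matching algorithm, the cutoff, the message wording and the short list of known settings are assumptions of this example, not what the client or server actually uses.

```python
import difflib

# Illustrative subset of setting names used as the candidate list.
KNOWN_SETTINGS = ["max_threads", "max_memory_usage", "max_block_size"]


def suggest_settings(name, known=KNOWN_SETTINGS, limit=3):
    """Return known setting names that look similar to the mistyped one."""
    return difflib.get_close_matches(name, known, n=limit, cutoff=0.6)


def unknown_setting_message(name, known=KNOWN_SETTINGS):
    """Build a hypothetical error message, adding hints only when any exist."""
    hints = suggest_settings(name, known)
    message = f"Unknown setting {name}"
    if hints:
        message += ". Maybe you meant: " + ", ".join(hints)
    return message
```

Here suggest_settings("max_thread") returns ["max_threads"], while a name that resembles nothing in the list produces no hints and the message is left without a suggestion.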
Support <secure/> in cluster configuration, as an alternative form of <secure>1</secure>. Close #33270. #33330 (SuperDJY).
Pressing Ctrl+C twice will terminate clickhouse-benchmark immediately without waiting for in-flight queries. This closes #32586. #33303 (alexey-milovidov).
Support Unix timestamp with milliseconds in parseDateTimeBestEffort function. #33276 (Ben).
Allow cancelling a query while reading data from an external table in the formats Arrow / Parquet / ORC. Previously the query could not be cancelled in case of big files with the setting input_format_allow_seeks set to false. Closes #29678. #33238 (Kseniia Sumarokova).
If table engine supports SETTINGS clause, allow to pass the settings as key-value or via config. Add this support for MySQL. #33231 (Kseniia Sumarokova).
Flush all In-Memory data parts when WAL is not enabled, when shutting down the server or detaching a table. #32742 (nauta).
Allow to control connection timeouts for MySQL (previously was supported only for dictionary source). Closes #16669. Previously default connect_timeout was rather small, now it is configurable. #32734 (Kseniia Sumarokova).
Added settings command_read_timeout, command_write_timeout for StorageExecutable, StorageExecutablePool, ExecutableDictionary, ExecutablePoolDictionary, ExecutableUserDefinedFunctions. Setting command_read_timeout controls the timeout for reading data from command stdout, in milliseconds. Setting command_write_timeout controls the timeout for writing data to command stdin, in milliseconds. Added setting command_termination_timeout for ExecutableUserDefinedFunction, ExecutableDictionary, StorageExecutable. Added setting execute_direct for ExecutableUserDefinedFunction, by default true. Added setting execute_direct for ExecutableDictionary, ExecutablePoolDictionary, by default false. #30957 (Maksim Kita).
Bitmap aggregate functions will give correct result for out of range argument instead of wraparound. #33127 (DR).
Bug Fix (user-visible misbehavior in official stable or prestable release)
Several fixes for format parsing. This is relevant if clickhouse-server is open for write access to an adversary. Specially crafted input data for the Native format may lead to reading uninitialized memory or a crash. #33050 (Heena Bansal). Fixed Apache Avro Union type index out of boundary issue in Apache Avro binary format. #33022 (Harry Lee). Fix null pointer dereference in LowCardinality data when deserializing LowCardinality data in the Native format. #33021 (Harry Lee).
ClickHouse Keeper handler will correctly remove the operation when the response is sent. #32988 (JackyWoo).
Potential off-by-one miscalculation of quotas: the quota limit was not yet reached, but a limit exceeded error was reported. This fixes #31174. #31656 (sunny).
Fix an exception Block structure mismatch which may happen during insertion into table with default nested LowCardinality column. Fixes #33028. #33504 (Nikolai Kochetov).
Fix dictionary expressions for range_hashed range min and range max attributes when created using DDL. Closes #30809. #33478 (Maksim Kita).
Fix possible use-after-free for INSERT into Materialized View with concurrent DROP (Azat Khuzhin).
Do not try to read past EOF (to work around a bug in the Linux kernel). This bug can be reproduced on kernels (3.14..5.9), and requires index_granularity_bytes=0 (i.e. adaptive index granularity turned off). #33372 (Azat Khuzhin).
The commands SYSTEM SUSPEND and SYSTEM ... THREAD FUZZER missed access control. It is fixed. Author: Kevin Michel. #33333 (alexey-milovidov).
Fix COMMENT for dictionaries not appearing in system.tables, system.dictionaries. Allow to modify the comment for Dictionary engine. Closes #33251. #33261 (Maksim Kita).
Add asynchronous inserts (with enabled setting async_insert) to query log. Previously such queries didn't appear in the query log. #33239 (Anton Popov).
The metric StorageBufferBytes sometimes was miscalculated. #33159 (xuyatian).
Fix error Invalid version for SerializationLowCardinality key column in case of reading from LowCardinality column with local_filesystem_read_prefetch or remote_filesystem_read_prefetch enabled. #33046 (Nikolai Kochetov).
Fix throwing exception like positional argument out of bounds for non-positional arguments. Closes #31173. #32961 (Kseniia Sumarokova).
Fix UB in case of unexpected EOF during filling a set from HTTP query (i.e. if the client was interrupted in the middle, e.g. timeout 0.15s curl -Ss -F 's=@t.csv;' 'http://127.0.0.1:8123/?s_structure=key+Int&query=SELECT+dummy+IN+s' with a large enough t.csv). #32955 (Azat Khuzhin).
Fix a regression in replaceRegexpAll function. The function worked incorrectly when matched substring was empty. This closes #32777. This closes #30245. #32945 (alexey-milovidov).
MergeTree table engine might silently skip some mutations if there are too many running mutations or in case of high memory consumption, it's fixed. Fixes #17882. #32814 (tavplubix).
Avoid reusing the scalar subquery cache when processing MV blocks. This fixes a bug when the scalar subquery references the source table, but it means that all scalar subqueries in the MV definition will be calculated for each block. #32811 (Raúl Marín).
Server might fail to start if database with MySQL engine cannot connect to MySQL server, it's fixed. Fixes #14441. #32802 (tavplubix).
Fix error Column is not under aggregate function in case of MV with GROUP BY (list of columns) (which is parsed as GROUP BY tuple(...)) over Kafka/RabbitMQ. Fixes #32668 and #32744. #32751 (Nikolai Kochetov).
Fix ALTER TABLE ... MATERIALIZE TTL query with TTL ... DELETE WHERE ... and TTL ... GROUP BY ... modes. #32695 (Anton Popov).
Fix optimize_read_in_order optimization in case when the table engine is Distributed or Merge and its underlying MergeTree tables have a monotonic function in the prefix of the sorting key. #32670 (Anton Popov).
Fix LOGICAL_ERROR exception when the target of a materialized view is a JOIN or a SET table. #32669 (Raúl Marín).
Inserting into S3 with multipart upload to Google Cloud Storage may trigger abort. #32504. #32649 (vdimir).
Fix broken select query when there are more than 2 row policies on the same column, starting from the second query in the same session. #31606. #32291 (SuperDJY).
Fix fractional unix timestamp conversion to DateTime64, fractional part was reversed for negative unix timestamps (before 1970-01-01). #32240 (Ben).
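The expected arithmetic for negative fractional timestamps can be checked with a short Python sketch. It assumes the value is held as a signed count of ticks at a given decimal scale; the function names and the rounding mode are choices made for this example only.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def to_ticks(unix_timestamp, scale=3):
    """Convert a decimal string of seconds into a signed tick count (10**-scale s)."""
    scaled = Decimal(unix_timestamp) * (10 ** scale)
    return int(scaled.to_integral_value(rounding=ROUND_HALF_EVEN))


def split_ticks(ticks, scale=3):
    """Split ticks into whole seconds (floored) and a non-negative fractional part."""
    return divmod(ticks, 10 ** scale)
```

For the timestamp -1.25 at scale 3, to_ticks gives -1250 and split_ticks gives (-2, 750), that is 1969-12-31 23:59:58.750: the fractional part counts forward from the floored second rather than being mirrored.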
Some entries of replication queue might hang for temporary_directories_lifetime (1 day by default) with Directory tmp_merge_<part_name> or Part ... (state Deleting) already exists, but it will be deleted soon or similar error. It's fixed. Fixes #29616. #32201 (tavplubix).
Fix parsing of APPLY lambda column transformer which could lead to client/server crash. #32138 (Kruglov Pavel).