A fix for a feature that previously had unwanted behaviour. Do not allow direct select for Kafka/RabbitMQ/FileLog. It can be enabled with the setting stream_like_engine_allow_direct_select. Direct select is not allowed, even if enabled by the setting, when there is an attached materialized view. For Kafka and RabbitMQ, direct select, if allowed, does not commit messages by default. To enable commits with direct select, the user must use the storage-level setting kafka{rabbitmq}_commit_on_select=1 (default 0). #31053 (Kseniia Sumarokova).
Setting rename. Add custom null representation support for TSV/CSV input formats. Fix deserializing Nullable(String) in TSV/CSV/JSONCompactStringsEachRow/JSONStringsEachRow input formats. Rename output_format_csv_null_representation and output_format_tsv_null_representation to format_csv_null_representation and format_tsv_null_representation respectively. #30497 (Kruglov Pavel).
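The null representation change can be illustrated with a minimal Python sketch that reads one TSV row using a configurable null token. This is not ClickHouse's parser. The function name parse_tsv_row is hypothetical, escape sequences are not decoded, and the null_representation argument only plays the role of the format_tsv_null_representation setting (default \N).

```python
# Hypothetical illustration: a configurable null token, in the spirit of
# format_tsv_null_representation, recognized in TSV input.
DEFAULT_TSV_NULL = "\\N"  # backslash followed by N, the default TSV null marker


def parse_tsv_row(line, null_representation=DEFAULT_TSV_NULL):
    """Split one TSV line into fields, mapping the null token to None.

    Escape sequences other than the null marker are not decoded here.
    """
    fields = line.rstrip("\n").split("\t")
    return [None if field == null_representation else field for field in fields]
```

For example, parse_tsv_row("1\tNULL\tabc", null_representation="NULL") yields ["1", None, "abc"]. With the default marker, the literal text NULL stays an ordinary string.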
Further deprecation of already unused code. This is relevant only for users of ClickHouse versions older than 20.6. A "leader election" mechanism is removed from ReplicatedMergeTree, because multiple leaders are supported since 20.6. If you are upgrading from an older version and some replica with the old version is a leader, then the server will fail to start after the upgrade. Stop the replicas running the old version to let the new version start. After that, it will not be possible to downgrade to a version older than 20.6. #32140 (tavplubix).
Added CONSTRAINT ... ASSUME ... (not checked during INSERT). Added query transformation to CNF (https://github.com/ClickHouse/ClickHouse/issues/11749) for more convenient optimization. Added simple query rewriting using constraints (only simple matching for now; support for <, =, > etc. will be added later). Added the ability to replace heavy columns with light columns where possible. #18787 (Nikita Vasilev).
TLDR: Major improvements of completeness and consistency of text formats. Refactor formats TSV, TSVRaw, CSV and JSONCompactEachRow, JSONCompactStringsEachRow, remove code duplication, add a base interface for formats with -WithNames and -WithNamesAndTypes suffixes. Add formats CSVWithNamesAndTypes, TSVRawWithNames, TSVRawWithNamesAndTypes, JSONCompactEachRowWithNames, JSONCompactStringsEachRowWithNames, RowBinaryWithNames. Support parallel parsing for formats TSVWithNamesAndTypes, TSVRaw(WithNames/WithNamesAndTypes), CSVWithNamesAndTypes, JSONCompactEachRow(WithNames/WithNamesAndTypes), JSONCompactStringsEachRow(WithNames/WithNamesAndTypes). Support column mapping and type checking for RowBinaryWithNamesAndTypes format. Add setting input_format_with_types_use_header, which specifies whether to check that the types written in <format_name>WithNamesAndTypes formats match the table structure. Add setting input_format_csv_empty_as_default and use it in CSV format instead of input_format_defaults_for_omitted_fields (because that setting should not control csv_empty_as_default). Fix usage of setting input_format_defaults_for_omitted_fields (it was used only as csv_empty_as_default, but it should control calculation of default expressions for omitted fields). Fix Nullable input/output in TSVRaw format and make this format fully compatible with inserting into TSV. Fix inserting NULLs into LowCardinality(Nullable) when input_format_null_as_default is enabled (previously default values were inserted instead of actual NULLs). Fix string deserialization in JSONStringsEachRow/JSONCompactStringsEachRow formats (strings were parsed only up to the first '\n' or '\t'). Add the ability to use the Raw escaping rule in Template input format. Add diagnostic info for JSONCompactEachRow(WithNames/WithNamesAndTypes) input format. Fix a bug with parallel parsing of -WithNames formats when setting min_chunk_bytes_for_parallel_parsing is less than the number of bytes in a single row. #30178 (Kruglov Pavel).
Allow printing/parsing names and types of columns in CustomSeparated input/output format. Add formats CustomSeparatedWithNames/WithNamesAndTypes similar to TSVWithNames/WithNamesAndTypes. #31434 (Kruglov Pavel).
Exposes all settings of the global thread pool in the configuration file. #31285 (Tomáš Hromada).
Introduced window functions exponentialTimeDecayedSum, exponentialTimeDecayedMax, exponentialTimeDecayedCount and exponentialTimeDecayedAvg, which are more efficient than exponentialMovingAverage for bigger windows. They also cover more use cases. #29799 (Vladimir Chebotarev).
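The following Python sketch shows one way a time-decayed sum can stay cheap over large windows. It assumes that a value observed at time t_i and evaluated at time t is weighted by exp(-(t - t_i)/x). Under that assumption the sum can be carried forward with constant state: multiply the running total by the decay factor for the elapsed time, then add the new value. The names decayed_sum and DecayedSum are hypothetical. This is not ClickHouse's implementation.

```python
import math


def decayed_sum(points, x, t_eval=None):
    """Batch form: sum of v_i * exp(-(t_eval - t_i) / x) over (value, time) pairs.

    t_eval defaults to the latest timestamp in `points` (an assumption).
    """
    points = list(points)
    if not points:
        return 0.0
    if t_eval is None:
        t_eval = max(t for _, t in points)
    return sum(v * math.exp(-(t_eval - t) / x) for v, t in points)


class DecayedSum:
    """Streaming form: O(1) state (running total + last timestamp) per update."""

    def __init__(self, x):
        self.x = x
        self.total = 0.0
        self.last_t = None

    def add(self, value, t):
        # Assumes timestamps arrive in non-decreasing order.
        if self.last_t is not None:
            self.total *= math.exp(-(t - self.last_t) / self.x)
        self.total += value
        self.last_t = t
        return self.total
```

Feeding the same points into DecayedSum one by one gives the same result as decayed_sum evaluated at the last timestamp. However, each update touches only two numbers instead of the whole window.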
Allow splitting GraphiteMergeTree rollup rules for plain/tagged metrics (optional rule_type field). #25122 (Michail Safronov).
Remove an excessive DESC TABLE request for remote() when the table is given as an identifier (db.table) instead of a string, e.g. remote('127.1', system.one). #32019 (Azat Khuzhin).
Optimize function tupleElement into reading of a subcolumn when the setting optimize_functions_to_subcolumns is enabled. #31261 (Anton Popov).
Optimize function mapContains into reading of the subcolumn keys when the setting optimize_functions_to_subcolumns is enabled. #31218 (Anton Popov).
Add settings merge_tree_min_rows_for_concurrent_read_for_remote_filesystem and merge_tree_min_bytes_for_concurrent_read_for_remote_filesystem. #30970 (Kseniia Sumarokova).
Improve skipping unknown fields with quoted escaping rule in Template/CustomSeparated formats. Previously you could skip only quoted strings; now you can skip values of any type. #32204 (Kruglov Pavel).
Now clickhouse-keeper refuses to start or apply configuration changes when they contain duplicated IDs or endpoints. Fixes #31339. #32121 (alesapin).
Allow using predefined connection configurations for Kafka and RabbitMQ engines (the same way as for other integration table engines). #31691 (Kseniia Sumarokova).
Always re-render prompt while navigating history in clickhouse-client. This will improve usability of manipulating very long queries that don't fit on screen. #31675 (alexey-milovidov) (author: Amos Bird).
Add key bindings for navigating through history (instead of lines/history). #31641 (Azat Khuzhin).
Improve the max_execution_time checks. Fixed some cases when timeout checks did not happen and a query could run too long. #31636 (Raúl Marín).
Use shard and replica name from Replicated database arguments when expanding macros in ReplicatedMergeTree arguments if these macros are not defined in config. Closes #31471. #31488 (tavplubix).
Better analysis for min/max/count projection. Now, with enabled allow_experimental_projection_optimization, virtual min/max/count projection can be used together with columns from partition key. #31474 (Amos Bird).
If an obsolete setting is changed, show a warning in system.warnings. #31252 (tavplubix).
Improved backoff for background cleanup tasks in MergeTree. Settings merge_tree_clear_old_temporary_directories_interval_seconds and merge_tree_clear_old_parts_interval_seconds were moved from user settings to MergeTree settings. #31180 (tavplubix).
Now every replica sends only incremental information about profile event counters to the client. #31155 (Dmitry Novik). This makes the --hardware_utilization option in clickhouse-client usable.
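The benefit of incremental reports shows up when several replicas stream counters to one client. Each replica sends only the change since its previous report, so the client can add up everything it receives without double counting cumulative values. The sketch below illustrates this in Python. Counter names and both helpers are hypothetical, and the real ProfileEvents packet format is not modeled.

```python
class DeltaReporter:
    """Replica side: remembers what was last sent and reports only the change."""

    def __init__(self):
        self.last_sent = {}

    def delta(self, counters):
        changes = {}
        for name, value in counters.items():
            diff = value - self.last_sent.get(name, 0)
            if diff:
                changes[name] = diff
        self.last_sent = dict(counters)
        return changes


def apply_delta(totals, changes):
    """Client side: add one replica's increments into the running totals."""
    for name, diff in changes.items():
        totals[name] = totals.get(name, 0) + diff
    return totals
```

If replicas sent cumulative values instead, the client would have to track the latest value per replica to avoid adding the same work twice.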
Function name normalization for ALTER queries. This helps avoid metadata mismatch between creating a table with indices/projections and adding indices/projections via ALTER commands. This is a follow-up PR of https://github.com/ClickHouse/ClickHouse/pull/20174. Marked as an improvement because there are no bug reports and the scenario is somewhat rare. #31095 (Amos Bird).
Support IF EXISTS modifier for RENAME DATABASE/TABLE/DICTIONARY query. If this directive is used, one will not get an error if the DATABASE/TABLE/DICTIONARY to be renamed doesn't exist. #31081 (victorgao).
The local session inside a ClickHouse dictionary source won't send its events to the session log anymore. This fixes a possible deadlock (tsan alert) on shutdown. Also this PR fixes flaky test_dictionaries_dependency_xml/. #31013 (Vitaly Baranov).
Added \l, \d, \c commands in clickhouse-client like in MySQL and PostgreSQL. #30876 (Pavel Medvedev).
For clickhouse-local or clickhouse-client: if the --interactive option is used together with --query or --queries-file, these queries are first executed as in non-interactive mode, and then interactive mode starts. #30851 (Kseniia Sumarokova).
Fix possible "The local set of parts of X doesn't look like the set of parts in ZooKeeper" error (if DROP fails while removing znodes from ZooKeeper). #30826 (Azat Khuzhin).
Avro format now works with Kafka. Setting output_format_avro_rows_in_file added. #30351 (Ilya Golshtein).
Fixed Directory ... already exists and is not empty error when detaching part. #32063 (tavplubix).
MaterializedMySQL (experimental feature): Fix misinterpretation of DECIMAL data from MySQL. #31990 (Håvard Kvålen).
FileLog (experimental feature): the engine unnecessarily created a metadata directory when CREATE TABLE failed. Fixes #31962. #31967 (flynn).
Some GET_PART entries might hang in the replication queue if a part is lost on all replicas and there are no other parts in the same partition. This is fixed for cases when the partition key contains only columns of integer types or Date[Time]. Fixes #31485. #31887 (tavplubix).
Change configuration path from keeper_server.session_timeout_ms to keeper_server.coordination_settings.session_timeout_ms when constructing a KeeperTCPHandler. Same with operation_timeout. #31859 (JackyWoo).
Fix invalid cast of Nullable type when nullable primary key is used. (Nullable primary key is a discouraged feature - please do not use). This fixes #31075. #31823 (Amos Bird).
Use our own CMakeLists for zlib-ng, cassandra, mariadb-connector-c and xz, re2, sentry, gsasl, arrow, protobuf. This is needed for #20151. Part of #9226. A small step towards removal of annoying trash from the build system. #30599 (alexey-milovidov).
Hermetic builds: use a fixed version of libc and make sure that no source or binary files from the host OS are used during the build. This closes #27133. This closes #21435. This closes #30462. #30011 (alexey-milovidov).
This is relevant only if you already started using the experimental clickhouse-keeper support. Now ClickHouse Keeper snapshots are compressed with the ZSTD codec by default instead of custom ClickHouse LZ4 block compression. This behavior can be turned off with the compress_snapshots_with_zstd_format coordination setting (it must be equal on all quorum replicas). Backward incompatibility is quite rare and may happen only when a new node sends a snapshot (this happens during recovery) to an old node that is unable to read snapshots in ZSTD format. #29417 (alesapin).
New asynchronous INSERT mode accumulates inserted data and stores it in a single batch in the background. On the client, it can be enabled with the setting async_insert for INSERT queries with data inlined in the query or in a separate buffer (e.g. for INSERT queries via the HTTP protocol). If wait_for_async_insert is true (the default), the client waits until the data is flushed to the table. On the server side, it is controlled by the settings async_insert_threads, async_insert_max_data_size and async_insert_busy_timeout_ms. Implements #18282. #27537 (Anton Popov). #20557 (Ivan). Notes on performance: with asynchronous inserts you can do up to around 10 000 individual INSERT queries per second, so it is still recommended to insert in batches if you want to achieve performance of up to millions of inserted rows per second.
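The batching idea can be modeled in a few lines of Python. In this hypothetical sketch, max_data_size and busy_timeout_s correspond only loosely to async_insert_max_data_size and async_insert_busy_timeout_ms. A buffer is flushed as one batch when it grows large enough, or when its oldest entry has waited long enough. The sketch does not model async_insert_threads or the wait_for_async_insert acknowledgement. The clock is injectable so that the behavior can be tested deterministically.

```python
import time


class AsyncInsertBuffer:
    """Hypothetical model of server-side batching for asynchronous inserts."""

    def __init__(self, max_data_size, busy_timeout_s, clock=time.monotonic):
        self.max_data_size = max_data_size    # cf. async_insert_max_data_size
        self.busy_timeout_s = busy_timeout_s  # cf. async_insert_busy_timeout_ms
        self.clock = clock
        self.pending = []
        self.pending_bytes = 0
        self.oldest_at = None
        self.batches = []  # each flushed batch stands for one write to the table

    def insert(self, data: bytes):
        if not self.pending:
            self.oldest_at = self.clock()
        self.pending.append(data)
        self.pending_bytes += len(data)
        self.maybe_flush()

    def maybe_flush(self):
        """Flush if the buffer is big enough or its oldest insert waited too long."""
        if not self.pending:
            return
        too_big = self.pending_bytes >= self.max_data_size
        too_old = self.clock() - self.oldest_at >= self.busy_timeout_s
        if too_big or too_old:
            self.batches.append(b"".join(self.pending))
            self.pending, self.pending_bytes, self.oldest_at = [], 0, None
```

In a real server, a background task would call maybe_flush periodically. Many small inserts end up as a handful of batches, which is why batching on the client side still wins at very high row rates.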
Add interactive mode for clickhouse-local. So, you can just run clickhouse-local to get a command line ClickHouse interface without connecting to a server and process data from files and external data sources. Also merge the code of clickhouse-client and clickhouse-local together. Closes #7203. Closes #25516. Closes #22401. #26231 (Kseniia Sumarokova).
Added support for executable (scriptable) user defined functions. These are UDFs that can be written in any programming language. #28803 (Maksim Kita).
Allow predefined connections to external data sources. This avoids specifying credentials or addresses when using external data sources; they can be referenced by name instead. Closes #28367. #28577 (Kseniia Sumarokova).
Added INFORMATION_SCHEMA database with SCHEMATA, TABLES, VIEWS and COLUMNS views to the corresponding tables in system database. Closes #9770. #28691 (tavplubix).
Support multidimensional cosine distance and euclidean distance functions; L1, L2, Lp, Linf distances and norms. Scalar product on tuples and various arithmetic operators on tuples. This fully closes #4509 and even more. #27933 (Alexey Boykov).
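The definitions behind these functions are standard. The Python sketch below spells them out, using hypothetical snake_case names to stand in for ClickHouse's L1Distance, L2Distance, LpDistance, LinfDistance and cosineDistance. The equal-length check reflects the requirement that both tuples have the same size. ClickHouse's exact error behavior is not modeled.

```python
import math


def _pairs(a, b):
    if len(a) != len(b):
        raise ValueError("tuples must have the same size")
    return zip(a, b)


def l1_distance(a, b):
    return sum(abs(x - y) for x, y in _pairs(a, b))


def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in _pairs(a, b)))


def lp_distance(a, b, p):
    return sum(abs(x - y) ** p for x, y in _pairs(a, b)) ** (1.0 / p)


def linf_distance(a, b):
    return max(abs(x - y) for x, y in _pairs(a, b))


def cosine_distance(a, b):
    # 1 - cos(angle) = 1 - (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in _pairs(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

For the points (1, 2) and (4, 6), the L1, L2 and Linf distances are 7, 5 and 4. LpDistance with p = 2 coincides with L2.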
Add support for compression and decompression for INTO OUTFILE and FROM INFILE (with autodetect or with additional optional parameter). #27135 (Filatenkov Artur).
Add CORS (Cross Origin Resource Sharing) support with HTTP OPTIONS request. It means that Grafana will now work with serverless requests without kludges. Closes #18693. #29155 (Filatenkov Artur).
Added function tokens. It splits a string into tokens using non-alphanumeric ASCII characters as separators. #29981 (Maksim Kita). Added function ngrams to extract n-grams from text. Closes #29699. #29738 (Maksim Kita).
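Both functions are easy to approximate in Python for experimentation. Following the description above, this sketch treats every ASCII character that is not a letter or digit as a separator, and keeps non-ASCII characters inside tokens. The helper names are hypothetical, and this ngrams works on Python code points. Whether ClickHouse counts bytes or code points for a given input is not modeled here.

```python
import re

# ASCII characters that are neither letters nor digits act as separators.
_SEPARATORS = re.compile(r"[\x00-\x2f\x3a-\x40\x5b-\x60\x7b-\x7f]+")


def tokens(text):
    """Split on runs of non-alphanumeric ASCII characters, dropping empty pieces."""
    return [piece for piece in _SEPARATORS.split(text) if piece]


def ngrams(text, n):
    """All contiguous substrings of length n (empty list if text is shorter)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]
```

For example, tokens splits "test1,;\ test2" into "test1" and "test2", and ngrams("ClickHouse", 3) starts with "Cli", "lic", "ick".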
Streaming consumption of application log files in ClickHouse with FileLog table engine. It's like Kafka or RabbitMQ engine but for append-only and rotated logs in local filesystem. Closes #6953. #25969 (flynn) (Kseniia Sumarokova).
Add a function getOSKernelVersion - it returns a string with OS kernel version. #29755 (Memo).
Added MD4 and SHA384 functions. MD4 is an obsolete and insecure hash function, it can be used only in rare cases when MD4 is already being used in some legacy system and you need to get exactly the same result. #29602 (Nikita Tikhomirov).
HSTS can be enabled for the ClickHouse HTTP server by setting hsts_max_age to a positive number in the configuration file. #29516 (凌涛).
New function mapContainsKeyLike to check whether a map has a key matching a simple pattern (as in LIKE). #29471 (凌涛). New function mapExtractKeyLike to get a map containing only the elements whose keys match the specified pattern. #30793 (凌涛).
Allow including subcolumns of table columns in the DESCRIBE query result (can be enabled with the setting describe_include_subcolumns). #28905 (Anton Popov).
Executable, ExecutablePool: added option send_chunk_header. If this option is true, the number of rows in the chunk, followed by a line break, is sent to the executable before each chunk. #28833 (Maksim Kita).
tokenbf_v1 and ngrambf_v1 support Map with keys of String or FixedString type. This enhances data skipping in queries with a map key filter. For example, with the table CREATE TABLE map_tokenbf (row_id UInt32, map Map(String, String), INDEX map_tokenbf map TYPE ngrambf_v1(4, 256, 2, 0) GRANULARITY 1) ENGINE = MergeTree() ORDER BY row_id, the query SELECT * FROM map_tokenbf WHERE map['K'] = 'V' will skip the granules that don't contain key K. How many rows are skipped depends on the GRANULARITY and index_granularity you set. #28511 (凌涛).
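The following toy Python model shows how a per-granule n-gram bloom filter lets a granule be skipped. The constructor arguments loosely mirror ngrambf_v1(n, size_in_bytes, hash_count, seed) from the example above. The blake2b-based hashing, the class name and build_granule_filter are assumptions of this sketch, not ClickHouse internals. A granule can be skipped only when may_contain returns False. Bloom filters may report false positives but never false negatives.

```python
import hashlib


class NgramBloomFilter:
    """Toy per-granule bloom filter over character n-grams (sketch only)."""

    def __init__(self, n=4, size_in_bytes=256, hash_count=2, seed=0):
        self.n = n
        self.num_bits = size_in_bytes * 8
        self.hash_count = hash_count
        self.seed = seed
        self.bits = bytearray(size_in_bytes)

    def _ngrams(self, text):
        return [text[i:i + self.n] for i in range(len(text) - self.n + 1)]

    def _positions(self, gram):
        # Derive hash_count bit positions per n-gram (hash choice is arbitrary).
        for i in range(self.hash_count):
            key = f"{self.seed}:{i}:{gram}".encode()
            digest = hashlib.blake2b(key, digest_size=8).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, text):
        for gram in self._ngrams(text):
            for pos in self._positions(gram):
                self.bits[pos // 8] |= 1 << (pos % 8)

    def may_contain(self, text):
        grams = self._ngrams(text)
        if not grams:
            return True  # shorter than n: this sketch cannot rule anything out
        return all((self.bits[pos // 8] >> (pos % 8)) & 1
                   for gram in grams for pos in self._positions(gram))


def build_granule_filter(map_rows, **params):
    """Index the keys of every Map value (modeled as a dict) in one granule."""
    granule_filter = NgramBloomFilter(**params)
    for row in map_rows:
        for key in row:
            granule_filter.add(key)
    return granule_filter
```

A filter on a map key then checks each granule's filter and reads only the granules for which may_contain returns True.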
Send profile events from server to client. New packet type ProfileEvents was introduced. Closes #26177. #28364 (Dmitry Novik).
Bit shift operations for FixedString and String data types. This closes #27763. #28325 (小路).
Support adding / deleting tables to replication from PostgreSQL dynamically in database engine MaterializedPostgreSQL. Support alter for database settings. Closes #27573. #28301 (Kseniia Sumarokova).
Background merges can be preempted by each other and they are scheduled with appropriate priorities. Now long-running merges won't prevent short merges from proceeding. This is needed for better scheduling and control of merge execution. It reduces the chances of getting the "too many parts" error. #22381. #25165 (Nikita Mikhaylov). Added the ability to execute more merges and mutations than the number of threads in the background pool. Merges and mutations are executed step by step according to their sizes (smaller ones have higher priority). The ratio of the number of tasks to the number of threads is controlled by the setting background_merges_mutations_concurrency_ratio, 2 by default. #29140 (Nikita Mikhaylov).
Allow using asynchronous reads for remote filesystems. Lower the number of seeks while reading from remote filesystems. It improves performance tremendously and makes the experimental web and s3 disks work faster than EBS under certain conditions. #29205 (Kseniia Sumarokova). In the meantime, the web disk type (static dataset hosted on a web server) has graduated from experimental to production-ready.
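The scheduling idea can be modeled in Python. In this hypothetical sketch, at most threads * concurrency_ratio tasks are admitted, echoing background_merges_mutations_concurrency_ratio. On every step, the smallest admitted tasks each do one unit of work. As a result, a short merge submitted after a long one still finishes first. Real merge steps, preemption points and priorities in ClickHouse are more involved than this.

```python
import heapq


class MergeScheduler:
    """Hypothetical model of size-prioritized, step-wise merge execution."""

    def __init__(self, threads, concurrency_ratio=2):
        self.threads = threads
        self.capacity = threads * concurrency_ratio
        self._heap = []  # entries: (size, submit_order, name, remaining_work)
        self._order = 0

    def submit(self, name, size):
        if len(self._heap) >= self.capacity:
            return False  # not admitted; the caller retries later
        heapq.heappush(self._heap, (size, self._order, name, size))
        self._order += 1
        return True

    def step(self):
        """Run one unit of work on the smallest tasks; return finished names."""
        count = min(self.threads, len(self._heap))
        running = [heapq.heappop(self._heap) for _ in range(count)]
        finished = []
        for size, order, name, remaining in running:
            if remaining - 1 <= 0:
                finished.append(name)
            else:
                heapq.heappush(self._heap, (size, order, name, remaining - 1))
        return finished
```

With one thread and ratio 2, a big merge and a small merge can both be admitted. The small one completes on the first step, while the big one continues afterwards.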
Queries with INTO OUTFILE in clickhouse-client will use multiple threads. Fix the issue with flickering progress-bar when using INTO OUTFILE. This closes #30873. This closes #30872. #30886 (alexey-milovidov).
Reduce the amount of redundant compressed data read from disk for some types of SELECT queries (only for MergeTree engines family). #30111 (alesapin).
Remove some redundant seek calls while reading compressed blocks in MergeTree table engines family. #29766 (alesapin).
Improve performance of aggregation in order of primary key (with enabled setting optimize_aggregation_in_order). #30266 (Anton Popov).
ClickHouse now uses a DNS cache when communicating with external S3. #29999 (alesapin).
Add support for pushdown of IS NULL/IS NOT NULL to external databases (e.g. MySQL). #29463 (Azat Khuzhin). Transform isNull/isNotNull to IS NULL/IS NOT NULL (for external databases, e.g. MySQL). #29446 (Azat Khuzhin).
SELECT queries from Dictionary tables will use multiple threads. #30500 (Maksim Kita).
Improve performance for filtering (WHERE operation) of Decimal columns. #30431 (Jun Jin).
Replace branchy code in the filter operation with a faster implementation based on popcnt/ctz. #29881 (Jun Jin).
Improve the filter bytemask generator function (used for the WHERE operator) with SSE/AVX2/AVX512 implementations. Note that by default ClickHouse uses only SSE, so this is relevant only for custom builds. #30014 (jasperzhu). #30670 (jasperzhu).
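The bit tricks named here can be sketched in Python. Python gains no speed from this, so the sketch only shows the control flow. Instead of branching on every row, the byte mask is packed into one word per 64 rows. Whole blocks are copied when every bit is set. Otherwise the loop jumps straight to the set bits using count-trailing-zeros. The function name is hypothetical, and this is not the ClickHouse code.

```python
def filter_rows(values, mask):
    """Return values[i] for which mask[i] is non-zero, 64 rows at a time."""
    if len(values) != len(mask):
        raise ValueError("values and mask must have the same length")
    out = []
    for base in range(0, len(values), 64):
        chunk = values[base:base + 64]
        # Pack the byte mask into one integer (a SIMD compare + movemask in C++).
        word = 0
        for i, m in enumerate(mask[base:base + 64]):
            if m:
                word |= 1 << i
        if word == (1 << len(chunk)) - 1:
            out.extend(chunk)  # every row passes: copy the block wholesale
            continue
        # In C++, popcnt(word) would let the output be sized up front.
        while word:
            i = (word & -word).bit_length() - 1  # index of lowest set bit (ctz)
            out.append(chunk[i])
            word &= word - 1  # clear the lowest set bit
    return out
```

Blocks with no selected rows fall through both branches at once, which is where skipping per-row branches pays off for selective filters.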
Improve the performance of SUM aggregate function of Nullable floating point numbers. #28906 (Raúl Marín).
The primary key index and partition filter can now work with tuples. #29281 (凌涛).
If a query has multiple quantile aggregate functions with the same arguments but different level parameters, they are fused together and executed in one pass when the setting optimize_syntax_fuse_functions is enabled. #26657 (hexiaoting).
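Fusion helps because the expensive shared work is done once for all levels. The Python sketch below uses exact nearest-rank quantiles. ClickHouse's quantile is approximate, so this only illustrates the idea. It sorts the data once and answers every level from the sorted copy. Conceptually, the fused form behaves like a single quantiles(level1, level2, ...)(x) call. Function names are hypothetical.

```python
import math


def quantiles_fused(values, levels):
    """Exact nearest-rank quantiles for several levels from one sort."""
    data = sorted(values)  # shared work, done once for all levels
    if not data:
        raise ValueError("no values")
    n = len(data)
    return [data[max(0, math.ceil(level * n) - 1)] for level in levels]


def quantile_single(values, level):
    """Unfused equivalent: one full sort per requested level."""
    return quantiles_fused(values, [level])[0]
```

Computing the median and the 0.9 quantile separately sorts the data twice, whereas quantiles_fused sorts it once. The results are identical.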
Now min-max aggregation over the first expression of primary key is optimized by projection. This is for #329. #29918 (Amos Bird).
Multiple improvements for SQL UDF. Queries for manipulation of SQL User Defined Functions now support ON CLUSTER clause. Example CREATE FUNCTION test_function ON CLUSTER 'cluster' AS x -> x + 1;. Closes #30666. #30734 (Maksim Kita). Support CREATE OR REPLACE, CREATE IF NOT EXISTS syntaxes. #30454 (Maksim Kita). Added DROP IF EXISTS support. Example DROP FUNCTION IF EXISTS test_function. #30437 (Maksim Kita). Support lambdas. Example CREATE FUNCTION lambda_function AS x -> arrayMap(element -> element * 2, x);. #30435 (Maksim Kita). Support SQL user defined functions for clickhouse-local. #30179 (Maksim Kita).
Users can now create dictionaries with comments: CREATE DICTIONARY ... COMMENT 'value' ... #29899 (Vasily Nemkov). Users can now set comments on databases in the CREATE DATABASE statement ... #29429 (Vasily Nemkov).
Introduce compiled_expression_cache_elements_size setting. If you will ever want to use this setting, you will already know what it does. #30667 (Maksim Kita).
clickhouse-format now supports the option --query. In previous versions you had to pass the query via stdin. #29325 (凌涛).
Support ALTER TABLE for tables in Memory databases. Memory databases are used in clickhouse-local. #30866 (tavplubix).
Increase listen_backlog by default (to match the default in newer Linux kernels). #29643 (Azat Khuzhin).
Reload dictionaries, models and user-defined executable functions when the server config settings dictionaries_config, models_config or user_defined_executable_functions_config change. Closes #28142. #29529 (Maksim Kita).
Get rid of pointless restriction on projection name. Now projection name can start with tmp_. #29520 (Amos Bird).
Fixed There is no query or query context has expired error in mutations with nested subqueries. Do not allow subqueries in mutation if table is replicated and allow_nondeterministic_mutations setting is disabled. #29495 (tavplubix).
Apply config changes to max_concurrent_queries during runtime (no need to restart). #29414 (Raúl Marín).
Add support for FREEZEing in-memory parts (for backups). #29376 (Mo Xuan).
Pass through initial query_id for clickhouse-benchmark (previously, if you ran a remote query via clickhouse-benchmark, queries on shards were not linked to the initial query via initial_query_id). #29364 (Azat Khuzhin).
Skip indexes tokenbf_v1 and ngrambf_v1: added support for Array data type with elements of String or FixedString type. #29280 (Maksim Kita). Skip indexes tokenbf_v1 and ngrambf_v1: added support for Map data type with keys of String or FixedString type. Author @lingtaolf. #29220 (Maksim Kita).
clickhouse-keeper: Fix bug in clickhouse-keeper-converter which can lead to some data loss while restoring from ZooKeeper logs (not snapshot). #29030 (小路). Fix bug in clickhouse-keeper-converter which can lead to incorrect ZooKeeper log deserialization. #29071 (小路).
ClickHouse can be statically built with Musl. This is added as experiment, it does not support building odbc-bridge, library-bridge, integration with CatBoost and some libraries. #30248 (alexey-milovidov).
Functions for case-insensitive search in UTF-8 strings like positionCaseInsensitiveUTF8 and countSubstringsCaseInsensitiveUTF8 might find substrings that do not actually match in very rare cases; this is fixed. #30663 (tavplubix).
Fix transformation of disjunctions chain to IN (controlled by settings optimize_min_equality_disjunction_chain_length) in distributed queries with settings legacy_column_name_of_tuple_literal = 0. #28658 (Anton Popov).
Allow using a materialized column as the sharding key in a distributed table even if insert_allow_materialized_columns=0. #28637 (Vitaly Baranov).
Fix ORDER BY ... WITH FILL with TO and FROM set and no rows in the result set. #30888 (Anton Popov).
Fix set index not used in AND/OR expressions when there are more than two operands. This fixes #30416. #30887 (Amos Bird).
Fixed ambiguity when extracting auxiliary ZooKeeper name from ZooKeeper path in ReplicatedMergeTree. Previously server might fail to start with Unknown auxiliary ZooKeeper name if ZooKeeper path contains a colon. Fixes #29052. Also it was allowed to specify ZooKeeper path that does not start with slash, but now it's deprecated and creation of new tables with such path is not allowed. Slashes and colons in auxiliary ZooKeeper names are not allowed too. #30822 (tavplubix).
Clean the temporary directory when localBackup fails for some reason. #30797 (ianton-ru).
Fixed a race condition between REPLACE/MOVE PARTITION and background merge in non-replicated MergeTree that might cause a part of moved/replaced data to remain in partition. Fixes #29327. #30717 (tavplubix).
Fix reading from MergeTree with max_read_buffer_size = 0 (when the user wants to shoot himself in the foot) (can lead to exceptions Can't adjust last granule, LOGICAL_ERROR, or even data loss). #30192 (Azat Khuzhin).
Fix pread_fake_async/pread_threadpool with min_bytes_to_use_direct_io. #30191 (Azat Khuzhin).
Fix INSERT SELECT incorrectly filling a MATERIALIZED column based on a Nullable column. #30189 (Azat Khuzhin).
Fix error Port is already connected for queries with GLOBAL IN and WITH TOTALS. Only for 21.9 and 21.10. #30086 (Nikolai Kochetov).
Fix race between MOVE PARTITION and merges/mutations for MergeTree. #30074 (Azat Khuzhin).
Dropped Memory database might reappear after server restart, it's fixed (#29795). Also added force_remove_data_recursively_on_drop setting as a workaround for Directory not empty error when dropping Ordinary database (because it's not possible to remove data leftovers manually in cloud environment). #30054 (tavplubix).
Fix system tables recreation check (fails to detect changes in enum values). #29857 (Azat Khuzhin).
MaterializedMySQL: Fix an issue where if the connection to MySQL was lost, only parts of a transaction could be processed. #29837 (Håvard Kvålen).
Avoid Timeout exceeded: elapsed 18446744073.709553 seconds error that might happen in extremely rare cases, presumably due to some bug in kernel. Fixes #29154. #29811 (tavplubix).
Fix bad cast in ATTACH TABLE ... FROM 'path' query when non-string literal is used instead of path. It may lead to reading of uninitialized memory. #29790 (alexey-milovidov).
Fix concurrent access to LowCardinality during GROUP BY (in combination with Buffer tables it may lead to troubles). #29782 (Azat Khuzhin).
Fix incorrect GROUP BY (multiple rows with the same keys in result) in case of distributed query when shards had mixed versions <= 21.3 and >= 21.4, GROUP BY key had several columns all with fixed size, and two-level aggregation was activated (see group_by_two_level_threshold and group_by_two_level_threshold_bytes). Fixes #29580. #29735 (Nikolai Kochetov).
Fix possible Table columns structure in ZooKeeper is different from local table structure exception while recreating or creating new replicas of ReplicatedMergeTree, when one of the table columns has default expressions with case-insensitive functions. #29266 (Anton Popov).
Send normal Database doesn't exist error (UNKNOWN_DATABASE) to the client (via TCP) instead of Attempt to read after eof (ATTEMPT_TO_READ_AFTER_EOF). #29229 (Azat Khuzhin).
Fix segfault while inserting into column with type LowCardinality(Nullable) in Avro input format. #29132 (Kruglov Pavel).
Do not allow reusing previous credentials in case of inter-server secret (before this fix, an INSERT via Buffer/Kafka into a Distributed table with an interserver secret configured for that cluster could reuse the previously set user for that connection). #29060 (Azat Khuzhin).
Now the following MergeTree table-level settings: replicated_max_parallel_sends, replicated_max_parallel_sends_for_table, replicated_max_parallel_fetches, replicated_max_parallel_fetches_for_table do nothing. They never worked well and were replaced with max_replicated_fetches_network_bandwidth, max_replicated_sends_network_bandwidth and background_fetches_pool_size. #28404 (alesapin).
Add feature for creating user-defined functions (UDF) as lambda expressions. Syntax CREATE FUNCTION {function_name} as ({parameters}) -> {function core}. Example CREATE FUNCTION plus_one as (a) -> a + 1. Author @Realist007. #27796 (Maksim Kita) #23978 (Realist007).
Added Executable storage engine and executable table function. It enables data processing with external scripts in streaming fashion. #28102 (Maksim Kita) (ruct).
Added ExecutablePool storage engine. Similar to Executable but it's using a pool of long running processes. #28518 (Maksim Kita).
New web disk type to store read-only tables on a web server in the form of static files. See #23982. #25251 (Kseniia Sumarokova). This is mostly needed to facilitate testing of operation on shared storage and for easy importing of datasets. Not recommended for use before release 21.11.
Added new commands BACKUP and RESTORE. #21945 (Vitaly Baranov). This is under development and not intended to be used in current version.
Create virtual projection for minmax indices. Now, when allow_experimental_projection_optimization is enabled, queries will use minmax index instead of reading the data when possible. #26286 (Amos Bird).
Introducing two checks in sequenceMatch and sequenceCount that allow for early exit when some deterministic part of the sequence pattern is missing from the events list. This change unlocks many queries that would previously fail due to reaching operations cap, and generally speeds up the pipeline. #27729 (Jakub Kuklis).
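The following Python sketch shows the kind of cheap pre-check described here. Before the full matcher runs, it verifies that every condition the pattern refers to with (?N) holds on at least one event. If some condition never holds, the pattern cannot match and the search can stop immediately. This necessary-condition test is an illustration under that assumption, and the two checks actually added in ClickHouse may differ in detail. The helper names are hypothetical.

```python
import re

_CONDITION_REF = re.compile(r"\(\?(\d+)\)")


def required_conditions(pattern):
    """Condition numbers referenced as (?N); time constraints like (?t>5) are ignored."""
    return {int(n) for n in _CONDITION_REF.findall(pattern)}


def may_match(pattern, events):
    """Necessary condition for a match.

    events: per-row tuples of booleans, events[row][k - 1] being condition k.
    Returns False when some required condition never holds on any row.
    """
    needed = required_conditions(pattern)
    seen = set()
    for row in events:
        seen.update(k for k in needed if row[k - 1])
        if seen == needed:
            return True  # every condition occurs somewhere; run the full matcher
    return seen == needed
```

When may_match returns False, the expensive backtracking search is never started. That is how an operations cap can be avoided for inputs that could never match.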
Enhance primary key analysis with always monotonic information of binary functions, notably non-zero constant division. #28302 (Amos Bird).
Add interactive documentation in clickhouse-client about how to reset the password. This is useful in the scenario when a user has installed ClickHouse, set up the password and instantly forgot it. See #27750. #27903 (alexey-milovidov).
Use real tmp file instead of predefined "rows_sources" for vertical merges. This avoids generating garbage directories in tmp disks. #28299 (Amos Bird).
Added the libhdfs3_conf server config setting, to be used instead of exporting the LIBHDFS3_CONF environment variable in clickhouse-server.service. This is for configuration of interaction with HDFS. #28268 (Zhichang Yu).
Fix removing of parts in a Temporary state which can lead to an unexpected exception (Part %name% doesn't exist). Fixes #23661. #28221 (Azat Khuzhin).
Fix zookeeper_log.address (before the first patch in this PR the address was always ::) and reduce the number of getpeername(2) calls for this column (getpeername() was called each time an entry was added to zookeeper_log, so the address is now cached in the ZooKeeper client). #28212 (Azat Khuzhin).
Support implicit conversions between index in operator [] and key of type Map (e.g. different Int types, String and FixedString). #28096 (Anton Popov).
Add a setting empty_result_for_aggregation_by_constant_keys_on_empty_set to control the behavior of grouping by constant keys on an empty set. This brings back the old behaviour of #6842. #27932 (Amos Bird).
Added replication_wait_for_inactive_replica_timeout setting. It allows specifying how long to wait for inactive replicas to execute ALTER/OPTIMIZE/TRUNCATE query (default is 120 seconds). If replication_alter_partitions_sync is 2 and some replicas are not active for more than replication_wait_for_inactive_replica_timeout seconds, then UNFINISHED will be thrown. #27931 (tavplubix).
Support lambda argument for APPLY column transformer which allows applying functions with more than one argument. This is for #27877. #27901 (Amos Bird).
Fix UUID overlap in DROP TABLE for internal DDL from MaterializedMySQL. MaterializedMySQL is an experimental feature. #28533 (Azat Khuzhin).
Fix There is no subcolumn error, while select from tables, which have Nested columns and scalar columns with dot in name and the same prefix as Nested (e.g. n.id UInt32, n.arr1 Array(UInt64), n.arr2 Array(UInt64)). #28531 (Anton Popov).
Fix bug which can lead to error Existing table metadata in ZooKeeper differs in sorting key expression. after ALTER of ReplicatedVersionedCollapsingMergeTree. Fixes #28515. #28528 (alesapin).
Fixed possible ZooKeeper watches leak (minor issue) on background processing of distributed DDL queue. Closes #26036. #28446 (tavplubix).
Multiple fixes for the new clickhouse-keeper tool. Fix a rare bug in clickhouse-keeper when the client can receive a watch response before request-response. #28197 (alesapin). Fix incorrect behavior in clickhouse-keeper when list watches (getChildren) are triggered by set requests for children. #28190 (alesapin). Fix a rare case when changes of clickhouse-keeper settings may lead to lost logs and a server hang. #28360 (alesapin). Fix a bug in clickhouse-keeper which can lead to endless logs when rotate_logs_interval is decreased. #28152 (alesapin).
Enable Thread Fuzzer in Stress Test. Thread Fuzzer is ClickHouse feature that allows to test more permutations of thread scheduling and discover more potential issues. This closes #9813. This closes #9814. This closes #9515. This closes #9516. #27538 (alexey-milovidov).
Add new log level test for testing environments. It is even more verbose than the default trace. #28559 (alesapin).
Do not output trailing zeros in text representation of Decimal types. Example: 1.23 will be printed instead of 1.230000 for decimal with scale 6. This closes #15794. It may introduce slight incompatibility if your applications somehow relied on the trailing zeros. Serialization in output formats can be controlled with the setting output_format_decimal_trailing_zeros. Implementation of toString and casting to String is changed unconditionally. #27680 (alexey-milovidov).
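The following Python sketch roughly illustrates the new text form. Trailing fractional zeros are dropped unless a flag, analogous to output_format_decimal_trailing_zeros, asks to keep them. It is not ClickHouse's formatter, and the helper name is hypothetical. Python's Decimal keeps the scale it was created with, which makes it a convenient stand-in for a fixed-scale Decimal column.

```python
from decimal import Decimal


def decimal_to_text(value: Decimal, trailing_zeros: bool = False) -> str:
    """Format a fixed-scale decimal, dropping trailing fractional zeros by default.

    trailing_zeros=True plays the role of output_format_decimal_trailing_zeros=1.
    """
    text = format(value, "f")  # keeps the value's own scale, e.g. '1.230000'
    if trailing_zeros or "." not in text:
        return text
    return text.rstrip("0").rstrip(".")
```

With scale 6, the value 1.23 is printed as 1.23 by default and as 1.230000 with the flag. An integral value such as 100.000 loses its decimal point entirely.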
Do not allow to apply parametric aggregate function with -Merge combinator to aggregate function state if state was produced by aggregate function with different parameters. For example, state of fooState(42)(x) cannot be finalized with fooMerge(s) or fooMerge(123)(s), parameters must be specified explicitly like fooMerge(42)(s) and must be equal. It does not affect some special aggregate functions like quantile and sequence* that use parameters for finalization only. #26847 (tavplubix).
Under clickhouse-local, always treat local addresses with a port as remote. #26736 (Raúl Marín).
Fix an issue where a bad cast could happen in some sophisticated queries with column aliases identical to the names of expressions. This fixes #25447. This fixes #26914. This fix may introduce backward incompatibility: if there are different expressions with identical names, an exception will be thrown. It may break some rare cases when enable_optimize_predicate_expression is set. #26639 (alexey-milovidov).
Now, a scalar subquery always returns a Nullable result if its type can be Nullable. This is needed because the result of an empty subquery should be NULL. Previously, it was possible to get an error about incompatible types (type deduction does not execute the scalar subquery, and it could use a non-nullable type). A scalar subquery with an empty result that can't be converted to Nullable (like Array or Tuple) now throws an error. Fixes #25411. #26423 (Nikolai Kochetov).
Introduce syntax for here documents. Example: SELECT $doc$ VALUE $doc$. #26671 (Maksim Kita). This change is backward incompatible if a query contains identifiers with $ in them. #28768.
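A here document is delimited by the same $tag$ on both sides. The following hypothetical Python tokenizer sketches that idea, assuming PostgreSQL-style dollar quoting where the tag consists of letters, digits or underscores and the content between the tags is kept verbatim; ClickHouse's actual lexer rules are not described in this entry. It also hints at the incompatibility: text such as $a$ is now read as a tag.

```python
import re

# Assumed tag grammar (hypothetical): '$', optional [A-Za-z0-9_]*, '$'.
_TAG = re.compile(r"\$[A-Za-z0-9_]*\$")


def read_heredoc(text: str, pos: int = 0) -> tuple[str, int]:
    """Return (content, end_pos) for a here document starting at text[pos]."""
    m = _TAG.match(text, pos)
    if m is None:
        raise ValueError("not a here document")
    tag = m.group(0)
    # The content runs verbatim until the same tag appears again.
    end = text.find(tag, m.end())
    if end == -1:
        raise ValueError(f"unterminated here document {tag}")
    return text[m.end():end], end + len(tag)
```

For example, read_heredoc("$doc$ VALUE $doc$") returns the content " VALUE " (with its spaces) and the position just past the closing tag. A different tag inside the content, such as $b$ within $a$ ... $a$, is not treated as a terminator.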
Now indices can handle Nullable types, including isNull and isNotNull. #12433 and #12455 (Amos Bird) and #27250 (Azat Khuzhin). However, this required on-disk format changes: a new server can read old data, but an old server cannot read new data. Also, if you have MINMAX data skipping indices, you may get a Data after mutation/merge is not byte-identical error, since the new index has the .idx2 extension while the old one had .idx. Therefore, you should not delay updating all existing replicas; otherwise, if an old replica (<21.9) downloads data from a new replica (21.9+), it will not be able to apply the index to the downloaded part.
When a client connects to the server, it receives information about all warnings that have already been collected by the server. (This can be disabled with the --no-warnings option.) Add the system.warnings table to collect warnings about server configuration. #26246 (Filatenkov Artur). #26282 (Filatenkov Artur).
Allow using constant expressions from WITH and SELECT in aggregate function parameters. Closes #10945. #27531 (abel-cheng).
Improve the performance of fast queries when max_execution_time = 0 by reducing the number of clock_gettime system calls. #27325 (filimonov).
Specialize date/time comparisons for better performance. This fixes #27083. #27122 (Amos Bird).
Share file descriptors in concurrent reads of the same files. There is no noticeable performance difference on Linux. But the number of opened files will be significantly (10..100 times) lower on typical servers and it makes operations easier. See #26214. #26768 (alexey-milovidov).
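Sharing a descriptor between concurrent readers is safe when each read is positional, because no read depends on a shared file offset. A minimal Python sketch of a reference-counted descriptor cache, assuming a Unix system where os.pread is available; the class SharedFileCache is hypothetical and is not how ClickHouse implements this.

```python
import os
import threading


class SharedFileCache:
    """Hypothetical sketch: one read-only descriptor per path, shared by all readers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._entries = {}  # path -> [fd, refcount]

    def acquire(self, path):
        with self._lock:
            entry = self._entries.get(path)
            if entry is None:
                entry = [os.open(path, os.O_RDONLY), 0]
                self._entries[path] = entry
            entry[1] += 1
            return entry[0]

    def release(self, path):
        with self._lock:
            entry = self._entries[path]
            entry[1] -= 1
            if entry[1] == 0:  # last reader gone: close the descriptor
                os.close(entry[0])
                del self._entries[path]

    def open_count(self):
        with self._lock:
            return len(self._entries)
```

Readers call os.pread(fd, size, offset) on the shared descriptor instead of seeking, so two queries reading the same file at different offsets do not interfere, while only one descriptor stays open.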
Improve latency of short queries that require reading from tables with a large number of columns. #26371 (Anton Popov).
Mark window functions as ready for general use. Remove the allow_experimental_window_functions setting. #27184 (Alexander Kuzmenkov).
Improve compatibility with non-whole-minute timezone offsets. #27080 (Raúl Marín).
If the file descriptor in a File table is a regular file, allow reading from it multiple times. This allows clickhouse-local to read from stdin multiple times (with multiple SELECT queries or subqueries) if stdin is a regular file, as in clickhouse-local --query "SELECT * FROM table UNION ALL SELECT * FROM table" ... < file. This closes #11124. Co-authored with (alexey-milovidov). #25960 (BoloniniD).
Remove duplicate index analysis and avoid possible invalid limit checks during projection analysis. #27742 (Amos Bird).
Add the _CAST function for internal use, which does not preserve type nullability, while the non-internal cast preserves it according to the cast_keep_nullable setting. Closes #12636. #27382 (Kseniia Sumarokova).
Add the setting log_formatted_queries to additionally log the formatted query into system.query_log. It is useful for normalized query analysis because functions like normalizeQuery and normalizeQueryKeepNames don't parse/format queries in order to achieve better performance. #27380 (Amos Bird).
Add two settings, max_hyperscan_regexp_length and max_hyperscan_regexp_total_length, to prevent huge regexps from being used in hyperscan-related functions, such as multiMatchAny. #27378 (Amos Bird).
Now functions can be shard-level constants, which means that if such a function is executed in the context of a distributed table, it generates a normal column; otherwise it produces a constant value. Notable functions are: hostName(), tcpPort(), version(), buildId(), uptime(), etc. #27020 (Amos Bird).
Updated extractAllGroupsHorizontal - upper limit on the number of matches per row can be set via optional third argument. #26961 (Vasily Nemkov).
Expose RocksDB statistics via system.rocksdb table. Read rocksdb options from ClickHouse config (rocksdb... keys). NOTE: ClickHouse does not rely on RocksDB, it is just one of the additional integration storage engines. #26821 (Azat Khuzhin).
Less verbose internal RocksDB logs. NOTE: ClickHouse does not rely on RocksDB, it is just one of the additional integration storage engines. This closes #26252. #26789 (alexey-milovidov).
Add round-robin support for clickhouse-benchmark (it does not differ from the regular multi host/port run except for statistics report). #26607 (Azat Khuzhin).
Allow creating executable dictionaries (executable, executable_pool) with DDL queries in clickhouse-local. Closes #22355. #26510 (Maksim Kita).
Set client query kind for mysql and postgresql compatibility protocol handlers. #26498 (anneji-dev).
Apply LIMIT on the shards for queries like SELECT * FROM dist ORDER BY key LIMIT 10 with distributed_push_down_limit=1. Avoid running Distinct/LIMIT BY steps for queries like SELECT DISTINCT sharding_key FROM dist ORDER BY key. Now distributed_push_down_limit is respected by the optimize_distributed_group_by_sharding_key optimization. #26466 (Azat Khuzhin).
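Pushing LIMIT down to the shards is safe because the global top n rows under ORDER BY key are always contained in the union of each shard's own top n. A minimal Python sketch of the shard step and the initiator's merge step; both function names are hypothetical and the real query pipeline is not modeled.

```python
import heapq
from itertools import islice


def shard_top(rows, key, limit):
    """Hypothetical shard step: ORDER BY key LIMIT n on the shard's own rows."""
    return sorted(rows, key=key)[:limit]


def initiator_merge(shard_results, key, limit):
    """Hypothetical initiator step: merge sorted shard results, apply LIMIT again."""
    # heapq.merge streams the already-sorted inputs without re-sorting everything.
    return list(islice(heapq.merge(*shard_results, key=key), limit))
```

Each shard sends at most n rows, so the initiator handles at most n rows per shard instead of every row in the table, and the final LIMIT n still sees all candidates that can appear in the global answer.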
Allow reusing shard connections among different clusters. It also avoids creating new connections when using the cluster table function. #26318 (Amos Bird).
Control the execution period of clearing old temporary directories with a parameter that has a default value. #26212. #26313 (fastio).
Add a setting function_range_max_elements_in_block to tune the safety threshold for data volume generated by function range. This closes #26303. #26305 (alexey-milovidov).
Check the hash function at table creation instead of at sampling time. Add a MergeTree setting: if someone created a table with an incorrect sampling column but never uses sampling, this setting can be disabled so that the server starts without an exception. #26256 (zhaoyu).
Added output_format_avro_string_column_pattern setting to put specified String columns to Avro as string instead of default bytes. Implements #22414. #26245 (Ilya Golshtein).
Don't throw an exception when querying the system.detached_parts table if there is a custom disk configuration and the detached directory does not exist on some disks. This closes #26078. #26236 (alexey-milovidov).
Fix incorrect output with --progress option for clickhouse-local. Progress bar will be cleared once it gets to 100% - same as it is done for clickhouse-client. Closes #17484. #26128 (Kseniia Sumarokova).
Replace the complicated use of Linux AIO with one-block readahead by plain synchronous IO with O_DIRECT. In previous versions, the setting min_bytes_to_use_direct_io might not work correctly if max_threads was greater than one. Reading with direct IO (which is disabled by default for queries and enabled by default for large merges) will work less efficiently. This closes #25997. #26003 (alexey-milovidov).
Flush Distributed table on REPLACE TABLE query. Resolves #24566. Do not replace (or create) the table on a [CREATE OR] REPLACE TABLE ... AS SELECT query if insertion into the new table fails. Resolves #23175. #25895 (tavplubix).
Add a views column to system.query_log containing the names of the (materialized or live) views executed by the query. Add a new log table (system.query_views_log) that contains information about each view executed during a query. Modify view execution: when an exception is thrown while executing a view, any view that has already started will continue running until it finishes. This used to be the behaviour only under parallel_view_processing=true; now it is always the behaviour. Dependent views now report reading progress to the context. #25714 (Raúl Marín).
Do connection draining asynchronously upon finishing executing distributed queries. A new server setting, max_threads_for_connection_collector, is added; it specifies the number of workers that recycle connections in the background. If the pool is full, the connection will be drained synchronously, but a bit differently than before: it is drained after we send EOS to the client, the query succeeds immediately after receiving enough data, and any exception is logged instead of being thrown to the client. Added setting drain_timeout (3 seconds by default). Connection draining will disconnect upon timeout. #25674 (Amos Bird).
Support for multiple includes in configuration. It is possible to include users configuration and remote servers configuration from multiple sources. Simply place an <include /> element with a from_zk, from_env or incl attribute, and it will be replaced with the substitution. #24404 (nvartolomei).
Fix multiple block insertion into distributed table with insert_distributed_one_random_shard = 1. This is a marginal feature. Mark as improvement. #23140 (Amos Bird).
Support LowCardinality and FixedString keys/values for Map type. #21543 (hexiaoting).
Fix bad type cast when functions like arrayHas are applied to arrays of LowCardinality of Nullable of different non-numeric types like DateTime and DateTime64. In previous versions a bad cast occurred; now an exception is thrown. This closes #26330. #27682 (alexey-milovidov).
Fix errors like Expected ColumnLowCardinality, gotUInt8 or Bad cast from type DB::ColumnVector<char8_t> to DB::ColumnLowCardinality for some queries with LowCardinality in PREWHERE. And more importantly, fix the lack of whitespace in the error message. Fixes #23515. #27298 (Nikolai Kochetov).
Fix distributed_group_by_no_merge = 2 with distributed_push_down_limit = 1 or optimize_distributed_group_by_sharding_key = 1 with LIMIT BY and LIMIT OFFSET. #27249 (Azat Khuzhin). This is an obscure combination of settings that no one uses.
Fix mutation stuck on invalid partitions in non-replicated MergeTree. #27248 (Azat Khuzhin).
In case of ambiguity, lambda functions prefer their arguments over other aliases or identifiers. #27235 (Raúl Marín).
Fixed cache, complex_key_cache, ssd_cache, complex_key_ssd_cache configuration parsing. Options allow_read_expired_keys, max_update_queue_size, update_queue_push_timeout_milliseconds, query_wait_timeout_milliseconds were not parsed for dictionaries with non cache type. #27032 (Maksim Kita).
Aggregate function parameters might be lost when applying some combinators causing exceptions like Conversion from AggregateFunction(topKArray, Array(String)) to AggregateFunction(topKArray(10), Array(String)) is not supported. It's fixed. Fixes #26196 and #26433. #26814 (tavplubix).
Add event_time_microseconds value for REMOVE_PART in system.part_log. In previous versions it was not set. #26720 (Azat Khuzhin).
Do not remove data on ReplicatedMergeTree table shutdown, to avoid creating a data-to-metadata inconsistency. #26716 (nvartolomei).
Fix optimize_distributed_group_by_sharding_key for multiple columns (it led to incorrect results with optimize_skip_unused_shards=1/allow_nondeterministic_optimize_skip_unused_shards=1 and multiple columns in the sharding key expression). #26353 (Azat Khuzhin).
Fixed rare bug in lost replica recovery that may cause replicas to diverge. #26321 (tavplubix).
Fix zstd decompression (for import/export in zstd framing format that is unrelated to tables data) in case there are escape sequences at the end of internal buffer. Closes #26013. #26314 (Kseniia Sumarokova).
Don't throw an exception in toString for Nullable Enum if the Enum does not have a value for zero. Closes #25806. #26123 (Vladimir C).
Fixed incorrect sequence_id in MySQL protocol packets that ClickHouse sends on exception during query execution. It might cause MySQL client to reset connection to ClickHouse server. Fixes #21184. #26051 (tavplubix).
Fix the case where cutToFirstSignificantSubdomainCustom()/cutToFirstSignificantSubdomainCustomWithWWW()/firstSignificantSubdomainCustom() returned an incorrect type for constants, so optimize_skip_unused_shards did not work. #26041 (Azat Khuzhin).
Fix possible mismatched header when using normal projection with prewhere. This fixes #26020. #26038 (Amos Bird).
Fix sharding_key from a column without a function for remote() (previously, select * from remote('127.1', system.one, dummy) led to the error Unknown column: dummy, there are only columns .). #25824 (Azat Khuzhin).
Fix optimize_skip_unused_shards_rewrite_in for non-UInt64 types (may select incorrect shards eventually or throw Cannot infer type of an empty tuple or Function tuple requires at least one argument). #25798 (Azat Khuzhin).
Stateful and stateless tests are now run in random timezones. Fixes #12439. Reading String as DateTime and writing DateTime as String in Protobuf format now respect the timezone. Reading UInt16 as DateTime in Arrow and Parquet formats now treats it as Date and then converts it to DateTime with respect to DateTime's timezone, because Date is serialized in Arrow and Parquet as UInt16. GraphiteMergeTree now respects the time zone for rounding of times. Fixes #5098. Author: @alexey-milovidov. #15408 (alesapin).
The new version uses the Map data type for system log tables (system.query_log, system.query_thread_log, system.processes, system.opentelemetry_span_log). These tables will be auto-created with the new data types. Virtual columns are created to support old queries. Closes #18698. #23934, #25773 (hexiaoting, sundy-li, Maksim Kita). If you want to downgrade from version 21.8 to older versions, you will need to clean up the system log tables manually. Look at /var/lib/clickhouse/data/system/*_log.
Collect common system metrics (in system.asynchronous_metrics and system.asynchronous_metric_log) on CPU usage, disk usage, memory usage, IO, network, files, load average, CPU frequencies, thermal sensors, EDAC counters, system uptime; also added metrics about the scheduling jitter and the time spent collecting the metrics. It works similarly to atop in ClickHouse and allows access to monitoring data even if you have no additional tools installed. Closes #9430. #24416 (alexey-milovidov, Yegor Levankov).
Add MaterializedPostgreSQL table engine and database engine. This database engine allows replicating a whole database or any subset of database tables. #20470 (Kseniia Sumarokova).
Add the ability to reset a custom setting to default and remove it from the table's metadata. It allows rolling back the change without knowing the system/config's default. Closes #14449. #17769 (xjewer).
Render pipelines as graphs in Web UI if EXPLAIN PIPELINE graph = 1 query is submitted. #26067 (alexey-milovidov).
Use Map data type for system logs tables (system.query_log, system.query_thread_log, system.processes, system.opentelemetry_span_log). These tables will be auto-created with new data types. Virtual columns are created to support old queries. Closes #18698. #23934, #25773 (hexiaoting, sundy-li, Maksim Kita).
For a dictionary with a complex key containing only one attribute, allow not wrapping the key expression in tuple for functions dictGet, dictHas. #26130 (Maksim Kita).
Implement functions bin/hex for AggregateFunction states. #26094 (zhaoyu).
Support arguments of UUID type for empty and notEmpty functions. UUID is empty if it is all zeros (nil UUID). Closes #3446. #25974 (zhaoyu).
More instrumentation for network interaction: add counters for recv/send bytes; add gauges for recvs/sends. Added missing documentation. Close #5897. #25962 (alexey-milovidov).
Add setting optimize_move_to_prewhere_if_final. If query has FINAL, the optimization move_to_prewhere will be enabled only if both optimize_move_to_prewhere and optimize_move_to_prewhere_if_final are enabled. Closes #8684. #25940 (Kseniia Sumarokova).
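The decision rule described above, written as a small hypothetical Python function. The setting names come from the entry; the function itself is only an illustration of the stated logic, not ClickHouse code.

```python
# Hypothetical illustration of the rule; not ClickHouse code.
def move_to_prewhere_enabled(query_has_final: bool,
                             optimize_move_to_prewhere: bool,
                             optimize_move_to_prewhere_if_final: bool) -> bool:
    """Decide whether the move-to-PREWHERE optimization applies to a query."""
    if not optimize_move_to_prewhere:
        return False  # the base optimization is off for every query
    if query_has_final:
        return optimize_move_to_prewhere_if_final  # FINAL needs the extra opt-in
    return True
```

Queries without FINAL are unaffected by the new setting; only FINAL queries require both flags.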
Allow parameters for parametric aggregate functions to be arbitrary constant expressions (e.g., 1 + 2), not just literals. It also allows using the query parameters (in parameterized queries like {param:UInt8}) inside parametric aggregate functions. Closes #11607. #25910 (alexey-milovidov).
Support for queries with a column named "null" (it must be specified in back-ticks or double quotes) and ON CLUSTER. Closes #24035. #25907 (alexey-milovidov).
CAST from Date to DateTime (or DateTime64) was not using the timezone of the DateTime type. It can also affect the comparison between Date and DateTime. Inference of the common type for Date and DateTime also was not using the corresponding timezone. It affected the results of function if and array construction. Closes #24128. #24129 (Maksim Kita).
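Why the DateTime timezone matters: converting a Date to DateTime means picking midnight in some timezone, and the resulting instant shifts by the UTC offset. To stay self-contained, this Python sketch uses fixed UTC offsets instead of the named timezones ClickHouse accepts; date_to_unix is a hypothetical helper.

```python
from datetime import date, datetime, time, timedelta, timezone


def date_to_unix(d: date, utc_offset_hours: float) -> int:
    """Hypothetical helper: interpret a Date as midnight at the given UTC offset."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    midnight = datetime.combine(d, time(0, 0), tzinfo=tz)
    return int(midnight.timestamp())  # seconds since the Unix epoch
```

The same date 2021-01-01 maps to 1609459200 at UTC but to a value 3 hours (10800 seconds) earlier at UTC+3, which is why using the wrong timezone also changes comparisons between Date and DateTime.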
Fix rare bug with DROP PART query for ReplicatedMergeTree tables which can lead to error message Unexpected merged part intersecting drop range. #25783 (alesapin).
Fix a bug in TTL with GROUP BY expression that prevented TTL from executing again after its first execution in a part. #25743 (alesapin).
Improved performance of queries with explicitly defined large sets. Added the compatibility setting legacy_column_name_of_tuple_literal. It makes sense to set it to true while doing a rolling update of a cluster from a version lower than 21.7 to any higher version. Otherwise, distributed queries with explicitly defined sets in the IN clause may fail during the update. #25371 (Anton Popov).
Forward/backward incompatible change of the maximum buffer size in clickhouse-keeper (an experimental alternative to ZooKeeper). It is better to do it now (before production) than later. #25421 (alesapin).
Added function dateName to return names like 'Friday' or 'April'. Author: Daniil Kondratyev (@dankondr). #25372 (Maksim Kita).
Add toJSONString function to serialize columns to their JSON representations. #25164 (Amos Bird).
Now query_log has two new columns: initial_query_start_time, initial_query_start_time_microsecond that record the starting time of a distributed query if any. #25022 (Amos Bird).
Add aggregate function segmentLengthSum. #24250 (flynn).
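Assuming segmentLengthSum returns the total length of the union of [start, end] segments, with overlapping parts counted once, a Python sketch of the sort-and-sweep computation looks like this. The function name and the tuple input format are illustrative only.

```python
# Hypothetical sketch of "total length of the union of segments".
def segment_length_sum(segments):
    """Total length of the union of [start, end] segments; overlaps count once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(segments):
        if cur_end is None or start > cur_end:
            # Disjoint from the current run: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current run.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total
```

For segments (1, 5), (3, 7) and (10, 12), the first two merge into (1, 7) of length 6, and the third adds 2, giving 8.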
Add a new boolean setting prefer_global_in_and_join, which makes all IN/JOIN default to GLOBAL IN/JOIN. #23434 (Amos Bird).
Support ALTER DELETE queries for Join table engine. #23260 (foolchi).
Add the quantileBFloat16 aggregate function as well as the corresponding quantilesBFloat16 and medianBFloat16. It is a very simple and fast quantile estimator with a relative error of no more than 0.390625%. This closes #16641. #23204 (Ivan Novitskiy).
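bfloat16 keeps the sign, the 8-bit exponent and the top 7 mantissa bits of a float32, which makes a compact histogram of values possible. This Python sketch truncates the low 16 bits and reads a quantile from the histogram; the rank convention and helper names are assumptions, and it is not the ClickHouse implementation. Truncation bounds the per-value relative error below 2^-7; rounding to nearest instead would halve that bound to 2^-8, which equals the 0.390625% quoted above.

```python
import struct
from collections import Counter


def to_bfloat16(x: float) -> float:
    """Keep only the upper 16 bits of the float32 representation (truncation)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def quantile_bf16(values, level: float) -> float:
    """Hypothetical estimator: quantile read from a histogram of bfloat16 values."""
    counts = Counter(to_bfloat16(v) for v in values)
    target = level * (len(values) - 1)  # 0-based rank of the requested quantile
    seen = 0
    for value in sorted(counts):
        seen += counts[value]
        if seen > target:
            return value
    raise ValueError("empty input")
```

The histogram has at most 65536 distinct keys regardless of input size, which is what keeps such an estimator small and fast.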
Implement sequenceNextNode() function useful for flow analysis. #19766 (achimbab).
Added an optimization that transforms some functions into reading of subcolumns to reduce the amount of data read. E.g., the statement col IS NULL is transformed to reading the subcolumn col.null. The optimization can be enabled with the setting optimize_functions_to_subcolumns, which is currently off by default. #24406 (Anton Popov).
Rewrite more columns to possible alias expressions. This may enable better optimization, such as projections. #24405 (Amos Bird).
An index of type bloom_filter can be used for expressions with the hasAny function with constant arrays. This closes #24291. #24900 (Vasily Nemkov).
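Why a bloom filter helps hasAny with a constant array: a granule can be skipped when none of the constant elements might be in its filter, because a bloom filter has no false negatives. The Python sketch below uses SHA-256-derived bit positions and a hypothetical granule_may_match_has_any helper; ClickHouse's actual hashing, filter sizing and granule layout are not described here.

```python
import hashlib


class BloomFilter:
    """Hypothetical per-granule filter; sizes and hashing are illustrative."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def granule_may_match_has_any(granule_filter: BloomFilter, constant_array) -> bool:
    """hasAny(col, [...]) can only be true if some constant may be in the granule."""
    return any(granule_filter.might_contain(x) for x in constant_array)
```

A granule whose filter answers "definitely absent" for every constant element is skipped without reading it; a false positive only costs an unnecessary read, never a wrong result.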
Add exponential backoff to reschedule read attempt in case RabbitMQ queues are empty. (ClickHouse has support for importing data from RabbitMQ). Closes #24340. #24415 (Kseniia Sumarokova).
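Exponential backoff doubles the wait after each empty read, up to a cap. A minimal Python sketch; the base delay, the cap and the function name are hypothetical, since the entry does not state the actual values ClickHouse uses.

```python
# Hypothetical schedule; the real base delay and cap are not given in the entry.
def backoff_delays(base_ms: int, max_ms: int, attempts: int):
    """Delay before each successive read attempt while the queues stay empty."""
    delays = []
    delay = base_ms
    for _ in range(attempts):
        delays.append(delay)
        delay = min(delay * 2, max_ms)  # double, but never exceed the cap
    return delays
```

With a 100 ms base and a 1000 ms cap, six consecutive empty reads wait 100, 200, 400, 800, 1000 and 1000 ms, so an idle consumer polls far less often than with a fixed short delay.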
Allow limiting bandwidth for replication. Add two Replicated*MergeTree settings, max_replicated_fetches_network_bandwidth and max_replicated_sends_network_bandwidth, which limit the maximum speed of replicated fetches/sends for a table. Add two server-wide settings (in the default user profile), max_replicated_fetches_network_bandwidth_for_server and max_replicated_sends_network_bandwidth_for_server, which limit the maximum speed of replication for all tables. The settings are not followed perfectly accurately. Turned off by default. Fixes #1821. #24573 (alesapin).
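A common way to cap average throughput is a token bucket. The Python sketch below is a generic illustration with a caller-supplied clock so it stays deterministic; the class and its parameters are hypothetical and it is not ClickHouse's throttler. Like the settings described above, it limits the average rate rather than every individual send.

```python
class TokenBucket:
    """Hypothetical throttler: at most `rate` bytes/s on average, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.last = 0.0

    def wait_time(self, nbytes: float, now: float) -> float:
        """Consume tokens for `nbytes`; return how long the sender should sleep first."""
        # Refill proportionally to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        # The deficit is paid back at `rate` bytes per second.
        return -self.tokens / self.rate
```

With rate=1000 and capacity=1000, a 1000-byte burst at time 0 passes immediately, and a further 500 bytes at the same moment must wait 0.5 seconds.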
Resource constraints and isolation for ODBC and Library bridges. Use a separate clickhouse-bridge group and user for bridge processes. Set oom_score_adj so that the bridges will be the first candidates for the OOM killer. Set maximum RSS to 1 GiB. Closes #23861. #25280 (Kseniia Sumarokova).
Add a standalone clickhouse-keeper symlink to the main clickhouse binary. Now it's possible to run coordination without the main clickhouse server. #24059 (alesapin).
Use global settings for queries to VIEW. Fixed the behavior where queries to a VIEW used local settings, which led to errors if the settings of CREATE VIEW and SELECT were different. Now a VIEW won't use these modified settings, but you can still pass additional settings in the SETTINGS section of the CREATE VIEW query. Closes #20551. #24095 (Vladimir).
Improvement for Distributed tables. Drop replicas from dirname for internal_replication=true (allows INSERT into Distributed with a cluster of any number of replicas; previously only 15 replicas were supported, and anything more failed with ENAMETOOLONG while creating the directory for async blocks). #25513 (Azat Khuzhin).
Added support for the Interval type in LowCardinality. It is needed for intermediate values of some expressions. Closes #21730. #25410 (Vladimir).
Add the == operator on time conditions for the sequenceMatch and sequenceCount functions. For example: sequenceMatch('(?1)(?t==1)(?2)')(time, data = 1, data = 2). #25299 (Christophe Kalenzaga).
Fix topLevelDomain for IDN hosts (i.e. example.рф); previously it returned an empty string for such hosts. #25103 (Azat Khuzhin).
Detect the Linux kernel version at runtime (to check for working nested epoll, which is required for async_socket_for_remote/use_hedged_requests; otherwise remote queries may get stuck). #25067 (Azat Khuzhin).
For distributed queries with optimize_skip_unused_shards=1, allow skipping shards with conditions like (sharding key) IN (one-element-tuple). (Tuples with many elements were already supported. A tuple with a single element did not work because it is parsed as a literal.) #24930 (Amos Bird).
Improved log messages of S3 errors, no more double whitespaces in case of empty keys and buckets. #24897 (Vladimir Chebotarev).
Some queries require multi-pass semantic analysis. Try reusing built sets for IN in this case. #24874 (Amos Bird).
Respect max_distributed_connections for insert_distributed_sync (otherwise for huge clusters and sync insert it may run out of max_thread_pool_size). #24754 (Azat Khuzhin).
Avoid hiding errors like Limit for rows or bytes to read exceeded for scalar subqueries. #24545 (nvartolomei).
Make String-to-Int parser stricter so that toInt64('+') will throw. #24475 (Amos Bird).
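A stricter parser requires at least one digit after an optional sign, so '+' alone is rejected. A Python sketch with a hypothetical helper name; the exact set of inputs ClickHouse accepts (for example, whitespace handling) is not stated in the entry, and the Int64 range check here is an assumption of the sketch.

```python
# Hypothetical strict parser; not ClickHouse's implementation.
def strict_parse_int64(s: str) -> int:
    """Parse an optional sign followed by at least one ASCII digit; reject anything else."""
    body = s[1:] if s[:1] in ("+", "-") else s
    # A bare sign leaves an empty body, which is rejected here.
    if not body or not all("0" <= ch <= "9" for ch in body):
        raise ValueError(f"cannot parse {s!r} as Int64")
    value = int(s)
    # Assumption of the sketch: values outside Int64 are rejected too.
    if not -(2 ** 63) <= value < 2 ** 63:
        raise ValueError(f"{s!r} is out of range for Int64")
    return value
```

The explicit digit check also rejects forms that Python's own int() would accept, such as "1_000".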
If SSD_CACHE is created with DDL query, it can be created only inside user_files directory. #24466 (Maksim Kita).
Fix trailing whitespaces in the FROM clause with subqueries in multiline mode, and also change the output of the queries slightly to be more human-friendly. #24151 (Azat Khuzhin).
Improvement for Distributed tables. Add ability to split distributed batch on failures (i.e. due to memory limits, corruptions), under distributed_directory_monitor_split_batch_on_failure (OFF by default). #23864 (Azat Khuzhin).
Display progress for File table engine in clickhouse-local and on INSERT query in clickhouse-client when data is passed to stdin. Closes #18209. #23656 (Kseniia Sumarokova).
Bugfixes and improvements of clickhouse-copier. Allow copying tables with different (but compatible) schemas. Closes #9159. Added a test for copying ReplacingMergeTree. Closes #22711. Support TTL on columns and Data Skipping Indices: they are simply removed when creating the internal Distributed table (the underlying table will have TTL and skipping indices). Closes #19384. Allow copying MATERIALIZED and ALIAS columns. There are some cases in which it could be helpful (e.g. if such a column is in the PRIMARY KEY). This can be allowed by setting the allow_to_copy_alias_and_materialized_columns property to true in the task configuration. Closes #9177. Closes #11007 (https://github.com/ClickHouse/ClickHouse/issues/11007). Closes #9514. Added a property allow_to_drop_target_partitions in the task configuration to drop a partition in the original table before moving helper tables. Closes #20957. Get rid of the OPTIMIZE DEDUPLICATE query. This hack was needed because ALTER TABLE MOVE PARTITION was retried many times and plain MergeTree tables don't have deduplication. Closes #17966. Write progress to the ZooKeeper node at path task_path + /status in JSON format. Closes #20955. Support for ReplicatedTables without arguments. Closes #24834. #23518 (Nikita Mikhaylov).
Resolve the actual port number bound when a user requests any available port from the operating system to show it in the log message. #25569 (bnaecker).
Fixed a case where conversion of PostgreSQL arrays sometimes resulted in the String data type instead of an n-dimensional array, because attndims works incorrectly in some cases. Closes #24804. #25538 (Kseniia Sumarokova).
Fix extremely rare bug on low-memory servers which can lead to the inability to perform merges without restart. Possibly fixes #24603. #24872 (alesapin).
Fix extremely rare error Tagging already tagged part in replication queue during concurrent alter move/replace partition. Possibly fixes #22142. #24961 (alesapin).
Fix potential crash when calculating aggregate function states by aggregation of aggregate function states of other aggregate functions (not a practical use case). See #24523. #25015 (alexey-milovidov).
Fixed the behavior where SYSTEM RESTART REPLICA or SYSTEM SYNC REPLICA queries did not finish. This was detected on a server with an extremely low amount of RAM. #24457 (Nikita Mikhaylov).
Fix a bug that could lead to a ZooKeeper client hang inside clickhouse-server. #24721 (alesapin).
If ZooKeeper connection was lost and replica was cloned after restoring the connection, its replication queue might contain outdated entries. Fixed failed assertion when replication queue contains intersecting virtual parts. It may rarely happen if some data part was lost. Print error in log instead of terminating. #24777 (tavplubix).
Fix lost WHERE condition in expression-push-down optimization of query plan (setting query_plan_filter_push_down = 1 by default). Fixes #25368. #25370 (Nikolai Kochetov).
Fix a bug that could lead to intersecting parts after merges with TTL: Part all_40_40_0 is covered by all_40_40_1 but should be merged into all_40_41_1. This shouldn't happen often. #25549 (alesapin).
On ZooKeeper connection loss ReplicatedMergeTree table might wait for background operations to complete before trying to reconnect. It's fixed, now background operations are stopped forcefully. #25306 (tavplubix).
Fix error Key expression contains comparison between inconvertible types for queries with ARRAY JOIN when the array is used in the primary key. Fixes #8247. #25546 (Anton Popov).
Fix logical error with exception message "Cannot sum Array/Tuple in min/maxMap". #25298 (Kruglov Pavel).
Fix error Bad cast from type DB::ColumnLowCardinality to DB::ColumnVector<char8_t> for queries where LowCardinality argument was used for IN (this bug appeared in 21.6). Fixes #25187. #25290 (Nikolai Kochetov).
Fix incorrect behaviour of joinGetOrNull with not-nullable columns. This fixes #24261. #25288 (Amos Bird).
Fix incorrect behaviour and UBSan report in big integers. In previous versions CAST(1e19 AS UInt128) returned zero. #25279 (alexey-milovidov).
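A 128-bit unsigned integer can be viewed as two 64-bit words. This Python sketch converts a non-negative float into such a (high, low) pair without any signed 64-bit intermediate; 1e19 is above the signed 64-bit maximum (about 9.22e18), so an Int64 intermediate could not hold it. The word layout and the helper name are assumptions, and this is not claimed to be the exact cause of the old bug.

```python
import math


def float_to_uint128_words(x: float) -> tuple[int, int]:
    """Hypothetical helper: convert a non-negative float to (high, low) 64-bit words."""
    # NaN fails both comparisons, so it is rejected here as well.
    if not 0.0 <= x < 2.0 ** 128:
        raise ValueError(f"{x!r} is out of UInt128 range")
    hi = math.floor(x / 2.0 ** 64)       # exact: division by a power of two
    lo = math.floor(x - hi * 2.0 ** 64)  # remainder below 2**64, also exact
    return hi, lo
```

Recombining the words as (hi << 64) | lo gives back the integer value of the float, so 1e19 becomes (0, 10000000000000000000) rather than zero.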
Fixed possible error 'Cannot read from istream at offset 0' when reading a file from DiskS3 (S3 virtual filesystem is an experimental feature under development that should not be used in production). #24885 (Pavel Kovalenko).
Fix computation of total bytes in Buffer table. Previously, the total_writes.bytes counter decreased too much during the buffer flush. It led to a counter overflow, and totalBytes returned something around 17.44 EB some time after the flush. #24450 (DimasKovas).
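Unsigned counters wrap around instead of going negative. The Python sketch below mimics 64-bit unsigned subtraction: subtracting slightly more than the counter holds yields a value just below 2^64, about 1.8 * 10^19 bytes, which is on the exabyte scale reported above. A saturating variant is shown for contrast; both helper names are hypothetical.

```python
UINT64 = 2 ** 64


def sub_uint64(counter: int, amount: int) -> int:
    """Subtract as unsigned 64-bit arithmetic does: results wrap modulo 2**64."""
    return (counter - amount) % UINT64


def sub_uint64_saturating(counter: int, amount: int) -> int:
    """Hypothetical safer variant: clamp at zero instead of wrapping."""
    return max(counter - amount, 0)
```

Subtracting 150 from a counter holding 100 wraps to 2^64 - 50, a huge "total bytes" value, while the saturating form returns 0.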
When user authentication is managed by LDAP: fixed a potential deadlock that could happen during LDAP role (re)mapping when an LDAP group is mapped to a nonexistent local role. #24431 (Denis Glazachev).
In "multipart/form-data" message consider the CRLF preceding a boundary as part of it. Fixes #23905. #24399 (Ivan).
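RFC 2046 attaches the CRLF that precedes a boundary line to the delimiter, not to the preceding part's content. A simplified Python sketch (no transport padding, no nested multiparts, hypothetical helper name) that splits a body on CRLF + "--" + boundary, so a part keeps its own trailing line breaks but loses the one that belongs to the delimiter:

```python
# Hypothetical, simplified parser; not ClickHouse code.
def split_multipart(body: bytes, boundary: bytes) -> list[bytes]:
    """Return each part's content; the CRLF before every boundary belongs to the delimiter."""
    delimiter = b"\r\n--" + boundary
    # Prepend CRLF so the very first boundary line matches the same delimiter.
    pieces = (b"\r\n" + body).split(delimiter)
    contents = []
    for piece in pieces[1:]:         # pieces[0] is the preamble
        if piece.startswith(b"--"):  # closing delimiter "--boundary--"
            break
        if piece.startswith(b"\r\n"):
            piece = piece[2:]        # end of the boundary line itself
        _headers, _, content = piece.partition(b"\r\n\r\n")
        contents.append(content)
    return contents
```

A part whose data ends in its own line break, such as "line1\r\n", keeps that line break; only the CRLF immediately before "--boundary" is stripped.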
Fix DROP PARTITION with intersecting fake parts. In rare cases there might be parts with a mutation version greater than the current block number. #24321 (Amos Bird).
Fixed a bug in moving Materialized View from Ordinary to Atomic database (RENAME TABLE query). Now inner table is moved to new database together with Materialized View. Fixes #23926. #24309 (tavplubix).
A fix for Kafka tables. Fix the bug in failover behavior when Engine = Kafka was not able to start consumption if the same consumer had an empty assignment previously. Closes #21118. #21267 (filimonov).
Ubuntu 20.04 is now used to run integration tests, docker-compose version used to run integration tests is updated to 1.28.2. Environment variables now take effect on docker-compose. Rework test_dictionaries_all_layouts_separate_sources to allow parallel run. #20393 (Ilya Yatsishin).
The zstd compression library is updated to v1.5.0. You may get messages about "checksum does not match" in replication. These messages are expected due to the update of the compression algorithm, and you can ignore them. These messages are informational and do not indicate any kind of undesired behaviour.
The setting compile_expressions is enabled by default. Although it has been heavily tested on variety of scenarios, if you find some undesired behaviour on your servers, you can try turning this setting off.
Values of UUID type cannot be compared with integer. For example, instead of writing uuid != 0 type uuid != '00000000-0000-0000-0000-000000000000'.
Make big integers production ready. Add support for UInt128 data type. Fix known issues with the Decimal256 data type. Support big integers in dictionaries. Support gcd/lcm functions for big integers. Support big integers in array search and conditional functions. Support LowCardinality(UUID). Support big integers in generateRandom table function and clickhouse-obfuscator. Fix error with returning UUID from scalar subqueries. This fixes #7834. This fixes #23936. This fixes #4176. This fixes #24018. Backward incompatible change: values of UUID type cannot be compared with integer. For example, instead of writing uuid != 0 type uuid != '00000000-0000-0000-0000-000000000000'. #23631 (alexey-milovidov).
Support Array data type for inserting and selecting data in Arrow, Parquet and ORC formats. #21770 (taylor12805).
Support creating dictionaries with DDL queries in clickhouse-local. Closes #22354. Added support for DETACH DICTIONARY PERMANENTLY. Added support for EXCHANGE DICTIONARIES for Atomic database engine. Added support for moving dictionaries between databases using RENAME DICTIONARY. #23436 (Maksim Kita).
Add setting json (boolean, 0 by default) for EXPLAIN PLAN query. When enabled, query output will be a single JSON row. It is recommended to use TSVRaw format to avoid unnecessary escaping. #23082 (Nikolai Kochetov).
Add setting indexes (boolean, disabled by default) to EXPLAIN PIPELINE query. When enabled, shows used indexes, number of filtered parts and granules for every index applied. Supported for MergeTree* tables. #22352 (Nikolai Kochetov).
LDAP: implemented user DN detection functionality to use when mapping Active Directory groups to ClickHouse roles. #22228 (Denis Glazachev).
New aggregate function deltaSumTimestamp for summing the difference between consecutive rows while maintaining ordering during merge by storing timestamps. #21888 (Russ Frank).
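The following Python sketch illustrates why deltaSumTimestamp stores timestamps. It is a model of the idea, not ClickHouse's source, and the names DeltaState, add_rows and merge are hypothetical. Each partial state keeps its sum together with its first and last values and their timestamps. When two states are merged, the timestamps decide which state comes first, and only then is the boundary difference added. Negative differences are ignored.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class DeltaState:
    # Partial aggregation state: running sum plus boundary values and their timestamps
    total: float = 0.0
    first: Optional[float] = None
    last: Optional[float] = None
    first_ts: Optional[int] = None
    last_ts: Optional[int] = None


def add_rows(rows: Iterable[Tuple[float, int]]) -> DeltaState:
    """Build a state from (value, timestamp) rows already in timestamp order."""
    s = DeltaState()
    for value, ts in rows:
        if s.last is not None and value > s.last:
            s.total += value - s.last  # negative differences are ignored
        if s.first is None:
            s.first, s.first_ts = value, ts
        s.last, s.last_ts = value, ts
    return s


def merge(a: DeltaState, b: DeltaState) -> DeltaState:
    """Merge two non-overlapping states, ordering them by their stored timestamps."""
    if a.first is None:
        return b
    if b.first is None:
        return a
    if b.first_ts < a.first_ts:  # b is earlier: swap so that a precedes b
        a, b = b, a
    total = a.total + b.total + max(0.0, b.first - a.last)
    return DeltaState(total, a.first, b.last, a.first_ts, b.last_ts)


rows = [(1, 1), (2, 2), (5, 3), (6, 4)]
print(add_rows(rows).total)                                # 5.0
print(merge(add_rows(rows[2:]), add_rows(rows[:2])).total)  # 5.0, despite reversed merge order
```

Without the timestamps, merging the later half first would add max(0, 1 - 6) = 0 at the boundary instead of 5 - 2 = 3, and the result would be 2.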
Enable compile_expressions setting by default. When this setting is enabled, compositions of simple functions and operators will be compiled to native code with LLVM at runtime. #8482 (Maksim Kita, alexey-milovidov). Note: if you run into problems, turn this setting off.
Update re2 library. Performance of regular expressions matching is improved. Also this PR adds compatibility with gcc-11. #24196 (Raúl Marín).
The ORC input format now reads data stripe by stripe instead of reading the entire table into memory at once, which consumed a lot of memory for huge files. #23102 (Chao Ma).
Fusion of aggregate functions sum, count and avg in a query into a single aggregate function. The optimization is controlled with the optimize_fuse_sum_count_avg setting. This is implemented with a new aggregate function sumCount. This function returns a tuple of two fields: sum and count. #21337 (hexiaoting).
Update zstd to v1.5.0. Compression performance is improved by a single-digit percentage. #24135 (Raúl Marín). Note: you may get messages about "checksum does not match" in replication. These messages are expected due to the update of the compression algorithm and you can ignore them.
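The following Python sketch shows the idea behind the fusion: one pass produces a (sum, count) pair, and sum(x), count(x) and avg(x) are all derived from that pair. The function names are hypothetical. The sketch assumes no NULL values. Returning NaN for an empty input is this sketch's choice.

```python
from typing import Iterable, Tuple


def sum_count(values: Iterable[float]) -> Tuple[float, int]:
    """Single pass producing (sum, count), the shape of sumCount's result."""
    total, count = 0.0, 0
    for v in values:
        total += v
        count += 1
    return total, count


def fused_sum_count_avg(values: Iterable[float]) -> Tuple[float, int, float]:
    """Derive sum(x), count(x) and avg(x) from one (sum, count) state."""
    total, count = sum_count(values)
    avg = total / count if count else float("nan")  # empty input gives NaN here
    return total, count, avg


print(fused_sum_count_avg([1, 2, 3, 4]))  # (10.0, 4, 2.5)
```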
Improved performance of Buffer tables: do not acquire lock for total_bytes/total_rows for Buffer engine. #24066 (Azat Khuzhin).
Support for preallocate in hashed/sparse_hashed dictionaries is restored. #23979 (Azat Khuzhin).
Enable async_socket_for_remote by default (fewer threads are used when querying Distributed tables with a large fanout). #23683 (Nikolai Kochetov).
Allow configuring different log levels for different logging channels. Closes #19569. #23857 (filimonov).
Keep the default timezone on DateTime operations if it was not provided explicitly. For example, if you add one second to a value of DateTime type without a timezone, it will remain DateTime without a timezone. In previous versions the default timezone was placed into the returned data type explicitly, so it became DateTime('something'). This closes #4854. #23392 (alexey-milovidov).
Allow the user to specify an empty string instead of a database name for MySQL storage. The default database will be used for queries. In previous versions this worked only for SELECT queries; support for INSERT is now added as well. This closes #19281. This can be useful when working with Sphinx or other MySQL-compatible foreign databases. #23319 (alexey-milovidov).
Fixed quantile(s)TDigest. Added special handling of singleton centroids according to tdunning/t-digest 3.2+. Also a bug with over-compression of centroids in implementation of earlier version of the algorithm was fixed. #23314 (Vladimir Chebotarev).
Fix the case when the progress bar in interactive mode of clickhouse-client, appearing in the middle of the data, could overwrite some parts of the visible data in the terminal. This closes #19283. #23050 (alexey-milovidov).
Preserve dictionaries until storage shutdown (this will avoid possible external dictionary 'DICT' not found errors at server shutdown during final flush of the Buffer engine). #24068 (Azat Khuzhin).
Flush Buffer tables before shutting down tables (within one database), to avoid discarding blocks because the underlying table had already been detached (the "Destination table default.a_data_01870 doesn't exist. Block of data is discarded" error in the log). #24067 (Azat Khuzhin).
Now prefer_column_name_to_alias = 1 also favors column names for GROUP BY, HAVING and ORDER BY. This fixes #23882. #24022 (Amos Bird).
Add hints for names of Enum elements (suggest names in case of typos). Closes #17112. #23919 (flynn).
Measure found rate (the percentage for which the value was found) for dictionaries (see found_rate in system.dictionaries). #23916 (Azat Khuzhin).
Allow adding specific queue settings via the table setting rabbitmq_queue_settings_list (closes #23737 and #23918). Allow the user to control the whole RabbitMQ setup: if the table setting rabbitmq_queue_consume is set to 1, the RabbitMQ table engine will only connect to the specified queue and will not perform any consumer-side setup such as declaring exchanges, queues and bindings (closes #21757). Add proper cleanup when a RabbitMQ table is dropped: delete the queues the table has declared and all bound exchanges, if they were created by the table. #23887 (Kseniia Sumarokova).
Add broken_data_files/broken_data_compressed_bytes into system.distribution_queue. Add a metric for the number of files for asynchronous insertion into Distributed tables that have been marked as broken (BrokenDistributedFilesToInsert). #23885 (Azat Khuzhin).
Querying system.tables does not go to ZooKeeper anymore. #23793 (Fuwang Hu).
Respect lock_acquire_timeout_for_background_operations for OPTIMIZE queries. #23623 (Azat Khuzhin).
Possibility to change S3 disk settings at runtime via the new SYSTEM RESTART DISK SQL command. #23429 (Pavel Kovalenko).
If a user mistakenly set max_distributed_connections to zero, every query to a Distributed table threw an exception with a message containing "logical error". This is actually expected behaviour rather than a logical error, so the exception message was misleading. It also triggered checks in our CI environment that ensure no logical errors ever happen. Now a zero value of max_distributed_connections is treated as the minimum possible value (one). #23348 (Azat Khuzhin).
Use the old version of the modulo function when it is used in the partition key and primary key. Closes #23508. #24157 (Kseniia Sumarokova). It was a source of backward incompatibility in previous releases.
Fixed the behavior when a SYSTEM RESTART REPLICA or SYSTEM SYNC REPLICA query could be processed indefinitely. This was detected on a server with an extremely small amount of RAM. #24457 (Nikita Mikhaylov).
Fix incorrect monotonicity of toWeek function. This fixes #24422. This bug was introduced in #5212, and was exposed later by smarter partition pruner. #24446 (Amos Bird).
Fix DROP PARTITION with intersecting fake parts. In rare cases there might be parts with a mutation version greater than the current block number. #24321 (Amos Bird).
Fixed a bug in moving Materialized View from Ordinary to Atomic database (RENAME TABLE query). Now inner table is moved to new database together with Materialized View. Fixes #23926. #24309 (tavplubix).
Some ALTER PARTITION queries might cause Part A intersects previous part B and Unexpected merged part C intersecting drop range D errors in replication queue. It's fixed. Fixes #23296. #23997 (tavplubix).
Fix SIGSEGV for external GROUP BY and overflow row (i.e. queries like SELECT FROM GROUP BY WITH TOTALS SETTINGS max_bytes_before_external_group_by>0, max_rows_to_group_by>0, group_by_overflow_mode='any', totals_mode='before_having'). #23962 (Azat Khuzhin).
Fix keys metrics accounting for CACHE dictionary with duplicates in the source (leads to DictCacheKeysRequestedMiss overflows). #23929 (Azat Khuzhin).
Fix distributed_group_by_no_merge = 2 with GROUP BY and aggregate function wrapped into regular function (had been broken in #23546). Throw an exception if someone tries to use distributed_group_by_no_merge = 2 with window functions. Disable optimize_distributed_group_by_sharding_key for queries with window functions. #23906 (Azat Khuzhin).
A fix for s3 table function: better handling of HTTP errors. Response bodies of HTTP errors were being ignored earlier. #23844 (Vladimir Chebotarev).
A fix for s3 table function: better handling of URIs. Fixed an incompatibility with URLs containing the + symbol; data with such keys could not be read previously. #23822 (Vladimir Chebotarev).
Fix error Can't initialize pipeline with empty pipe for queries with GLOBAL IN/JOIN and use_hedged_requests. Fixes #23431. #23805 (Nikolai Kochetov).
Fix CLEAR COLUMN not working when the column is referenced by a materialized view. Closes #23764. #23781 (flynn).
Change comparison of integers and floating point numbers when the integer is not exactly representable in the floating point data type. In the new version the comparison will return false, because a rounding error would occur. Example: 9223372036854775808.0 != 9223372036854775808, because the number 9223372036854775808 is not representable as a floating point number exactly (and 9223372036854775808.0 is rounded to 9223372036854776000.0). But in the previous version the comparison returned that the numbers are equal, because if the floating point number 9223372036854776000.0 is converted back to UInt64, it yields 9223372036854775808. For reference, the Python programming language also treats these numbers as equal. But this behaviour was dependent on the CPU model (different results on AMD64 and AArch64 for some out-of-range numbers), so we make the comparison more precise. It will treat int and float numbers as equal only if the int is represented in the floating point type exactly. #22595 (alexey-milovidov).
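The following Python sketch contrasts a lossy comparison with an exact one. The lossy comparison converts the integer to a float first. The exact comparison compares mathematical values. The example uses 2**53 + 1, the smallest positive integer that a 64-bit float cannot hold exactly, instead of the entry's own example. The function names are hypothetical, and the sketch is not ClickHouse's implementation.

```python
from fractions import Fraction


def naive_equal(i: int, f: float) -> bool:
    # Lossy: the integer is rounded to the nearest float before comparing
    return float(i) == f


def exact_equal(i: int, f: float) -> bool:
    # Exact: Fraction(f) represents a finite float without any rounding
    return Fraction(i) == Fraction(f)


n = 2**53 + 1   # 9007199254740993, not exactly representable as Float64
x = float(n)    # rounds to 9007199254740992.0
print(naive_equal(n, x))  # True
print(exact_equal(n, x))  # False
```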
Remove support for argMin and argMax for single Tuple argument. The code was not memory-safe. The feature was added by mistake and it is confusing for people. These functions can be reintroduced under different names later. This fixes #22384 and reverts #17359. #23393 (alexey-milovidov).
Added functions dictGetChildren(dictionary, key), dictGetDescendants(dictionary, key, level). Function dictGetChildren returns all children as an array of indexes. It is an inverse transformation for dictGetHierarchy. Function dictGetDescendants returns all descendants as if dictGetChildren was applied level times recursively. A zero level value is equivalent to infinity. Improved performance of dictGetHierarchy, dictIsIn functions. Closes #14656. #22096 (Maksim Kita).
Added function dictGetOrNull. It works like dictGet, but returns NULL if the key was not found in the dictionary. Closes #22375. #22413 (Maksim Kita).
Added a table function s3Cluster, which allows processing files from s3 in parallel on every node of a specified cluster. #22012 (Nikita Mikhaylov).
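The following Python sketch models the two lookups on a key-to-parent mapping. It is a model, not ClickHouse's implementation, and the helper names are hypothetical. It covers dictGetChildren and the unlimited-depth case (level 0) of dictGetDescendants. It does not model a non-zero level, and it assumes the hierarchy has no cycles.

```python
from collections import defaultdict
from typing import Dict, List


def build_children(parent_of: Dict[int, int]) -> Dict[int, List[int]]:
    """Invert a key -> parent_key mapping into parent_key -> [child keys]."""
    children: Dict[int, List[int]] = defaultdict(list)
    for key, parent in sorted(parent_of.items()):
        children[parent].append(key)
    return children


def get_children(children: Dict[int, List[int]], key: int) -> List[int]:
    # Analogue of dictGetChildren: direct children only
    return list(children.get(key, []))


def get_all_descendants(children: Dict[int, List[int]], key: int) -> List[int]:
    # Analogue of dictGetDescendants with level 0 (no depth limit), breadth-first
    result: List[int] = []
    frontier = [key]
    while frontier:
        next_frontier: List[int] = []
        for node in frontier:
            for child in children.get(node, []):
                result.append(child)
                next_frontier.append(child)
        frontier = next_frontier
    return result


tree = build_children({1: 0, 2: 1, 3: 1, 4: 2})
print(get_children(tree, 1))         # [2, 3]
print(get_all_descendants(tree, 0))  # [1, 2, 3, 4]
```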
Added support for replicas and shards in MySQL/PostgreSQL table engine / table function. You can write SELECT * FROM mysql('host{1,2}-{1|2}', ...). Closes #20969. #22217 (Kseniia Sumarokova).
Added ALTER TABLE ... FETCH PART ... query. It's similar to FETCH PARTITION, but fetches only one part. #22706 (turbo jason).
Added a setting max_distributed_depth that limits the depth of recursive queries to Distributed tables. Closes #20229. #21942 (flynn).
Improved performance of reading from ArrowStream input format for sources other than local file (e.g. URL). #22673 (nvartolomei).
Disabled compression by default when interacting with localhost (with clickhouse-client or server to server with distributed queries) via native protocol. It may improve performance of some import/export operations. This closes #22234. #22237 (alexey-milovidov).
Exclude values that do not belong to the shard from the right part of the IN section for distributed queries (under optimize_skip_unused_shards_rewrite_in, enabled by default, since it still requires optimize_skip_unused_shards). #21511 (Azat Khuzhin).
Improved performance of reading a subset of columns with File-like table engine and column-oriented format like Parquet, Arrow or ORC. This closes #20129. #21302 (keenwolf).
Allow moving more conditions to PREWHERE, as it was before version 21.1 (adjustment of internal heuristics). An insufficient number of moved conditions could lead to worse performance. #23397 (Anton Popov).
Improved performance of ODBC connections and fixed all the outstanding issues from the backlog. Using nanodbc library instead of Poco::ODBC. Closes #9678. Add support for DateTime64 and Decimal* for ODBC table engine. Closes #21961. Fixed an issue with Cyrillic text being truncated. Closes #16246. Added connection pools for the ODBC bridge. #21972 (Kseniia Sumarokova).
Added Decimal256 type support in dictionaries. Decimal256 is an experimental feature. Closes #20979. #22960 (Maksim Kita).
Enabled async_socket_for_remote by default (using fewer OS threads for distributed queries). #23683 (Nikolai Kochetov).
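The following toy model shows the rewrite. It assumes a hypothetical sharding key x % 3 over three equally weighted shards; real shard selection also depends on shard weights. The helper name split_in_values is hypothetical. Each shard receives only the IN values that map to it. A shard that receives no values can be skipped entirely, which is the part handled by optimize_skip_unused_shards.

```python
from typing import Callable, Dict, Iterable, List


def split_in_values(values: Iterable[int],
                    shard_of: Callable[[int], int]) -> Dict[int, List[int]]:
    """Keep, for each shard, only the IN-list values whose sharding key maps to it."""
    per_shard: Dict[int, List[int]] = {}
    for v in values:
        per_shard.setdefault(shard_of(v), []).append(v)
    # Shards missing from the result would receive no values and need not be queried
    return per_shard


# Hypothetical sharding key: x % 3 over three equally weighted shards
print(split_in_values([1, 2, 4, 7, 9], lambda x: x % 3))
# {1: [1, 4, 7], 2: [2], 0: [9]}
```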
Fixed quantile(s)TDigest. Added special handling of singleton centroids according to tdunning/t-digest 3.2+. Also a bug with over-compression of centroids in implementation of earlier version of the algorithm was fixed. #23314 (Vladimir Chebotarev).
Implement functions arrayHasAny, arrayHasAll, has, indexOf, countEqual for the generic case when types of array elements are different. In previous versions the functions arrayHasAny, arrayHasAll returned false and has, indexOf, countEqual threw an exception. Also add support for Decimal and big integer types in functions has and similar. This closes #20272. #23044 (alexey-milovidov).
Raised the threshold on max number of matches in result of the function extractAllGroupsHorizontal. #23036 (Vasily Nemkov).
Do not perform optimize_skip_unused_shards for a cluster with one node. #22999 (Azat Khuzhin).
Added ability to run clickhouse-keeper (experimental drop-in replacement for ZooKeeper) with SSL. The config setting keeper_server.tcp_port_secure can be used for secure interaction between client and keeper-server. keeper_server.raft_configuration.secure can be used to enable internal secure communication between nodes. #22992 (alesapin).
Added ability to flush buffer only in background for Buffer tables. #22986 (Azat Khuzhin).
When selecting from a MergeTree table with NULL in the WHERE condition, an exception was thrown in rare cases. This closes #20019. #22978 (alexey-milovidov).
Multiple fixes for hedged requests. Fixed an error Can't initialize pipeline with empty pipe for queries with GLOBAL IN/JOIN when the setting use_hedged_requests is enabled. Fixes #23431. #23805 (Nikolai Kochetov). Fixed a race condition in hedged connections which leads to crash. This fixes #22161. #22443 (Kruglov Pavel). Fix possible crash if an unknown packet was received from remote query (with async_socket_for_remote enabled). Fixes #21167. #23309 (Nikolai Kochetov).
Fixed the behavior when disabling input_format_with_names_use_header setting discards all the input with CSVWithNames format. This fixes #22406. #23202 (Nikita Mikhaylov).
Fixed a crash when modifying a column's default value when the column itself is used as ReplacingMergeTree's parameter. #23483 (hexiaoting).
Fixed corner cases in vertical merges with ReplacingMergeTree. In rare cases they could lead to merge failures with exceptions like Incomplete granules are not allowed while blocks are granules size. #23459 (Anton Popov).
Fixed a bug that did not allow casting an empty array literal to an array with dimensions greater than 1, e.g. CAST([] AS Array(Array(String))). Closes #14476. #23456 (Maksim Kita).
Fixed a bug when deltaSum aggregate function produced incorrect result after resetting the counter. #23437 (Russ Frank).
Fixed Cannot unlink file error on unsuccessful creation of ReplicatedMergeTree table with multidisk configuration. This closes #21755. #23433 (tavplubix).
Fixed very rare race condition on background cleanup of old blocks. It might cause a block not to be deduplicated if it's too close to the end of deduplication window. #23301 (tavplubix).
Fixed very rare (distributed) race condition between creation and removal of ReplicatedMergeTree tables. It might cause exceptions like node doesn't exist on attempt to create replicated table. Fixes #21419. #23294 (tavplubix).
Fixed simple key dictionary from DDL creation if primary key is not first attribute. Fixes #23236. #23262 (Maksim Kita).
Map data type (experimental feature): fixed an incorrect formatting of function map in distributed queries. #22588 (foolchi).
Fixed deserialization of empty string without newline at end of TSV format. This closes #20244. Possible workaround without version update: set input_format_null_as_default to zero. It was zero in old versions. #22527 (alexey-milovidov).
Buffer overflow (on read) was possible in tokenbf_v1 full text index. The excessive bytes are not used but the read operation may lead to crash in rare cases. This closes #19233. #22421 (alexey-milovidov).
Fixed a bug, which leads to underaggregation of data in case of enabled optimize_aggregation_in_order and many parts in table. Slightly improve performance of aggregation with enabled optimize_aggregation_in_order. #21889 (Anton Popov).
Check if table function view is used as a column. This complements #20350. #21465 (Amos Bird).
Fix "unknown column" error for tables with Merge engine in queries with JOIN and aggregation. Closes #18368, closes #22226. #21370 (Vladimir).
Fixed name clashes in pushdown optimization. It caused incorrect WHERE filtration after FULL JOIN. Closes #20497. #20622 (Vladimir).
Fixed very rare bug when quorum insert with quorum_parallel=1 is not really "quorum" because of deduplication. #18215 (filimonov - reported, alesapin - fixed).
The toStartOfInterval function will align hour intervals to midnight (in previous versions they were aligned to the start of the Unix epoch). For example, toStartOfInterval(x, INTERVAL 11 HOUR) will split every day into three intervals: 00:00:00..10:59:59, 11:00:00..21:59:59 and 22:00:00..23:59:59. This behaviour is more suited for practical needs. This closes #9510. #22060 (alexey-milovidov).
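The following Python sketch models the new hour alignment on naive datetimes. It ignores timezones and DST, which is an assumption of this sketch, and the function name is hypothetical. Each day is cut into N-hour buckets counted from its own midnight, so the last bucket of the day may be shorter.

```python
from datetime import datetime, timedelta


def start_of_hour_interval(ts: datetime, hours: int) -> datetime:
    """Round ts down to an N-hour bucket counted from midnight of the same day."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(hours=ts.hour // hours * hours)


for h in (10, 11, 21, 22, 23):
    print(start_of_hour_interval(datetime(2021, 5, 1, h, 59, 59), 11).strftime("%H:%M"))
# 00:00, 11:00, 11:00, 22:00, 22:00 (one per line)
```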
Age and Precision in graphite rollup configs should increase from retention to retention. Now it's checked and the wrong config raises an exception. #21496 (Mikhail f. Shiryaev).
Fix cutToFirstSignificantSubdomainCustom()/firstSignificantSubdomainCustom() returning wrong result for 3+ level domains present in custom top-level domain list. For input domains matching these custom top-level domains, the third-level domain was considered to be the first significant one. This is now fixed. This change may introduce incompatibility if the function is used in e.g. the sharding key. #21946 (Azat Khuzhin).
The keys column in the system.dictionaries table was replaced with the columns key.names and key.types. Columns key.names, key.types, attribute.names, attribute.types from the system.dictionaries table do not require the dictionary to be loaded. #21884 (Maksim Kita).
Now replicas that are processing the ALTER TABLE ATTACH PART[ITION] command search in their detached/ folders before fetching the data from other replicas. As an implementation detail, a new command ATTACH_PART is introduced in the replicated log. Parts are searched and compared by their checksums. #18978 (Mike Kot). Note: ATTACH PART[ITION] queries may not work during a cluster upgrade. It's not possible to roll back to an older ClickHouse version after executing an ALTER ... ATTACH query in the new version, as the old servers would fail to parse the ATTACH_PART entry in the replicated log.
In this version, empty <remote_url_allow_hosts></remote_url_allow_hosts> will block all access to remote hosts, while in previous versions it did nothing. If you want to keep the old behaviour and you have an empty remote_url_allow_hosts element in the configuration file, remove it. #20058 (Vladimir Chebotarev).
Extended range of DateTime64 to support dates from year 1925 to 2283. Improved support of DateTime around zero date (1970-01-01). #9404 (alexey-milovidov, Vasily Nemkov). Not all date and time functions work for the extended range of dates.
Added support of Kerberos authentication for preconfigured users and HTTP requests (GSS-SPNEGO). #14995 (Denis Glazachev).
Add prefer_column_name_to_alias setting to use original column names instead of aliases. It is needed to be more compatible with common databases' aliasing rules. This is for #9715 and #9887. #22044 (Amos Bird).
Added functions dictGetChildren(dictionary, key), dictGetDescendants(dictionary, key, level). Function dictGetChildren returns all children as an array of indexes. It is an inverse transformation for dictGetHierarchy. Function dictGetDescendants returns all descendants as if dictGetChildren was applied level times recursively. A zero level value is equivalent to infinity. Closes #14656. #22096 (Maksim Kita).
Functions dictGet, dictHas use current database name if it is not specified for dictionaries created with DDL. Closes #21632. #21859 (Maksim Kita).
Added function dictGetOrNull. It works like dictGet, but returns NULL if the key was not found in the dictionary. Closes #22375. #22413 (Maksim Kita).
Added async update in ComplexKeyCache, SSDCache, SSDComplexKeyCache dictionaries. Added support for Nullable type in Cache, ComplexKeyCache, SSDCache, SSDComplexKeyCache dictionaries. Added support for multiple attributes fetch with dictGet, dictGetOrDefault functions. Fixes #21517. #20595 (Maksim Kita).
Add function timezoneOf that returns the timezone name of DateTime or DateTime64 data types. This does not close #9959. Fix inconsistencies in function names: add aliases timezone and timeZone as well as toTimezone and toTimeZone and timezoneOf and timeZoneOf. #22001 (alexey-milovidov).
Add new optional clause GRANTEES for CREATE/ALTER USER commands. It specifies users or roles which are allowed to receive grants from this user, provided this user also has all the required access granted with grant option. By default GRANTEES ANY is used, which means a user with grant option can grant to anyone. Syntax: CREATE USER ... GRANTEES {user | role | ANY | NONE} [,...] [EXCEPT {user | role} [,...]]. #21641 (Vitaly Baranov).
Add new column slowdowns_count to system.clusters. When using hedged requests, it shows how many times we switched to another replica because this replica was responding slowly. Also show actual value of errors_count in system.clusters. #21480 (Kruglov Pavel).
Add _partition_id virtual column for MergeTree* engines. Allow pruning partitions by _partition_id. Add partitionID() function to calculate partition id string. #21401 (Amos Bird).
Add function isIPAddressInRange to test if an IPv4 or IPv6 address is contained in a given CIDR network prefix. #21329 (PHO).
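The following Python sketch models the check with the standard ipaddress module. It is not ClickHouse's implementation, and the function name is hypothetical. Treating mixed IPv4/IPv6 inputs as not matching is an assumption of this sketch; Python's membership test returns False for mixed families.

```python
import ipaddress


def is_ip_address_in_range(address: str, prefix: str) -> bool:
    """True if address lies inside the CIDR network given by prefix."""
    addr = ipaddress.ip_address(address)
    network = ipaddress.ip_network(prefix, strict=False)  # tolerate host bits in the prefix
    return addr in network  # False when the address families differ


print(is_ip_address_in_range("127.0.0.1", "127.0.0.0/8"))  # True
print(is_ip_address_in_range("127.0.0.1", "ffff::/16"))    # False
```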
Added a new SQL command ALTER TABLE 'table_name' UNFREEZE [PARTITION 'part_expr'] WITH NAME 'backup_name'. This command is needed to properly remove 'frozen' partitions from all disks. #21142 (Pavel Kovalenko).
Support RANGE OFFSET frame (for window functions) for floating point types. Implement lagInFrame/leadInFrame window functions, which are analogous to lag/lead, but respect the window frame. They are identical when the frame is between unbounded preceding and unbounded following. This closes #5485. #21895 (Alexander Kuzmenkov).
Zero-copy replication for ReplicatedMergeTree over S3 storage. #16240 (ianton-ru).
Added possibility to migrate existing S3 disk to the schema with backup-restore capabilities. #22070 (Pavel Kovalenko).
Enable read with mmap IO for file ranges from 64 MiB (the setting min_bytes_to_use_mmap_io). It may lead to moderate performance improvement. #22326 (alexey-milovidov).
Add cache for files read with min_bytes_to_use_mmap_io setting. It makes a significant (2x and more) performance improvement when the value of the setting is small, by avoiding frequent mmap/munmap calls and the consequent page faults. Note that mmap IO has major drawbacks that make it less reliable in production (e.g. hung or SIGBUS on faulty disks; less controllable memory usage). Nevertheless it is good in benchmarks. #22206 (alexey-milovidov).
Avoid unnecessary data copy when using codec NONE. Please note that codec NONE is mostly useless - it's recommended to always use compression (LZ4 is by default). Despite the common belief, disabling compression may not improve performance (the opposite effect is possible). The NONE codec is useful in some cases: - when data is incompressible; - for synthetic benchmarks. #22145 (alexey-milovidov).
Introduce a new merge tree setting min_bytes_to_rebalance_partition_over_jbod which allows assigning new parts to different disks of a JBOD volume in a balanced way. #16481 (Amos Bird).
Added Grant, Revoke and System values of query_kind column for corresponding queries in system.query_log. #21102 (Vasily Nemkov).
Allow customizing timeouts for HTTP connections used for replication independently from other HTTP timeouts. #20088 (nvartolomei).
Better exception message in client in case of exception while server is writing blocks. In previous versions the client could get a misleading message like Data compressed with different methods. #22427 (alexey-milovidov).
Fix error Directory tmp_fetch_XXX already exists which could happen after a failed part fetch. Delete temporary fetch directory if it already exists. Fixes #14197. #22411 (nvartolomei).
Fix MSan report for function range with UInt256 argument (support for large integers is experimental). This closes #22157. #22387 (alexey-milovidov).
Add current_database column to system.processes table. It contains the current database of the query. #22365 (Alexander Kuzmenkov).
Add case-insensitive history search/navigation and subword movement features to clickhouse-client. #22105 (Amos Bird).
If a tuple of NULLs, e.g. (NULL, NULL), is on the left-hand side of the IN operator with tuples of non-NULLs on the right-hand side, e.g. SELECT (NULL, NULL) IN ((0, 0), (3, 1)), return 0 instead of throwing an exception about incompatible types. The expression may also appear due to optimization of something like SELECT (NULL, NULL) = (8, 0) OR (NULL, NULL) = (3, 2) OR (NULL, NULL) = (0, 0) OR (NULL, NULL) = (3, 1). This closes #22017. #22063 (alexey-milovidov).
Add option strict_increase to windowFunnel function to calculate each event once (resolve #21835). #22025 (Vladimir).
If the partition key of a MergeTree table does not include Date or DateTime columns but includes exactly one DateTime64 column, expose its values in the min_time and max_time columns in system.parts and system.parts_columns tables. Add min_time and max_time columns to system.parts_columns table (this was an inconsistency with the system.parts table). This closes #18244. #22011 (alexey-milovidov).
Supported replication_alter_partitions_sync=1 setting in clickhouse-copier for moving partitions from the helper table to the destination. Decreased default timeouts. Fixes #21911. #21912 (turbo jason).
Show path to data directory of EmbeddedRocksDB tables in system tables. #21903 (tavplubix).
Add profile event HedgedRequestsChangeReplica, change read data timeout from sec to ms. #21886 (Kruglov Pavel).
DiskS3 (experimental feature under development). Fixed a bug that made it impossible to move a directory if the destination is not empty and cache disk is used. #21837 (Pavel Kovalenko).
Propagate query and session settings for distributed DDL queries. Set distributed_ddl_entry_format_version to 2 to enable this. Added distributed_ddl_output_mode setting. Supported modes: none, throw (default), null_status_on_timeout and never_throw. Miscellaneous fixes and improvements for Replicated database engine. #21535 (tavplubix).
If PODArray was instantiated with an element size that is neither a fraction nor a multiple of 16, buffer overflow was possible. No bugs in current releases exist. #21533 (alexey-milovidov).
Add last_error_time/last_error_message/last_error_stacktrace/remote columns for system.errors. #21529 (Azat Khuzhin).
Add setting optimize_skip_unused_shards_limit to limit the number of sharding key values for optimize_skip_unused_shards. #21512 (Azat Khuzhin).
Improve clickhouse-format to not throw an exception when there are extra spaces or a comment after the last query, and to throw an exception early with a readable message when formatting an ASTInsertQuery with data. #21311 (flynn).
Remove socket from epoll before cancelling packet receiver in HedgedConnections to prevent possible race. Fixes #22161. #22443 (Kruglov Pavel).
Add (missing) memory accounting in parallel parsing routines. In previous versions OOM was possible when the resultset contains very large blocks of data. This closes #22008. #22425 (alexey-milovidov).
Fix exception which may happen when SELECT has a constant WHERE condition and the source table has columns whose names are digits. #22270 (LiuNeng).
Fix query cancellation with use_hedged_requests=0 and async_socket_for_remote=1. #22183 (Azat Khuzhin).
The function decrypt was lacking a check for the minimal size of data encrypted in AEAD mode. This closes #21897. #22064 (alexey-milovidov).
In rare cases, a merge for CollapsingMergeTree may create a granule with index_granularity + 1 rows. Because of this, an internal check, added in #18928 (affects 21.2 and 21.3), may fail with error Incomplete granules are not allowed while blocks are granules size. This error did not allow parts to merge. #21976 (Nikolai Kochetov).
Reverted #15454 that may cause significant increase in memory usage while loading external dictionaries of hashed type. This closes #21935. #21948 (Maksim Kita).
Prevent hedged connections overlaps (Unknown packet 9 from server error). #21941 (Azat Khuzhin).
Fix reading the HTTP POST request with "multipart/form-data" content type in some cases. #21936 (Ivan).
Fix wrong ORDER BY results when a query contains window functions, and optimization for reading in primary key order is applied. Fixes #21828. #21915 (Alexander Kuzmenkov).
Fix bug for ReplicatedMerge table engines when ALTER MODIFY COLUMN query doesn't change the type of Decimal column if its size (32 bit or 64 bit) doesn't change. #21728 (alesapin).
Fix possible infinite waiting when concurrent OPTIMIZE and DROP are run for ReplicatedMergeTree. #21716 (Azat Khuzhin).
Fix function arrayElement with type Map for constant integer arguments. #21699 (Anton Popov).
Fix SIGSEGV on non-existent attributes from ip_trie with access_to_key_from_attributes. #21692 (Azat Khuzhin).
The server now starts accepting connections only after DDLWorker and dictionaries initialization. #21676 (Azat Khuzhin).
Add type conversion for keys of tables of type Join (previously led to SIGSEGV). #21646 (Azat Khuzhin).
Fix distributed requests cancellation (for example simple select from multiple shards with limit, i.e. select * from remote('127.{2,3}', system.numbers) limit 100) with async_socket_for_remote=1. #21643 (Azat Khuzhin).
Multiple preparations for PowerPC builds: Enable the bundled openldap on ppc64le. #22487 (Kfir Itzhak). Enable compiling on ppc64le with Clang. #22476 (Kfir Itzhak). Fix compiling boost on ppc64le. #22474 (Kfir Itzhak). Fix CMake error about internal CMake variable CMAKE_ASM_COMPILE_OBJECT not set on ppc64le. #22469 (Kfir Itzhak). Fix Fedora/RHEL/CentOS not finding libclang_rt.builtins on ppc64le. #22458 (Kfir Itzhak). Enable building with jemalloc on ppc64le. #22447 (Kfir Itzhak). Fix ClickHouse's config embedding and cctz's timezone embedding on ppc64le. #22445 (Kfir Itzhak). Fixed compiling on ppc64le and use the correct instruction pointer register on ppc64le. #22430 (Kfir Itzhak).
Add llvm-12 binaries name to search in cmake scripts. Implicit constants conversions to mute clang warnings. Updated submodules to build with CMake 3.19. Mute recursion in macro expansion in readpassphrase library. Deprecated -fuse-ld changed to --ld-path for clang. #21597 (Ilya Yatsishin).
Updating docker/test/testflows/runner/dockerd-entrypoint.sh to use Yandex dockerhub-proxy, because Docker Hub has enabled very restrictive rate limits #21551 (vzakaznikov).
Now it's not allowed to create MergeTree tables in old syntax with table TTL because it's just ignored. Attach of old tables is still possible. #20282 (alesapin).
Now all case-insensitive function names will be rewritten to their canonical representations. This is needed for projection query routing (the upcoming feature). #20174 (Amos Bird).
Fix creation of TTL in cases when its expression is a function that is the same as the ORDER BY key. It is now allowed to set custom aggregation for primary key columns in TTL with GROUP BY. Backward incompatible: when TTL expires, the function any is now applied instead of max to primary key columns that are not in GROUP BY and are not set explicitly. Also, if you use TTL with WHERE or GROUP BY, you may see exceptions during merges while performing a rolling update. #15450 (Anton Popov).
Added the implicit_key option for the executable dictionary source. It allows avoiding printing the key for every record if records come in the same order as the input keys. Implements #14527. #19677 (Maksim Kita).
Tables with the MergeTree* engine now have two new table-level settings for query concurrency control. Setting max_concurrent_queries limits the number of concurrently executed queries related to this table. Setting min_marks_to_honor_max_concurrent_queries applies the previous setting only if a query reads at least this number of marks. #19544 (Amos Bird).
Added the file function to read a file from the user_files directory as a String. This is different from the file table function. This implements #18851. #19204 (keenwolf).
Add experimental Replicated database engine. It replicates DDL queries across multiple hosts. #16193 (tavplubix).
Introduce experimental support for window functions, enabled with allow_experimental_window_functions = 1. This is a preliminary, alpha-quality implementation that is not suitable for production use and will change in backward-incompatible ways in future releases. Please see the documentation for the list of supported features. #20337 (Alexander Kuzmenkov).
Hedged requests for remote queries. When the setting use_hedged_requests is enabled (off by default), a query may establish several connections to different replicas. A new connection is opened if the existing connection(s) to replica(s) were not established within hedged_connection_timeout or no data was received within receive_data_timeout. The query uses the first connection that sends a non-empty progress packet (or data packet, if allow_changing_replica_until_first_data_packet is enabled); other connections are cancelled. Queries with max_parallel_replicas > 1 are supported. This can significantly reduce tail latencies on very large clusters. #19291 (Kruglov Pavel).
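The replica selection above can be sketched as a small deterministic simulation. This is a hypothetical Python model, not ClickHouse's code: it folds hedged_connection_timeout and receive_data_timeout into one timeout_ms parameter and models each replica only by how many milliseconds it needs to send its first non-empty packet.

```python
def hedged_winner(first_packet_delays_ms, timeout_ms):
    """Return (replica_index, answer_time_ms) of the replica whose answer is used.

    Hypothetical model: replica 0 is contacted at t = 0, and one more replica is
    contacted every timeout_ms while nobody has answered. The earliest answer
    wins; the other connections would be cancelled.
    """
    best = None
    for i, delay in enumerate(first_packet_delays_ms):
        start = i * timeout_ms
        if best is not None and start >= best[1]:
            break  # an answer already arrived before this replica would be tried
        answer_time = start + delay
        if best is None or answer_time < best[1]:
            best = (i, answer_time)
    return best


print(hedged_winner([5000, 200, 300], timeout_ms=100))  # (1, 300)
```

Here the second replica answers at 300 ms instead of the query waiting 5000 ms for the first one, which is the tail-latency effect the entry describes.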
Added support for PREWHERE (and enable the corresponding optimization) when tables have row-level security expressions specified. #19576 (Denis Glazachev).
The setting distributed_aggregation_memory_efficient is enabled by default. It will lower memory usage and improve performance of distributed queries. #20599 (alexey-milovidov).
Speed up reading from Memory tables in extreme cases (when reading speed is on the order of 50 GB/sec) by simplifying the pipeline and, consequently, reducing lock contention in pipeline scheduling. #20468 (alexey-milovidov).
Partially reimplement the HTTP server so that it makes fewer copies of incoming and outgoing data. This gives up to 1.5x performance improvement on inserting long records over HTTP. #19516 (Ivan).
Add the compress setting for Memory tables. If it is enabled, the table uses less RAM. On some machines and datasets SELECT can also be faster, but this is not always the case. This closes #20093. Note: there are reasons why Memory tables can work slower than MergeTree: (1) lack of compression, (2) static size of blocks, (3) lack of indices and prewhere... #20168 (alexey-milovidov).
Do not squash blocks too much on INSERT SELECT when inserting into a Memory table. In previous versions, INSERT SELECT created an inefficient data representation in the Memory table. This closes #13052. #20169 (alexey-milovidov).
Fix at least one case when DataType parser may have exponential complexity (found by fuzzer). This closes #20096. #20132 (alexey-milovidov).
Parallelize SELECT with FINAL for a single part with level > 0 when the do_not_merge_across_partitions_select_final setting is 1. #19375 (Kruglov Pavel).
Fill only requested columns when querying system.parts and system.parts_columns. Closes #19570. #21035 (Anmol Arora).
Perform algebraic optimizations of arithmetic expressions inside the avg aggregate function. This closes #20092. #20183 (flynn).
Compression method names are now case-insensitive in table functions. Also fixed the LZMA compression method, which was checked in upper case. #21416 (Vladimir Chebotarev).
Add two settings to delay or throw an error during insertion when there are too many inactive parts. This is useful when the server fails to clean up parts quickly enough. #20178 (Amos Bird).
Provide better compatibility for MySQL clients: 1. MySQL JDBC, 2. mycli. #21367 (Amos Bird).
Forbid dropping a column if it is referenced by a materialized view. Closes #21164. #21303 (flynn).
MySQL dictionary source will now retry unexpected connection failures (Lost connection to MySQL server during query) which sometimes happen on SSL/TLS connections. #21237 (Alexander Kazakov).
Usability improvement: more consistent DateTime64 parsing: recognize the case when unix timestamp with subsecond resolution is specified as scaled integer (like 1111111111222 instead of 1111111111.222). This closes #13194. #21053 (alexey-milovidov).
With distributed_group_by_no_merge, only merge sorted blocks on the initiator. #20882 (Azat Khuzhin).
When loading the config for a MySQL source, ClickHouse now randomizes the list of replicas with the same priority to ensure round-robin selection of the MySQL endpoint. This closes #20629. #20632 (Alexander Kazakov).
Function 'reinterpretAs(x, Type)' renamed to 'reinterpret(x, Type)'. #20611 (Maksim Kita).
Improved serialization for data types composed of Arrays and Tuples. Improved matching of enum data types to the Protobuf enum type. Fixed serialization of the Map data type. Omitted values are now set by default. #20506 (Vitaly Baranov).
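A minimal sketch of that ordering, assuming (as in ClickHouse's MySQL replica configuration) that a lower priority number is tried first. The function name and the (priority, host) tuple layout are hypothetical.

```python
import random


def order_replicas(replicas, rng=random):
    """Order (priority, host) pairs: lower priority number first, ties shuffled.

    Hypothetical helper: shuffling only within a priority group spreads
    connections across equally preferred replicas.
    """
    groups = {}
    for priority, host in replicas:
        groups.setdefault(priority, []).append(host)
    ordered = []
    for priority in sorted(groups):
        hosts = groups[priority]
        rng.shuffle(hosts)  # randomize only among replicas with equal priority
        ordered.extend((priority, host) for host in hosts)
    return ordered


print(order_replicas([(1, "a"), (2, "b"), (1, "c"), (2, "d")]))
```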
Fixed race between execution of distributed DDL tasks and cleanup of DDL queue. Now DDL task cannot be removed from ZooKeeper if there are active workers. Fixes #20016. #20448 (tavplubix).
Make FQDN and other DNS related functions work correctly in alpine images. #20336 (filimonov).
Do not allow early constant folding of explicitly forbidden functions. #20303 (Azat Khuzhin).
Implicit conversion from an integer to a Decimal type might succeed even if the integer value does not fit into the Decimal type. Now it throws ARGUMENT_OUT_OF_BOUND. #20232 (tavplubix).
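The range check reduces to a digit count: Decimal(P, S) holds at most P significant digits, S of them after the decimal point, so an integer v fits only if |v| * 10^S < 10^P. A hypothetical Python illustration, where OverflowError stands in for ARGUMENT_OUT_OF_BOUND:

```python
def int_to_decimal(value, precision, scale):
    """Return value as a scaled integer for Decimal(precision, scale), or raise.

    Hypothetical illustration: OverflowError stands in for ARGUMENT_OUT_OF_BOUND.
    """
    scaled = value * 10 ** scale
    if abs(scaled) >= 10 ** precision:  # would need more than `precision` digits
        raise OverflowError(f"{value} does not fit into Decimal({precision}, {scale})")
    return scaled


print(int_to_decimal(42, 5, 2))  # 4200, i.e. 42.00
```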
Updated CacheDictionary, ComplexCacheDictionary, SSDCacheDictionary, SSDComplexKeyDictionary to use LRUHashMap as underlying index. #20164 (Maksim Kita).
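An LRU hash map is a hash table that evicts the least recently used entry once it exceeds its capacity. A minimal Python sketch of the idea using collections.OrderedDict; the class name LRUMap is hypothetical and this is not the C++ LRUHashMap.

```python
from collections import OrderedDict


class LRUMap:
    """Hash map that evicts the least recently used entry when over capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()  # iteration order: least to most recently used

    def get(self, key, default=None):
        if key not in self._entries:
            return default
        self._entries.move_to_end(key)  # a hit makes the key most recently used
        return self._entries[key]

    def put(self, key, value):
        if key in self._entries:
            self._entries.move_to_end(key)
        self._entries[key] = value
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used key

    def __contains__(self, key):
        return key in self._entries

    def __len__(self):
        return len(self._entries)
```

Usage: with capacity 2, putting a and b, reading a, then putting c evicts b, because a was touched more recently.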
The setting access_management is now configurable on startup by providing CLICKHOUSE_DEFAULT_ACCESS_MANAGEMENT. It defaults to disabled (0), which was the prior value. #20139 (Marquitos).
Fix toDateTime64(toDate()/toDateTime()) for DateTime64 - Implement DateTime64 clamping to match DateTime behaviour. #20131 (Azat Khuzhin).
Quota improvements: SHOW TABLES is now considered as one query in the quota calculations, not two queries. SYSTEM queries now consume quota. Fix calculation of interval's end in quota consumption. #20106 (Vitaly Baranov).
Support path IN (set) expressions for the system.zookeeper table. #20105 (小路).
Fix data race in executable dictionary that was possible only on misuse (when the script returns data ignoring its input). #20045 (alexey-milovidov).
The value of MYSQL_OPT_RECONNECT option can now be controlled by "opt_reconnect" parameter in the config section of mysql replica. #19998 (Alexander Kazakov).
If a user calls the JSONExtract function requesting the Float32 type, allow inaccurate conversion to the result type. For example, the number 0.1 in JSON is double precision and is not representable in Float32, but the user still wants to get it. Previous versions returned 0 for non-Nullable types and NULL for Nullable types to indicate that the conversion is imprecise. The logic was 100% correct, but it surprised users and led to questions. This closes #13962. #19960 (alexey-milovidov).
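The precision issue can be reproduced outside ClickHouse: rounding the double 0.1 to single precision gives a nearby but different value. A minimal Python sketch using the standard struct module; that the new behaviour returns exactly this rounded value is an assumption.

```python
import struct


def to_float32(x):
    """Round a Python float (a Float64) to the nearest Float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


print(to_float32(0.1))         # 0.10000000149011612
print(to_float32(0.1) == 0.1)  # False: the conversion is inexact
```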
Add conversion of block structure for INSERT into Distributed tables if it does not match. #19947 (Azat Khuzhin).
Improvement for the system.distributed_ddl_queue table: initialize MaxDDLEntryID to the last value after restarting. Before this PR, MaxDDLEntryID remained zero until a new DDLTask was processed. #19924 (Amos Bird).
Add ability to throttle INSERT into Distributed based on the amount of pending bytes for async send (bytes_to_delay_insert/max_delay_to_insert and bytes_to_throw_insert settings for the Distributed engine have been added). #19673 (Azat Khuzhin).
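A hypothetical sketch of the per-INSERT decision these settings imply. It assumes that a threshold of 0 disables the corresponding check and that the throw threshold is checked first; the actual waiting, bounded by max_delay_to_insert, is not modelled.

```python
def distributed_insert_action(pending_bytes, bytes_to_delay_insert, bytes_to_throw_insert):
    """Return "throw", "delay" or "insert" for one INSERT (hypothetical sketch).

    A threshold of 0 is assumed to disable the corresponding check.
    """
    if bytes_to_throw_insert and pending_bytes > bytes_to_throw_insert:
        return "throw"
    if bytes_to_delay_insert and pending_bytes > bytes_to_delay_insert:
        return "delay"  # the INSERT would wait, bounded by max_delay_to_insert (not modelled)
    return "insert"


print(distributed_insert_action(100, bytes_to_delay_insert=50, bytes_to_throw_insert=200))  # delay
```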
Fix some rare cases when write errors can be ignored in destructors. #19451 (Azat Khuzhin).
Print inline frames in stack traces for fatal errors. #19317 (Ivan).
Fix the metadata leak when the Replicated*MergeTree with custom (non default) ZooKeeper cluster is dropped. #21119 (fastio).
Fix type mismatch issue when using LowCardinality keys in joinGet. This fixes #21114. #21117 (Amos Bird).
Fix default_replica_path and default_replica_name values being ignored by the Replicated(*)MergeTree engine when the engine requires other parameters to be specified. #21060 (mxzlxy).
Out of bound memory access was possible when formatting specifically crafted out of range value of type DateTime64. This closes #20494. This closes #20543. #21023 (alexey-milovidov).
Block parallel insertions into Join storage. #21009 (vdimir).
Fixed behaviour when ALTER MODIFY COLUMN created a mutation that would knowingly fail. #21007 (Anton Popov).
Closes #9969. Fixed a Brotli HTTP compression error that reproduced with large data sizes, a slightly complicated structure and the JSON output format. Updated Brotli to the latest version to include the "fix rare access to uninitialized data in ring-buffer". #20991 (Kseniia Sumarokova).
Fix 'Empty task was returned from async task queue' on query cancellation. #20881 (Azat Khuzhin).
USE database; query did not work when using MySQL 5.7 client to connect to ClickHouse server, it's fixed. Fixes #18926. #20878 (tavplubix).
Fix usage of -Distinct combinator with -State combinator in aggregate functions. #20866 (Anton Popov).
Fix the case when, while calculating the modulo of a negative number by a small divisor, the resulting data type was not large enough to accommodate the negative result. This closes #20052. #20067 (alexey-milovidov).
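To see why the result type must be wider: assuming ClickHouse's integer modulo follows C/C++ semantics (the remainder takes the sign of the dividend), a signed dividend and a UInt8 divisor (at most 255) give remainders in [-254, 254], which does not fit into Int8. A hypothetical Python sketch of both observations:

```python
def trunc_mod(a, b):
    """Integer remainder carrying the sign of the dividend (C/C++ semantics)."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def min_signed_bits(lo, hi):
    """Smallest signed integer width (8, 16, 32 or 64 bits) covering [lo, hi]."""
    for bits in (8, 16, 32, 64):
        if -(2 ** (bits - 1)) <= lo and hi <= 2 ** (bits - 1) - 1:
            return bits
    raise ValueError("range needs more than 64 bits")


print(trunc_mod(-7, 3))            # -1 (Python's own -7 % 3 is 2)
print(min_signed_bits(-254, 254))  # 16: Int8 cannot hold every remainder modulo a UInt8
```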
MaterializeMySQL: Fix replication for statements that update several tables. #20066 (Håvard Kvålen).
Prevent "Connection refused" in docker during initialization script execution. #20012 (filimonov).
EmbeddedRocksDB is an experimental storage. Fix the issue with lack of proper type checking. Simplified code. This closes #19967. #19972 (alexey-milovidov).
Fix a segfault in function fromModifiedJulianDay when the argument type is Nullable(T) for any integral types other than Int32. #19959 (PHO).
Allow to build ClickHouse with AVX-2 enabled globally. It gives slight performance benefits on modern CPUs. Not recommended for production and will not be supported as official build for now. #20180 (alexey-milovidov).
Allow starting up with a modified binary under gdb. In previous versions, if you set a breakpoint in gdb before start, the server refused to start up due to a failed integrity check. #21258 (alexey-milovidov).
Print stdout and stderr to log when failed to start docker in integration tests. Before this PR there was a very short error message in this case which didn't help to investigate the problems. #20631 (Vitaly Baranov).
Added PostgreSQL table engine (both select/insert, with support for multidimensional arrays), also as table function. Added PostgreSQL dictionary source. Added PostgreSQL database engine. #18554 (Kseniia Sumarokova).
Data type Nested now supports arbitrary levels of nesting. Introduced subcolumns of complex types, such as size0 in Array, null in Nullable, names of Tuple elements, which can be read without reading the whole column. #17310 (Anton Popov).
Added Nullable support for FlatDictionary, HashedDictionary, ComplexKeyHashedDictionary, DirectDictionary, ComplexKeyDirectDictionary, RangeHashedDictionary. #18236 (Maksim Kita).
Adds a new table called system.distributed_ddl_queue that displays the queries in the DDL worker queue. #17656 (Bharat Nallan).
Added support of mapping LDAP group names, and attribute values in general, to local roles for users from ldap user directories. #17211 (Denis Glazachev).
Support insert into the cluster table function, and for both the remote and cluster table functions, support distributing data across nodes by specifying a sharding key. Closes #16752. #18264 (flynn).
Add function decodeXMLComponent to decode characters for XML. Example: SELECT decodeXMLComponent('Hello,&quot;world&quot;!') #17659. #18542 (nauta).
Function formatDateTime now supports the %Q modifier to format a date as a quarter. #19224 (Jianmei Zhang).
Support MetaKey+Enter hotkey binding in play UI. #19012 (sundyli).
Add three functions for the Map data type: 1. mapContains(map, key) checks whether map.keys includes key. 2. mapKeys(map) returns all the keys as an Array. 3. mapValues(map) returns all the values as an Array. #18788 (hexiaoting).
Faster parts removal by lowering the number of stat syscalls. This restores an optimization that existed a while ago. Safer interface of IDisk. This closes #19065. #19086 (alexey-milovidov).
Aliases declared in WITH statement are properly used in index analysis. Queries like WITH column AS alias SELECT ... WHERE alias = ... may use index now. #18896 (Amos Bird).
Add optimize_alias_column_prediction (on by default), that will: - Respect aliased columns in WHERE during partition pruning and skipping data using secondary indexes; - Respect aliased columns in WHERE for trivial count queries for optimize_trivial_count; - Respect aliased columns in GROUP BY/ORDER BY for optimize_aggregation_in_order/optimize_read_in_order. #16995 (sundyli).
Speed up aggregate function sum. Improvement only visible on synthetic benchmarks and not very practical. #19216 (alexey-milovidov).
Support splitting Filter step of query plan into Expression + Filter pair. Together with Expression + Expression merging optimization (#17458) it may delay execution for some expressions after Filter step. #19253 (Nikolai Kochetov).
In distributed queries, if the setting async_socket_for_remote is enabled, it was possible to get a stack overflow, at least in the debug build configuration, if a very deeply nested data type is used in a table (e.g. Array(Array(Array(...more...)))). This fixes #19108. This change introduces a minor backward incompatibility: excessive parentheses in type definitions are no longer supported, for example Array((UInt8)). #19736 (alexey-milovidov).
Add an option to disable validation of checksums on reading. Should never be used in production. Please do not expect any benefits in disabling it. It may only be used for experiments and benchmarks. The setting is only applicable to tables of the MergeTree family. Checksums are always validated for other table engines and when receiving data over the network. In my observations there is no performance difference or it is less than 0.5%. #19588 (alexey-milovidov).
The exception thrown when function bar is called with certain NaN arguments could be slightly misleading in previous versions. This fixes #19088. #19107 (alexey-milovidov).
Explicitly set uid / gid of clickhouse user & group to the fixed values (101) in clickhouse-server images. #19096 (filimonov).
Fixed PeekableReadBuffer: Memory limit exceed error when inserting data with huge strings. Fixes #18690. #18979 (tavplubix).
Docker image: several improvements for clickhouse-server entrypoint. #18954 (filimonov).
Add normalizeQueryKeepNames and normalizedQueryHashKeepNames to normalize queries without masking long names with ?. This helps better analyze complex query logs. #18910 (Amos Bird).
Check the per-block checksum of the distributed batch on the sender before sending (without reading the file twice; the checksums are verified while reading). This prevents the INSERT from getting stuck on the receiver when the .bin file on the sender is truncated. Avoid reading .bin files twice for batched INSERT (this was required to calculate rows/bytes to take squashing into account; now this information is included in the header, and backward compatibility is preserved). #18853 (Azat Khuzhin).
Fix issues with RIGHT and FULL JOIN of tables with aggregate function states. In previous versions exception about cloneResized method was thrown. #18818 (templarzq).
Fix issue with bitmapOrCardinality that may lead to nullptr dereference. This closes #18911. #18912 (sundyli).
Fixed 'Attempt to read after eof' error when trying to CAST NULL from Nullable(String) to Nullable(Decimal(P, S)). Now function CAST returns NULL when it cannot parse decimal from nullable string. Fixes #7690. #18718 (Winter Zhang).
Check settings constraints for profile settings from config. The server will fail to start if users.xml contains settings that do not meet the corresponding constraints. #18486 (tavplubix).
Restrict ALTER MODIFY SETTING from changing storage settings that affect data parts (write_final_mark and enable_mixed_granularity_parts). #18306 (Amos Bird).
Set insert_quorum_parallel to 1 by default. It is significantly more convenient to use than "sequential" quorum inserts. But if you rely on sequential consistency, you should set the setting back to zero. #17567 (alexey-milovidov).
Removed aggregate functions timeSeriesGroupSum, timeSeriesGroupRateSum because a friend of mine said they never worked. This fixes #16869. If you have had luck using these functions, write an email to [email protected]. #17423 (alexey-milovidov).
Prohibit toUnixTimestamp(Date()) (previously it just returned the UInt16 representation of Date). #17376 (Azat Khuzhin).
Allow using extended integer types (Int128, Int256, UInt256) in avg and avgWeighted functions. Also allow using different types (integer, decimal, floating point) for value and for weight in avgWeighted function. This is a backward-incompatible change: now the avg and avgWeighted functions always return Float64 (as documented). Before this change the return type for Decimal arguments was also Decimal. #15419 (Mike).
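The documented semantics (weighted mean, always Float64) can be sketched in Python as follows. This is an illustration, not ClickHouse's implementation: mixing Decimal, integer and float inputs works because every term is converted to float first, and the zero-total-weight case is not handled.

```python
from decimal import Decimal


def avg_weighted(values, weights):
    """Weighted mean sum(v * w) / sum(w), always returned as a float (Float64)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    numerator = sum(float(v) * float(w) for v, w in zip(values, weights))
    denominator = sum(float(w) for w in weights)
    return numerator / denominator


print(avg_weighted([Decimal("1.5"), Decimal("2.5")], [1, 3]))  # 2.25, a float
```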
Expression toUUID(N) no longer works. Replace with toUUID('00000000-0000-0000-0000-000000000000'). This change is motivated by non-obvious results of toUUID(N) where N is non zero.
SSL certificates with incorrect "key usage" are rejected. In previous versions they used to work. See #19262.
incl references to substitutions file (/etc/metrika.xml) were removed from the default config (<remote_servers>, <zookeeper>, <macros>, <compression>, <networks>). If you were using substitutions file and were relying on those implicit references, you should put them back manually and explicitly by adding corresponding sections with incl="..." attributes before the update. See #18740 (alexey-milovidov).
Implemented REPLACE TABLE and CREATE OR REPLACE TABLE queries. #18521 (tavplubix).
Implement UNION DISTINCT and treat the plain UNION clause as UNION DISTINCT by default. Add a setting union_default_mode that allows to treat it as UNION ALL or require explicit mode specification. #16338 (flynn).
Added function accurateCastOrNull. This closes #10290. Add type conversions in x IN (subquery) expressions. This closes #10266. #16724 (Maksim Kita).
IP Dictionary supports IPv4 / IPv6 types directly. #17571 (vdimir).
Add *.zst compression/decompression support for data import and export. It enables using *.zst in file() function and Content-encoding: zstd in HTTP client. This closes #16791 . #17144 (Abi Palagashvili).
Added mannWhitneyUTest, studentTTest and welchTTest aggregate functions. Refactored rankCorr a bit. #16883 (Nikita Mikhaylov).
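As a reference for what these tests compute, here is Welch's t-statistic, the quantity behind welchTTest (studentTTest uses the pooled-variance variant instead). This hypothetical Python sketch returns only the statistic; no p-value is computed.

```python
import math
from statistics import mean, variance


def welch_t_statistic(sample_x, sample_y):
    """Welch's t-statistic for two independent samples with unequal variances."""
    nx, ny = len(sample_x), len(sample_y)
    vx, vy = variance(sample_x), variance(sample_y)  # sample variances (divide by n - 1)
    return (mean(sample_x) - mean(sample_y)) / math.sqrt(vx / nx + vy / ny)


print(welch_t_statistic([1, 2, 3], [4, 5, 6]))  # about -3.674
```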
Implement countSubstrings()/countSubstringsCaseInsensitive()/countSubstringsCaseInsensitiveUTF8() (Count the number of substring occurrences). #17347 (Azat Khuzhin).
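A minimal Python sketch of the counting, assuming occurrences do not overlap (so 'aaaa' contains 'aa' twice). One case_insensitive flag stands in for both case-insensitive variants, and the empty-needle result is a hypothetical choice.

```python
def count_substrings(haystack, needle, case_insensitive=False):
    """Count non-overlapping occurrences of needle in haystack."""
    if case_insensitive:
        haystack, needle = haystack.lower(), needle.lower()
    if not needle:
        return 0  # hypothetical choice for an empty needle
    count, pos = 0, 0
    while True:
        pos = haystack.find(needle, pos)
        if pos == -1:
            return count
        count += 1
        pos += len(needle)  # continue after the match, so matches never overlap


print(count_substrings("aaaa", "aa"))  # 2
```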
Add information about used databases, tables and columns in system.query_log. Add query_kind and normalized_query_hash fields. #17726 (Amos Bird).
Add a setting optimize_on_insert. When enabled, do the same transformation for INSERTed block of data as if merge was done on this block (e.g. Replacing, Collapsing, Aggregating...). This setting is enabled by default. This can influence Materialized View and MaterializeMySQL behaviour (see detailed description). This closes #10683. #16954 (Kruglov Pavel).
Support SHOW SETTINGS statement to show parameters in system.settings. SHOW CHANGED SETTINGS and LIKE/ILIKE clause are also supported. #18056 (Jianmei Zhang).
Function position now supports POSITION(needle IN haystack) syntax for SQL compatibility. This closes #18701. ... #18779 (Jianmei Zhang).
Now we have a new storage setting max_partitions_to_read for tables in the MergeTree family. It limits the max number of partitions that can be accessed in one query. A user setting force_max_partition_limit is also added to enforce this constraint. #18712 (Amos Bird).
Add query_id column to system.part_log for inserted parts. Closes #10097. #18644 (flynn).
Allow CREATE TABLE AS SELECT with a column specification. Example: CREATE TABLE t1 (x String) ENGINE = Memory AS SELECT 1;. #18060 (Maksim Kita).
Implemented ATTACH TABLE name FROM 'path/to/data/' (col1 Type1, ... query. It creates new table with provided structure and attaches table data from provided directory in user_files. #17903 (tavplubix).
Add a new setting insert_distributed_one_random_shard = 1 to allow insertion into multi-sharded distributed table without any distributed key. #18294 (Amos Bird).
Add settings min_compress_block_size and max_compress_block_size to MergeTreeSettings, which have higher priority than the global settings and take effect when they are set. This closes #13890. #17867 (flynn).
Extended OPTIMIZE ... DEDUPLICATE syntax to allow explicit (or implicit with asterisk/column transformers) list of columns to check for duplicates on. ... #17846 (Vasily Nemkov).
Added functions toModifiedJulianDay, fromModifiedJulianDay, toModifiedJulianDayOrNull, and fromModifiedJulianDayOrNull. These functions convert between Proleptic Gregorian calendar date and Modified Julian Day number. #17750 (PHO).
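Modified Julian Day 0 is 1858-11-17, so the conversion is a day difference in the proleptic Gregorian calendar, which Python's datetime.date also uses. A minimal sketch (function names are hypothetical; Python's date only covers years 1 to 9999):

```python
from datetime import date, timedelta

MJD_EPOCH = date(1858, 11, 17)  # Modified Julian Day 0


def to_modified_julian_day(d):
    """Days elapsed since 1858-11-17."""
    return (d - MJD_EPOCH).days


def from_modified_julian_day(n):
    """Date that is n days after 1858-11-17."""
    return MJD_EPOCH + timedelta(days=n)


print(to_modified_julian_day(date(2020, 1, 1)))  # 58849
print(from_modified_julian_day(58849))           # 2020-01-01
```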
Add ability to use custom TLD list: added functions firstSignificantSubdomainCustom, cutToFirstSignificantSubdomainCustom. #17748 (Azat Khuzhin).
Add support for PROXYv1 protocol to wrap native TCP interface. Allow quotas to be keyed by proxy-forwarded IP address (applied for PROXYv1 address and for X-Forwarded-For from HTTP interface). This is useful when you provide access to ClickHouse only via trusted proxy (e.g. CloudFlare) but want to account user resources by their original IP addresses. This fixes #17268. #17707 (alexey-milovidov).
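A PROXY protocol v1 header is a single ASCII line that the proxy sends before any application data, for example PROXY TCP4 192.168.0.1 192.168.0.11 56324 443 followed by CRLF; the first address is the original client, which is what a quota keyed by forwarded IP would account against. A minimal Python parser following the HAProxy PROXY protocol specification (the function is hypothetical, not ClickHouse's code):

```python
def parse_proxy_v1(line: bytes):
    """Parse a PROXY v1 header line and return (src_ip, src_port), or None for UNKNOWN."""
    # The spec limits the header to 107 bytes including the trailing CRLF.
    if len(line) > 107 or not line.startswith(b"PROXY ") or not line.endswith(b"\r\n"):
        raise ValueError("not a PROXY v1 header")
    parts = line[:-2].decode("ascii").split(" ")
    if parts[1] == "UNKNOWN":
        return None  # the proxy could not determine the client address
    if len(parts) != 6 or parts[1] not in ("TCP4", "TCP6"):
        raise ValueError("malformed PROXY v1 header")
    _, _, src_ip, _dst_ip, src_port, _dst_port = parts
    return src_ip, int(src_port)


print(parse_proxy_v1(b"PROXY TCP4 192.168.0.1 192.168.0.11 56324 443\r\n"))  # ('192.168.0.1', 56324)
```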
clickhouse-client now supports opening EDITOR to edit commands with Alt-Shift-E. #17665 (Amos Bird).
Add function encodeXMLComponent to escape characters to place string into XML text node or attribute. #17659 (nauta).
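The two XML functions, encodeXMLComponent here and decodeXMLComponent above, are inverses over the five predefined XML entities. A minimal Python sketch; the function names are hypothetical, and numeric character references such as &#10003; are not handled.

```python
# The five predefined XML entities, with "&" first.
_XML_ESCAPES = [("&", "&amp;"), ("<", "&lt;"), (">", "&gt;"), ('"', "&quot;"), ("'", "&apos;")]


def encode_xml_component(s):
    # "&" is replaced first so the ampersands added by later entities are not escaped again.
    for ch, entity in _XML_ESCAPES:
        s = s.replace(ch, entity)
    return s


def decode_xml_component(s):
    # "&amp;" is decoded last so that "&amp;lt;" becomes "&lt;", not "<".
    for ch, entity in reversed(_XML_ESCAPES):
        s = s.replace(entity, ch)
    return s


print(decode_xml_component("Hello,&quot;world&quot;!"))  # Hello,"world"!
```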
Introduce DETACH TABLE/VIEW ... PERMANENTLY syntax, so that the table does not reappear automatically after a restart (only by explicit request). The table can still be attached back using the short syntax ATTACH TABLE. Implements #5555. Fixes #13850. #17642 (filimonov).
Add asynchronous metrics on the total amount of rows, bytes and parts in MergeTree tables. This fixes #11714. #17639 (flynn).
Add settings limit and offset for out-of-SQL pagination (#16176). They are useful for building APIs. These two settings affect a SELECT query as if it were wrapped like select * from (your_original_select_query) t limit xxx offset xxx;. #17633 (hexiaoting).
Provide a new aggregate function combinator -SimpleState to build SimpleAggregateFunction types via query. It is useful for defining a MaterializedView of the AggregatingMergeTree engine, and will benefit projections too. #16853 (Amos Bird).
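The wrapping equivalence above can be sketched on a plain list of rows. A hypothetical Python illustration; it assumes limit = 0 means no limit.

```python
def apply_limit_offset(rows, limit=0, offset=0):
    """Rows returned by SELECT * FROM (query) LIMIT limit OFFSET offset.

    Hypothetical sketch: limit = 0 is assumed to mean "no limit".
    """
    rows = rows[offset:]
    return rows if limit == 0 else rows[:limit]


print(apply_limit_offset(list(range(10)), limit=3, offset=2))  # [2, 3, 4]
```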
Added queries-file parameter for clickhouse-client and clickhouse-local. #15930 (Maksim Kita).
Added functions for calculation of minHash and simHash of text n-grams and shingles. They are intended for semi-duplicate search. Also functions bitHammingDistance and tupleHammingDistance are added. #7649 (flynn).
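For orientation, here is a generic SimHash over character n-grams together with a bit-level Hamming distance. This is a textbook-style Python sketch: the MD5 hash, the n-gram length of 4 and the 64-bit width are assumptions, and it will not reproduce ClickHouse's simHash values.

```python
import hashlib


def bit_hamming_distance(a, b):
    """Number of differing bits between two non-negative integers."""
    return bin(a ^ b).count("1")


def simhash(text, n=4, bits=64):
    """Generic SimHash over character n-grams (not ClickHouse's exact algorithm)."""
    weights = [0] * bits
    grams = [text[i:i + n] for i in range(max(1, len(text) - n + 1))]
    for gram in grams:
        h = int.from_bytes(hashlib.md5(gram.encode("utf-8")).digest()[:bits // 8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1  # each n-gram votes on every bit
    return sum(1 << i for i in range(bits) if weights[i] > 0)


print(bit_hamming_distance(0b1011, 0b0001))  # 2
```

Near-duplicate texts tend to produce hashes with a small bit_hamming_distance, which is what makes such hashes useful for semi-duplicate search.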
Add new data type Map. See #1841. First version for Map only supports String type of key and value. #15806 (hexiaoting).
Implement alternative SQL parser based on ANTLR4 runtime and generated from EBNF grammar. #11298 (Ivan).
LDAP integration: Added verification_cooldown parameter in LDAP server connection configuration to allow caching of successful "bind" attempts for configurable period of time. #15988 (Denis Glazachev).
Add --no-system-table option for clickhouse-local to run without system tables. This avoids initialization of DateLUT that may take noticeable amount of time (tens of milliseconds) at startup. #18899 (alexey-milovidov).
Replace PODArray with PODArrayWithStackMemory in AggregateFunctionWindowFunnelData to improve windowFunnel function performance. #18817 (flynn).
All queries of type Decimal * Float or vice versa are allowed, including aggregate ones (e.g. SELECT sum(decimal_field * 1.1) or SELECT dec_col * float_col), the result type is Float32 or Float64. #18145 (Mike).
Improved minimal Web UI: add history; add sharing support; avoid race condition of different requests; add request in-flight and ready indicators; add favicon; detect Ctrl+Enter if textarea is not in focus. #17293 #17770 (alexey-milovidov).
clickhouse-server didn't send close request to ZooKeeper server. #16837 (alesapin).
Avoid server abnormal termination in case of too low memory limits (max_memory_usage = 1 / max_untracked_memory = 1). #17453 (Azat Khuzhin).
Fix non-deterministic result of windowFunnel function in case of same timestamp for different events. #18884 (Fuwang Hu).
Docker: Explicitly set uid / gid of clickhouse user & group to the fixed values (101) in clickhouse-server Docker images. #19096 (filimonov).
Asynchronous INSERTs to Distributed tables: two new settings (by analogy with the MergeTree family) have been added: fsync_after_insert, which does fsync for every insert and decreases insert performance; and fsync_directories, which does fsync for the temporary directory (used for async INSERT only) after all operations (writes, renames, etc.). #18864 (Azat Khuzhin).
Expand macros in the zk path when executing FETCH PARTITION. #18839 (fastio).
Apply ALTER TABLE <replicated_table> ON CLUSTER MODIFY SETTING ... to all replicas, because such ALTER commands are not replicated. #18789 (Amos Bird).
Allow column transformer EXCEPT to accept a string as regular expression matcher. This resolves #18685 . #18699 (Amos Bird).
Fix SimpleAggregateFunction in SummingMergeTree. Now it works like AggregateFunction. In previous versions, values were summed together regardless of the aggregate function. This fixes #18564. #8052. #18637 (Amos Bird). Another fix for using SimpleAggregateFunction in SummingMergeTree. This fixes #18676. #18677 (Amos Bird).
Fixed assertion error inside allocator in case when last argument of function bar is NaN. Now simple ClickHouse's exception is being thrown. This fixes #17876. #18520 (Nikita Mikhaylov).
Add ability to modify primary and partition key column type from LowCardinality(Type) to Type and vice versa. Also add an ability to modify primary key column type from EnumX to IntX type. Fixes #5604. #18362 (alesapin).
Allow to parse Array fields from CSV if it is represented as a string containing array that was serialized as nested CSV. Example: "[""Hello"", ""world"", ""42"""" TV""]" will parse as ['Hello', 'world', '42" TV']. Allow to parse array in CSV in a string without enclosing braces. Example: "'Hello', 'world', '42"" TV'" will parse as ['Hello', 'world', '42" TV']. #18271 (alexey-milovidov).
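The two-level decoding in these examples can be reproduced with Python's csv module: the first pass removes the outer CSV quoting, and the second parses the array elements. A hypothetical sketch that assumes each array uses a single quote style, chosen from its first element:

```python
import csv


def parse_csv_array_cell(cell):
    """Parse an array that was serialized as nested CSV inside one CSV field.

    Hypothetical sketch: optional [ ] brackets are stripped, and the quote
    character (" or ') is taken from the first element.
    """
    inner = cell.strip()
    if inner.startswith("[") and inner.endswith("]"):
        inner = inner[1:-1].strip()
    quote = "'" if inner.startswith("'") else '"'
    # Second CSV pass: split the elements; skipinitialspace lets ", " separate them.
    return next(csv.reader([inner], quotechar=quote, skipinitialspace=True))


line = '"[""Hello"", ""world"", ""42"""" TV""]"'
cell = next(csv.reader([line]))[0]  # first pass: ["Hello", "world", "42"" TV"]
print(parse_csv_array_cell(cell))   # ['Hello', 'world', '42" TV']
```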
Make better adaptive granularity calculation for merge tree wide parts. #18223 (alesapin).
clickhouse install now works on Mac. The problem was that there is no procfs on this platform. #18201 (Nikita Mikhaylov).
Access control: Now table function merge() requires the current user to have the SELECT privilege on each table it receives data from. This PR fixes #16964. #18104 #17983 (Vitaly Baranov).
Temporary tables are now visible in the system tables system.tables and system.columns only in the sessions where they have been created. The internal database _temporary_and_external_tables is now hidden in those system tables; temporary tables are shown as tables with an empty database and the is_temporary flag set instead. #18014 (Vitaly Baranov).
Fix clickhouse-client rendering issue when the size of terminal window changes. #18009 (Amos Bird).
Decrease log verbosity of the events when the client drops the connection from Warning to Information. #18005 (filimonov).
Forcibly removing empty or bad metadata files from filesystem for DiskS3. S3 is an experimental feature. #17935 (Pavel Kovalenko).
Access control: allow_introspection_functions=0 prohibits usage of introspection functions but no longer prohibits giving grants for them (the grantee will need to set allow_introspection_functions=1 for themselves to be able to use that grant). Similarly, allow_ddl=0 prohibits usage of DDL commands but no longer prohibits giving grants for them. #17908 (Vitaly Baranov).
Add diagnostic information when two merge tables try to read each other's data. #17854 (徐炘).
Allow overriding the timeout value for running scripts using the ClickHouse Docker image. #17818 (Guillaume Tassery).
Check the grammar of system log tables' engine definitions to prevent some configuration errors. Note that this grammar check is not semantic, which means that mistakes such as non-existent columns or expression functions will not be found until the table is created. #17739 (Du Chuan).
Removed exception throwing at RabbitMQ table initialization if there was no connection (it will be reconnecting in the background). #17709 (Kseniia Sumarokova).
Now queries coming to the server via MySQL and PostgreSQL protocols have distinctive interface types (which can be seen in the interface column of the table system.query_log): 4 for MySQL and 5 for PostgreSQL, instead of the formerly used 1, which is now used for the native protocol only. #17437 (Vitaly Baranov).
Fix parsing of SETTINGS clause of the INSERT ... SELECT ... SETTINGS query. #17414 (Azat Khuzhin).
Cache dictionaries: Completely eliminate callbacks and locks for acquiring them. Keys are not divided into "not found" and "expired", but stored in the same map during query. #14958 (Nikita Mikhaylov).
Fix fsync_part_directory/fsync_after_insert/in_memory_parts_insert_sync, which never worked (experimental feature). #18845 (Azat Khuzhin).
Allow using Atomic engine for nested database of MaterializeMySQL engine. #14849 (tavplubix).
Fix the issue when server can stop accepting connections in very rare cases. #17542 (Amos Bird, alexey-milovidov).
Fix index analysis of binary functions with constant argument which leads to wrong query results. This fixes #18364. #18373 (Amos Bird).
Fix possible wrong index analysis when the types of the index comparison are different. This fixes #17122. #17145 (Amos Bird).
Disable write with AIO during merges because it can lead to extremely rare data corruption of primary key columns during merge. #18481 (alesapin).
Restrict merges from wide to compact parts. In case of vertical merge it led to broken result part. #18381 (Anton Popov).
Fix possible incomplete query result while reading from MergeTree* in case of read backoff (message <Debug> MergeTreeReadPool: Will lower number of threads in logs). Was introduced in #16423. Fixes #18137. #18216 (Nikolai Kochetov).
Fix the issue that asynchronous distributed INSERTs can be rejected by the server if the setting network_compression_method is globally set to non-default value. This fixes #18741. #18776 (alexey-milovidov).
Fixed 'Attempt to read after eof' error when trying to CAST NULL from Nullable(String) to Nullable(Decimal(P, S)). Now function CAST returns NULL when it cannot parse decimal from nullable string. Fixes #7690. #18718 (Winter Zhang).
Fixed value is too short error when executing toType(...) functions (toDate, toUInt32, etc) with argument of type Nullable(String). Now such functions return NULL on parsing errors instead of throwing exception. Fixes #7673. #18445 (tavplubix).
Fix the unexpected behaviour of SHOW TABLES. #18431 (fastio).
Fix the -SimpleState combinator generating incompatible argument and return types. #18404 (Amos Bird).
Fix possible race condition in concurrent usage of Set or Join tables and selects from system.tables. #18385 (alexey-milovidov).
Fix possible crashes in aggregate functions with combinator Distinct, while using two-level aggregation. Fixes #17682. #18365 (Anton Popov).
Fixed issue when clickhouse-odbc-bridge process is unreachable by server on machines with dual IPv4/IPv6 stack; Fixed issue when ODBC dictionary updates are performed using malformed queries and/or cause crashes of the odbc-bridge process; Possibly closes #14489. #18278 (Denis Glazachev).
Access control: SELECT count() FROM table now can be executed if the user has access to at least single column from a table. This PR fixes #10639. #18233 (Vitaly Baranov).
Access control: SELECT JOIN now requires the SELECT privilege on each of the joined tables. This PR fixes #17654. #18232 (Vitaly Baranov).
Generate build id when ClickHouse is linked with lld. It turned out that lld does not generate it by default on my machine. Build id is used for crash reports and introspection. #18808 (alexey-milovidov).
Add a library that replaces some libc functions with traps that terminate the process. #16366 (alexey-milovidov).
Provide diagnostics in server logs in case of stack overflow, send error message to clickhouse-client. This closes #14840. #16346 (alexey-milovidov).
Now we can run almost all stateless functional tests in parallel. #15236 (alesapin).
Fix corruption in librdkafka snappy decompression (was a problem only for gcc10 builds, but official builds uses clang already, so at least recent official releases are not affected). #18053 (Azat Khuzhin).