2024 Changelog
Table of Contents
ClickHouse release v24.12, 2024-12-19
ClickHouse release v24.11, 2024-11-26
ClickHouse release v24.10, 2024-10-31
ClickHouse release v24.9, 2024-09-26
ClickHouse release v24.8 LTS, 2024-08-20
ClickHouse release v24.7, 2024-07-30
ClickHouse release v24.6, 2024-07-01
ClickHouse release v24.5, 2024-05-30
ClickHouse release v24.4, 2024-04-30
ClickHouse release v24.3 LTS, 2024-03-26
ClickHouse release v24.2, 2024-02-29
ClickHouse release v24.1, 2024-01-30
Changelog for 2023
ClickHouse release 24.12, 2024-12-19
Backward Incompatible Change
- Functions `greatest` and `least` now ignore NULL input values, whereas they previously returned NULL if one of the arguments was NULL. For example, `SELECT greatest(1, 2, NULL)` now returns 2. This makes the behavior compatible with PostgreSQL, but breaks compatibility with MySQL, which returns NULL. To retain the previous behavior, set the setting `least_greatest_legacy_null_behavior` (default: `false`) to `true`. #65519 #73344 (kevinyhzou).
- A new MongoDB integration is now the default. Users who prefer the legacy MongoDB driver (based on the Poco driver) can enable the server setting `use_legacy_mongodb_integration`. #73359 (Kirill Nikiforov).
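The NULL handling of `greatest` and `least` described above can be modeled in a few lines. This is a minimal Python sketch of the semantics, not ClickHouse code: `None` stands in for NULL, the `legacy_null_behavior` parameter is a hypothetical stand-in for the `least_greatest_legacy_null_behavior` setting, and returning NULL when every argument is NULL is an assumption.

```python
def _pick(chooser, args, legacy_null_behavior):
    # Legacy mode: any NULL argument makes the whole result NULL.
    if legacy_null_behavior and any(a is None for a in args):
        return None
    # New mode: NULL arguments are ignored.
    non_null = [a for a in args if a is not None]
    # Assumption: if nothing is left, the result is NULL.
    return chooser(non_null) if non_null else None


def greatest(*args, legacy_null_behavior=False):
    return _pick(max, args, legacy_null_behavior)


def least(*args, legacy_null_behavior=False):
    return _pick(min, args, legacy_null_behavior)


print(greatest(1, 2, None))                             # new behavior
print(greatest(1, 2, None, legacy_null_behavior=True))  # previous behavior
```

The first call mirrors `SELECT greatest(1, 2, NULL)` from the entry above; the second shows what enabling the legacy setting restores.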
New Feature
- Move `JSON`/`Dynamic`/`Variant` types from experimental features to beta. #72294 (Pavel Kruglov). We also backported all fixes as well as this change to 24.11.
- Schema evolution for the Iceberg data storage format provides the user with extensive options for modifying the schema of their table. The order of columns, column names, and simple type extensions can be changed under the hood. #69445 (Daniil Ivanik).
- Integrate with Iceberg REST Catalog: a new database engine, named Iceberg, which plugs the whole catalog into ClickHouse. #71542 (Kseniia Sumarokova).
- Added cache for primary index of `MergeTree` tables (can be enabled by table setting `use_primary_key_cache`). If lazy load and cache are enabled for primary index, it will be loaded to cache on demand (similar to mark cache) instead of keeping it in memory forever. Added prewarm of primary index on inserts/merges/fetches of data parts and on restarts of table (can be enabled by setting `prewarm_primary_key_cache`). This allows lower memory usage for huge tables on shared storage, and we tested it on tables over one quadrillion records. #72102 (Anton Popov). #72750 (Alexander Gololobov).
- Implement `SYSTEM LOAD PRIMARY KEY` command to load primary indexes for all parts of a specified table or for all tables if no table is specified. This will be useful for benchmarks and to prevent extra latency during query execution. #66252 #67733 (ZAWA_ll).
- Added a query that allows attaching `MergeTree` tables as `ReplicatedMergeTree` and vice versa: `ATTACH TABLE ... AS REPLICATED` and `ATTACH TABLE ... AS NOT REPLICATED`. #65401 (Kirill).
- A new setting, `http_response_headers`, which allows you to customize the HTTP response headers. For example, you can tell the browser to render a picture that is stored in the database. This closes #59620. #72656 (Alexey Milovidov).
- Add function `toUnixTimestamp64Second`, which converts a `DateTime64` to an `Int64` value with fixed second precision, so it can return a negative value if the date is before the Unix epoch. #70597 (zhanglistar). #73146 (Robert Schulze).
- Add new setting `enforce_index_structure_match_on_partition_manipulation` to allow attach when the set of source table's projections and secondary indices is a subset of those in the target table. Closes #70602. #70603 (zwy991114).
- Add syntax ALTER USER `{ADD|MODIFY|DROP SETTING}`, ALTER USER `{ADD|DROP PROFILE}`, the same for ALTER ROLE and ALTER PROFILE. So instead of replacing the whole set of settings, you can modify it. #72050 (pufit).
- Added `arrayPRAUC` function, which calculates the AUC (Area Under the Curve) for the Precision Recall curve. #72073 (Emmanuel).
- Add `indexOfAssumeSorted` function for array types. It optimizes the search when the array is sorted in non-decreasing order. The effect appears on very large arrays (over 100,000 elements). #72517 (Eric Kurbanov).
- Allow using a delimiter as an optional second argument for aggregate function `groupConcat`. #72540 (Yarik Briukhovetskyi).
- Function `translate` now supports character deletion if the `from` argument contains more characters than the `to` argument. Example: `SELECT translate('clickhouse', 'clickhouse', 'CLICK')` now returns `CLICK`. #71441 (shuai.xu).
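The character-deletion rule for `translate` can be illustrated with a small model. This is a Python sketch of the behavior described in the entry above, not the ClickHouse implementation; letting the first occurrence of a repeated `from` character win is an assumption.

```python
def translate(s, from_chars, to_chars):
    table = {}
    for i, ch in enumerate(from_chars):
        if ord(ch) in table:
            # Assumption: the first mapping for a repeated character wins.
            continue
        # Characters of `from` without a counterpart in `to` map to None,
        # which str.translate treats as "delete this character".
        table[ord(ch)] = to_chars[i] if i < len(to_chars) else None
    return s.translate(table)


print(translate("clickhouse", "clickhouse", "CLICK"))  # CLICK
```

Here `c`, `l`, `i` and `k` are replaced by their uppercase counterparts, while `h`, `o`, `u`, `s` and `e` have no counterpart in the shorter `to` string and are removed.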
Experimental Features
- A new MergeTree setting `allow_experimental_reverse_key` that enables support for descending sort order in MergeTree sorting keys. This is useful for time series analysis, especially TopN queries. Example usage: `ENGINE = MergeTree ORDER BY (time DESC, key)`, which gives descending order for the `time` field. #71095 (Amos Bird).
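To see why a descending sorting key helps TopN queries over time series, consider the row order that `ORDER BY (time DESC, key)` implies. The following is a Python sketch of that ordering only, with made-up sample rows; it says nothing about how MergeTree stores or reads the data.

```python
# Hypothetical sample rows: (time, key).
rows = [(10, "b"), (30, "a"), (20, "c"), (30, "b")]

# Order by time descending, then key ascending,
# mirroring ORDER BY (time DESC, key).
ordered = sorted(rows, key=lambda r: (-r[0], r[1]))

# With the newest rows first, a "latest N" query is just a prefix.
top_2 = ordered[:2]

print(ordered)
print(top_2)
```

Because the most recent rows come first in key order, the latest N rows form a prefix of the ordered data rather than a suffix.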
Performance Improvement
- JOIN reordering. Added an option to select the side of the join that will act as the inner (build) table in the query plan. This is controlled by `query_plan_join_swap_table`, which can be set to `auto`. In this mode, ClickHouse will try to choose the table with the smallest number of rows. #71577 (Vladimir Cherkasov).
- Now the `parallel_hash` algorithm will be used (if applicable) when the `join_algorithm` setting is set to `default`. Two previous alternatives (`direct` and `hash`) are still considered when `parallel_hash` cannot be used. #70788 (Nikita Taranov).
- Add option to extract common expressions from `WHERE` and `ON` expressions in order to reduce the number of hash tables used during joins. This makes sense when the JOIN ON condition has common parts inside AND in different OR parts. Can be enabled by `optimize_extract_common_expressions = 1`. #71537 (János Benjamin Antal).
- Allow using indexes on `SELECT` when an indexed column is CAST into a `LowCardinality(String)`, which could be the case when a query runs over a Merge table with some tables having `String` and some `LowCardinality(String)`. #71598 (Yarik Briukhovetskyi).
- During query execution with parallel replicas and enabled local plan, do not perform index analysis on workers. The coordinator will choose ranges to read for workers based on index analysis on its side (on the query initiator). This makes short queries with parallel replicas have as low latency as single-node queries. #72109 (Igor Nikonov).
- Memory usage of `clickhouse disks remove --recursive` is reduced for object storage disks. #67323 (Kirill).
- Bring back optimization for reading subcolumns of single column in compact parts from #57631. It was deleted accidentally. #72285 (Pavel Kruglov).
- Speed up sorting of `LowCardinality(String)` columns by de-virtualizing calls in comparator. #72337 (Alexander Gololobov).
- Optimize function `argMin`/`argMax` for some simple data types. #72350 (alesapin).
- Optimize locking with shared locks in the memory tracker to reduce lock contention, which improves performance on systems with a very high number of CPUs. #72375 (Jiebin Sun).
- Add a new setting, `use_async_executor_for_materialized_views`. It uses async and potentially multi-threaded execution of the materialized view query, which can speed up view processing during INSERT but also consumes more memory. #72497 (alesapin).
- Improved performance of deserialization of states of aggregate functions (in data type `AggregateFunction` and in distributed queries). Slightly improved performance of parsing of format `RowBinary`. #72818 (Anton Popov).
- Split ranges in reading with parallel replicas in the order of the table's key to consume less memory during reading. #72173 (JIaQi).
- Speed up insertions into merge tree in the case of a single value of partition key inside the inserted batch. #72348 (alesapin).
- Implement creating tables in parallel while restoring from a backup. Before this PR the `RESTORE` command always created tables in one thread, which could be slow in case of backups containing many tables. #72427 (Vitaly Baranov).
- Dropping the mark cache might take noticeable time if it is big. If the context mutex is held during this, it blocks many other activities; even new client connections cannot be established until it is released. Holding this mutex is not actually required for synchronization: it is enough to have a local reference to the cache via a shared pointer. #72749 (Alexander Gololobov).
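The idea behind the `query_plan_join_swap_table = auto` entry above is that a hash join builds its hash table from one side and streams the other, so building from the smaller side keeps the table small. Below is a toy Python sketch of that choice for an inner equi-join. The function name, the `swap_table` parameter, and the rule that the right table is the build side unless swapped are illustrative assumptions, not ClickHouse internals.

```python
from collections import defaultdict


def hash_join(left, right, key, swap_table="auto"):
    # Assumption: the right table is the build side unless we swap.
    swap = swap_table == "auto" and len(left) < len(right)
    build, probe = (left, right) if swap else (right, left)

    # Build phase: hash the (ideally smaller) build side by join key.
    table = defaultdict(list)
    for row in build:
        table[row[key]].append(row)

    # Probe phase: stream the other side and look up matches.
    joined = []
    for probe_row in probe:
        for build_row in table.get(probe_row[key], []):
            l, r = (build_row, probe_row) if swap else (probe_row, build_row)
            joined.append({**l, **r})
    return joined, ("left" if swap else "right")


users = [{"id": 1, "name": "a"}]
events = [{"id": 1, "e": "x"}, {"id": 1, "e": "y"}, {"id": 2, "e": "z"}]
rows, build_side = hash_join(users, events, "id")
print(build_side, rows)
```

In this example the one-row `users` list becomes the build side even though it is written on the left, and the joined rows are the same as they would be without the swap.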
Improvement
- Remove the `allow_experimental_join_condition` setting, allowing non-equi conditions by default. #69910 (Vladimir Cherkasov).
- Settings from server config (users.xml) now apply on the client too. Useful for format settings, e.g. `date_time_output_format`. #71178 (Michael Kolupaev).
- Automatic `GROUP BY`/`ORDER BY` to disk based on the server/user memory usage. Controlled with `max_bytes_ratio_before_external_group_by`/`max_bytes_ratio_before_external_sort` query settings. #71406 (Azat Khuzhin).
- Adding a new cancellation logic: `CancellationChecker` checks timeouts for every started query and stops them once the timeout has been reached. #69880 (Yarik Briukhovetskyi).
- Support ALTER from `Object` to `JSON`, which means you can easily migrate from the deprecated Object type. #71784 (Pavel Kruglov).
- Allow unknown values in set that are not present in Enum. Fix #72662. #72686 (zhanglistar).
- Support string search operator (e.g., LIKE) for `Enum` data type, implements #72661. #72732 (zhanglistar).
- Some meaningless ALTER USER queries were accepted. Fixes #71227. #71286 (Arthur Passos).
- Respect `prefer_localhost_replica` when building plan for distributed `INSERT ... SELECT`. #72190 (filimonov).
- Azure violated the Iceberg specification, mistakenly labeling Iceberg v1 as Iceberg v2. The problem is described here. Azure Iceberg Writer creates Iceberg metadata files (as well as manifest files) that violate specs. Now we attempt to read v1 Iceberg format metadata with the v2 reader (because that is how they are written), and added an error for when the corresponding fields are missing from a manifest file. #72277 (Daniil Ivanik).
- Now it's allowed to `CREATE MATERIALIZED VIEW` with `UNION [ALL]` in query. Behavior is the same as for matview with `JOIN`: only the first table in `SELECT` expression will work as a trigger for insert, all other tables will be ignored. However, if there are many references to the first table (e.g., UNION with itself), all of them will be processed as the inserted block of data. #72347 (alesapin).
- Added source query validation when ClickHouse is used as a source for a dictionary. #72548 (Alexey Katsman).
- Ensure that ClickHouse will see ZooKeeper changes on config reloads. #72593 (Azat Khuzhin).
- Better memory usage approximation of cached marks to reduce total memory usage of the cache. #72630 (Antonio Andelic).
- Add a new `StartupScriptsExecutionState` metric. The metric can have three values: 0 = startup scripts have not finished yet, 1 = startup scripts executed successfully, 2 = startup scripts failed. We need this metric because we need to know if startup scripts are being executed successfully in the cloud, especially after releases to base configurations. #72637 (Miсhael Stetsyuk).
- Add the new `MergeTreeIndexGranularityInternalArraysTotalSize` metric to `system.metrics`. This metric is needed to find the instances with huge datasets susceptible to the high
- Add retries to creating a replicated table. #72682 (Vitaly Baranov).
- Add `total_bytes_with_inactive` to `system.tables` to count the total bytes of inactive parts. #72690 (Kai Zhu).
- Add MergeTree settings to `system.settings_changes`. #72694 (Raúl Marín).
- Support JSON type in the `notEmpty` function. #72741 (Pavel Kruglov).
- Support parsing GCS S3 error `AuthenticationRequired`. #72753 (Vitaly Baranov).
- Support `Dynamic` type in functions `ifNull` and `coalesce`. #72772 (Pavel Kruglov).
- Support `Dynamic` in functions `toFloat64`/`toUInt32`/etc. #72989 (Pavel Kruglov).
- Add S3 request settings `http_max_fields`, `http_max_field_name_size`, `http_max_field_value_size` and use them while parsing S3 API responses when making a backup or restoring. #72778 (Vitaly Baranov).
- Delete table metadata in keeper in Storage S3(Azure)Queue only after last table using this metadata was dropped. #72810 (Kseniia Sumarokova).
- Added `JoinBuildTableRowCount`/`JoinProbeTableRowCount`/`JoinResultRowCount` profile events. #72842 (Vladimir Cherkasov).
- Support subcolumns in MergeTree sorting key and skip indexes. #72644 (Pavel Kruglov).
Bug Fix (user-visible misbehavior in an official stable release)
- Fix possible intersecting parts for MergeTree (after an operation of moving a part to the detached directory has failed, possibly due to an operation on object storage). #70476 (Azat Khuzhin).
- Fix error detection when a table name is too long. Provide a diagnostic telling the maximum length. Add a new function `getMaxTableNameLengthForDatabase`. #70810 (Yarik Briukhovetskyi).
- Fixed zombie processes after a crash of `clickhouse-library-bridge` (this program allows running unsafe libraries). #71301 (MikhailBurdukov).
- Fix NoSuchKey error during transaction rollback when creating a directory fails for the `plain_rewritable` disk. #71439 (Julia Kartseva).
- Fix serialization of `Dynamic` values in `Pretty` JSON formats. #71923 (Pavel Kruglov).
- Add inferred format name to create query in `File`/`S3`/`URL`/`HDFS`/`Azure` engines. Previously the format name was inferred each time the server was restarted, and if the specified data files were removed, it led to errors during server startup. #72108 (Pavel Kruglov).
- Fix bugs when using a UDF in join on expression with the old analyzer. #72179 (Raúl Marín).
- Fixes some small bugs in `StorageObjectStorage`. Needed in order to enable `use_hive_partitioning` by default. #72185 (Yarik Briukhovetskyi).
- Fix a bug where `min_age_to_force_merge_on_partition_only` got stuck repeatedly trying to merge a partition that was already merged to a single part, while not merging partitions that had multiple parts. #72209 (Christoph Wurm).
- Fixed a crash in `SimpleSquashingChunksTransform` that occurred in rare cases when processing sparse columns. #72226 (Vladimir Cherkasov).
- Fixed data race in `GraceHashJoin` as a result of which some rows might be missing in the join output. #72233 (Nikita Taranov).
- Fixed `ALTER DELETE` queries with materialized `_block_number` column (if setting `enable_block_number_column` is enabled). #72261 (Anton Popov).
- Fixed data race when `ColumnDynamic::dumpStructure()` is called concurrently, e.g., in `ConcurrentHashJoin` constructor. #72278 (Nikita Taranov).
- Fix possible `LOGICAL_ERROR` with duplicate columns in `ORDER BY ... WITH FILL`. #72387 (Vladimir Cherkasov).
- Fixed mismatched types in several cases after applying `optimize_functions_to_subcolumns`. #72394 (Anton Popov).
- Use `AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE` instead of `AWS_CONTAINER_AUTHORIZATION_TOKEN_PATH`. Fixes #71074. #72397 (Konstantin Bogdanov).
- Fix failure on parsing `BACKUP DATABASE db EXCEPT TABLES db.table` queries. #72429 (Konstantin Bogdanov).
- Don't allow creating empty `Variant`. #72454 (Pavel Kruglov).
- Fix invalid formatting of `result_part_path` in `system.merges`. #72567 (Konstantin Bogdanov).
- Fix parsing a glob with one element (such as `{file}`). #72572 (Konstantin Bogdanov).
- Fix query generation for the follower server in case of a distributed query with `ARRAY JOIN`. Fixes #69276. #72608 (Dmitry Novik).
- Fix a bug when DateTime64 IN DateTime64 returns nothing. #72640 (Yarik Briukhovetskyi).
- Fixed inconsistent metadata when adding a new replica to a Replicated database that has a table created with `flatten_nested=0`. #72685 (Alexander Tokmakov).
- Fix advanced SSL configuration for Keeper's internal communication. #72730 (Antonio Andelic).
- Fix "No such key" error in S3Queue unordered mode with `tracked_files_limit` setting smaller than the S3 files appearance rate. #72738 (Kseniia Sumarokova).
- Fix exception thrown in RemoteQueryExecutor when a user does not exist locally. #72759 (Andrey Zvonov).
- Fixed mutations with materialized `_block_number` column (if setting `enable_block_number_column` is enabled). #72854 (Anton Popov).
- Fix backup/restore with plain rewritable disk in case there are empty files in backup. #72858 (Kseniia Sumarokova).
- Properly cancel inserts in DistributedAsyncInsertDirectoryQueue. #72885 (Antonio Andelic).
- Fixed crash while parsing incorrect data into sparse columns (can happen with enabled setting `enable_parsing_to_custom_serialization`). #72891 (Anton Popov).
- Fix potential crash during backup restore. #72947 (Kseniia Sumarokova).
- Fixed bug in `parallel_hash` JOIN method that might appear when query has complex condition in the `ON` clause with inequality filters. #72993 (Nikita Taranov).
- Use default format settings during JSON parsing to avoid broken deserialization. #73043 (Pavel Kruglov).
- Fix crash in transactions with unsupported storage. #73045 (Raúl Marín).
- Fix possible overestimation of memory tracking (when the difference between `MemoryTracking` and `MemoryResident` kept growing). #73081 (Azat Khuzhin).
- Check for duplicate JSON keys during Tuple parsing. Previously it could lead to a logical error `Invalid number of rows in Chunk` during parsing. #73082 (Pavel Kruglov).
Build/Testing/Packaging Improvement
- All small utilities that were previously stored in the `/utils` folder and required manual compilation from sources are now a part of the main ClickHouse bundle. This closes: #72404. #72426 (Nikita Mikhaylov).
- Get rid of `/etc/systemd/system/clickhouse-server.service` removal introduced in 22.3 #39323. #72259 (Mikhail f. Shiryaev).
- Split large translation units to avoid compilation failures due to memory/cpu limitations. #72352 (Yakov Olkhovskiy).
- OSX: Build with ICU support, which enables collations, charset conversions and other localization features. #73083 (Raúl Marín).
ClickHouse release 24.11, 2024-11-26
Backward Incompatible Change
- Remove system tables `generate_series` and `generateSeries`. They were added by mistake here: #59390. #71091 (Alexey Milovidov).
- Remove `StorageExternalDistributed`. Closes #70600. #71176 (flynn).
- The table engines Kafka, NATS and RabbitMQ are now covered by their own grants in the `SOURCES` hierarchy. Add grants to any non-default database users that create tables with these engine types. #71250 (Christoph Wurm).
- Check the full mutation query before executing it (including subqueries). This prevents accidentally running an invalid query and building up dead mutations that block valid mutations. #71300 (Christoph Wurm).
- Rename filesystem cache setting `skip_download_if_exceeds_query_cache` to `filesystem_cache_skip_download_if_exceeds_per_query_cache_write_limit`. #71578 (Kseniia Sumarokova).
- Remove support for `Enum` as well as `UInt128` and `UInt256` arguments in `deltaSumTimestamp`. Remove support for `Int8`, `UInt8`, `Int16`, and `UInt16` of the second ("timestamp") argument of `deltaSumTimestamp`. #71790 (Alexey Milovidov).
- When retrieving data directly from a dictionary using Dictionary storage, dictionary table function, or direct SELECT from the dictionary itself, it is now enough to have `SELECT` permission or `dictGet` permission for the dictionary. This aligns with previous attempts to prevent ACL bypasses: https://github.com/ClickHouse/ClickHouse/pull/57362 and https://github.com/ClickHouse/ClickHouse/pull/65359. It also makes the latter one backward compatible. #72051 (Nikita Mikhaylov).
Experimental feature
- Implement `allow_feature_tier` as a global switch to disable all experimental / beta features. #71841 #71145 (Raúl Marín).
- Fix possible error `No such file or directory` due to unescaped special symbols in files for JSON subcolumns. #71182 (Pavel Kruglov).
- Support alter from String to JSON. This PR also changes the serialization of JSON and Dynamic types to new version V2. Old version V1 can still be used by enabling setting `merge_tree_use_v1_object_and_dynamic_serialization` (can be used during upgrade to be able to roll back the version without issues). #70442 (Pavel Kruglov).
- Implement simple CAST from Map/Tuple/Object to new JSON through serialization/deserialization from JSON string. #71320 (Pavel Kruglov).
- Don't allow Variant/Dynamic types in ORDER BY/GROUP BY/PARTITION BY/PRIMARY KEY by default because it may lead to unexpected results. #69731 (Pavel Kruglov).
- Forbid Dynamic/Variant types in min/max functions to avoid confusion. #71761 (Pavel Kruglov).
New Feature
- Added SQL syntax to describe workload and resource management. https://clickhouse.com/docs/operations/workload-scheduling. #69187 (Sergei Trifonov).
- A new data type, `BFloat16`, represents 16-bit floating point numbers with 8-bit exponent, sign, and 7-bit mantissa. This closes #44206. This closes #49937. #64712 (Alexey Milovidov).
- Add `CHECK GRANT` query to check whether the current user/role has been granted the specific privilege and whether the corresponding table/column exists in the memory. #68885 (Unalian).
- Add `iceberg[S3;HDFS;Azure]Cluster`, `deltaLakeCluster`, `hudiCluster` table functions. #72045 (Mikhail Artemenko).
- Add ability to set user/password in http_handlers (for `dynamic_query_handler`/`predefined_query_handler`). #70725 (Azat Khuzhin).
- Add support for staleness clause in the ORDER BY WITH FILL operator. #71151 (Mikhail Artemenko).
- Allow each authentication method to have its own expiration date, remove from user entity. #70090 (Arthur Passos).
- Added new functions `parseDateTime64`, `parseDateTime64OrNull` and `parseDateTime64OrZero`. Compared to the existing function `parseDateTime` (and variants), they return a value of type `DateTime64` instead of `DateTime`. #71581 (kevinyhzou).
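The `BFloat16` layout mentioned above (1 sign bit, 8 exponent bits, 7 mantissa bits) matches the upper 16 bits of an IEEE 754 `Float32`. The Python sketch below shows the conversion by simple truncation of the low 16 bits. Truncation is an assumption made for illustration, since an implementation may round instead, and the helper names are hypothetical.

```python
import struct


def float32_to_bfloat16_bits(x):
    # Reinterpret the value as a 32-bit IEEE 754 float and keep the top
    # 16 bits: 1 sign bit, 8 exponent bits, 7 mantissa bits.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16  # truncation, not rounding (assumption)


def bfloat16_bits_to_float(b):
    # Widen back to Float32 by zero-filling the dropped mantissa bits.
    (x,) = struct.unpack("<f", struct.pack("<I", b << 16))
    return x


bits = float32_to_bfloat16_bits(3.14159)
print(hex(bits), bfloat16_bits_to_float(bits))  # 0x4049 3.140625
```

The round trip shows the trade-off: the exponent range of `Float32` is preserved, while only about two to three significant decimal digits survive.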
Performance Improvement
- Optimized memory usage for values of index granularity if granularity is constant for part. Added an ability to always select constant granularity for part (setting `use_const_adaptive_granularity`), which helps to ensure that it is always optimized in memory. It helps in large workloads (trillions of rows in shared storage) to avoid constantly growing memory usage by metadata (values of index granularity) of data parts. #71786 (Anton Popov).
- Input block columns are no longer copied for `join_algorithm = 'parallel_hash'` when distributing them between threads for parallel processing. #67782 (Nikita Taranov).
- Optimized `Replacing` merge algorithm for non-intersecting parts. #70977 (Anton Popov).
- Do not list detached parts from readonly and write-once disks for metrics and system.detached_parts. #71086 (Alexey Milovidov).
- Do not calculate heavy asynchronous metrics by default. The feature was introduced in #40332, but it isn't good to have a heavy background job that is needed for only a single customer. #71087 (Alexey Milovidov).
- For the `plain_rewritable` disks: Do not call the object storage API when listing directories, as this may be cost-inefficient. Instead, store the list of filenames in memory. The trade-offs are increased initial load time and memory required to store filenames. #70823 (Julia Kartseva).
- Improve the performance and accuracy of `system.query_metric_log` collection interval by reducing the critical region. #71473 (Pablo Marcos).
- Read-in-order optimization via generating virtual rows, so less data is read during merge sort. This is especially useful when multiple parts exist. #62125 (Shichao Jin).
- Added server setting `async_load_system_database` that allows the server to start with not fully loaded system database. This helps to start ClickHouse faster if there are many system tables. #69847 (Sergei Trifonov).
- Add `--threads` parameter to `clickhouse-compressor`, which allows compressing data in parallel. #70860 (Alexey Milovidov).
- Added a setting `prewarm_mark_cache` which enables loading of marks to mark cache on inserts, merges, fetches of parts and on startup of the table. #71053 (Anton Popov).
- Shrink to fit index_granularity array in memory to reduce memory footprint for MergeTree table engines family. #71595 (alesapin).
- Turn off filesystem cache setting `boundary_alignment` for non-disk read, which improves performance of reading from standalone remote files with caching. #71827 (Kseniia Sumarokova).
- Queries like `SELECT * FROM table LIMIT ...` used to load part indexes even though they were not used. #71866 (Alexander Gololobov).
- Enable `parallel_replicas_local_plan` by default. Building a full-fledged local plan on the query initiator improves parallel replicas performance with less resource consumption and provides opportunities to apply more query optimizations. #70171 (Igor Nikonov).
Improvement
- Allow using clickhouse with a file argument as `ch queries.sql`. #71589 (Raúl Marín).
- The `Vertical` format (which is also activated when you end your query with `\G`) gets the features of Pretty formats, such as highlighting thousand groups in numbers and printing a readable number tip. #71630 (Alexey Milovidov).
- Push external user roles from query originator to other nodes in cluster. Helpful when only originator has access to the external authenticator (like LDAP). #70332 (Andrey Zvonov).
- Added aliases `anyRespectNulls`, `firstValueRespectNulls`, and `anyValueRespectNulls` for aggregation function `any`. Also added aliases `anyLastRespectNulls` and `lastValueRespectNulls` for aggregation function `anyLast`. This allows using more natural camel-case-only syntax rather than mixed camel-case/underscore syntax, for example: `SELECT anyLastRespectNullsStateIf` instead of `anyLast_respect_nullsStateIf`. #71403 (Peter Nguyen).
- Added the configuration `date_time_utc` parameter, enabling JSON log formatting to support UTC date-time in RFC 3339/ISO8601 format. #71560 (Ali).
- Added a new header type for S3 endpoints for user authentication (`access_header`). This header has the lowest priority and is overwritten by `access_key_id` from any other source (for example, a table schema or a named collection). #71011 (MikhailBurdukov).
- Higher-order functions with constant arrays and constant captured arguments will return constants. #58400 (Alexey Milovidov).
- Query plan step names (`EXPLAIN PLAN json=1`) and pipeline processor names (`EXPLAIN PIPELINE compact=0,graph=1`) now have a unique id as a suffix. This allows matching processors profiler output and OpenTelemetry traces with explain output. #63518 (qhsong).
- Added an option to check if the object exists after writing it to Azure Blob Storage. This is controlled by the setting `check_objects_after_upload`. #64847 (Smita Kulkarni).
- Use `Atomic` database by default in `clickhouse-local`. Address items 1 and 5 from #50647. Closes #44817. #68024 (Alexey Milovidov).
- Exceptions break the HTTP protocol in order to alert the client about error. #68800 (Sema Checherinda).
- Report hosts running distributed DDL queries by creating replica_dir and mark replicas active in DDLWorker. #69658 (tuanpach).
- Wait only on active replicas for database ON CLUSTER queries if distributed_ddl_output_mode is set to be *_only_active. #69660 (tuanpach).
- Better error-handling and cancellation of `ON CLUSTER` backups and restores. #70027 (Vitaly Baranov).
  - If a backup or restore fails on one host then it'll be cancelled on other hosts automatically
  - No weird errors must be produced because some hosts failed while other hosts continued their work
  - If a backup or restore is cancelled on one host then it'll be cancelled on other hosts automatically
  - Fix issues with `test_disallow_concurrency`: disabling of concurrency must now work better
  - Backups and restores now are much more resistant to ZooKeeper disconnects
- Support `ALTER TABLE ... MODIFY/RESET SETTING ...` for certain settings in storage S3Queue. #70811 (Kseniia Sumarokova).
- Added the ability to reload client certificates in the same way as the procedure for reloading server certificates. #70997 (Roman Antonov).
- Make the client history size configurable and increase its default size. #71014 (Jiří Kozlovský).
- Boolean types support for the parquet native reader. #71055 (Arthur Passos).
- Retry more errors when interacting with S3, such as "Malformed message". #71088 (Alexey Milovidov).
- Lower log level for some messages about S3. #71090 (Alexey Milovidov).
- Support writing HDFS files with spaces. #71105 (exmy).
- Added settings limiting the number of replicated tables, dictionaries and views. #71179 (Kirill).
- Use `AWS_CONTAINER_AUTHORIZATION_TOKEN_FILE` instead of `AWS_CONTAINER_AUTHORIZATION_TOKEN` if the former is available. Fixes #71074. #71269 (Konstantin Bogdanov).
- Remove the metadata_version ZooKeeper node creation from ReplicatedMergeTree restarting thread. The only scenario where we need to create this node is when the user updated from a version earlier than 20.4 straight to one later than 24.10. ClickHouse does not support upgrades that span more than a year, so we should throw an exception and ask the user to update gradually, instead of creating the node. #71385 (Miсhael Stetsyuk).
- Add per host dashboards `Overview (host)` and `Cloud overview (host)` to advanced dashboard. #71422 (alesapin).
- `clickhouse-local` uses implicit SELECT by default, which allows using it as a calculator. Improve the syntax highlighting for the implicit SELECT mode. #71620 (Alexey Milovidov).
- The command line applications will highlight syntax even for multi-statements. #71622 (Alexey Milovidov).
- Command-line applications will return non-zero exit codes on errors. In previous versions, the `disks` application returned zero on errors, and other applications returned zero for errors 256 (`PARTITION_ALREADY_EXISTS`) and 512 (`SET_NON_GRANTED_ROLE`). #71623 (Alexey Milovidov).
- When user/group is given as ID, the `clickhouse su` fails. This patch fixes it to accept `UID:GID` as well. #71626 (Mikhail f. Shiryaev).
- Allow disabling the memory buffer increase for filesystem cache via setting `filesystem_cache_prefer_bigger_buffer_size`. #71640 (Kseniia Sumarokova).
- Add a separate setting `background_download_max_file_segment_size` for background download max file segment size in filesystem cache. #71648 (Kseniia Sumarokova).
- Slightly better JSON type parsing: if current block for the JSON path contains values of several types, try to choose the best type by trying types in special best-effort order. #71785 (Pavel Kruglov).
- Previously, reading from `system.asynchronous_metrics` would wait for a concurrent update to finish. This can take a long time if the system is under heavy load. With this change the previously collected values can always be read. #71798 (Alexander Gololobov).
- S3Queue and AzureQueue: Set `polling_max_timeout_ms` to 10 minutes, `polling_backoff_ms` to 30 seconds. #71817 (Kseniia Sumarokova).
- Update `HostResolver` three times in a `history` period. #71863 (Sema Checherinda).
- On the advanced dashboard HTML page added a dropdown selector for the dashboard from `system.dashboards` table. #72081 (Sergei Trifonov).
- Check if default database is present after authorization. Fixes #71097. #71140 (Konstantin Bogdanov).
Bug Fix (user-visible misbehavior in an official stable release)
- The parts deduplicated during `ATTACH PART` query don't get stuck with the `attaching_` prefix anymore. #65636 (Kirill).
- Fix the bug where DateTime64 loses precision for the `IN` function. #67230 (Yarik Briukhovetskyi).
- Fix possible logical error when using functions with `IGNORE/RESPECT NULLS` in `ORDER BY ... WITH FILL`, close #57609. #68234 (Vladimir Cherkasov).
- Fixed rare logical errors in asynchronous inserts with format `Native` in case of reached memory limit. #68965 (Anton Popov).
- Fix COMMENT in CREATE TABLE for EPHEMERAL column. #70458 (Yakov Olkhovskiy).
- Fix logical error in JSONExtract with LowCardinality(Nullable). #70549 (Pavel Kruglov).
- Allow system drop replica zkpath when there is another replica with the same zk path. #70642 (MikhailBurdukov).
- Fix a crash and a leak in AggregateFunctionGroupArraySorted. #70820 (Michael Kolupaev).
- Add ability to override Content-Type by user headers in the URL engine. #70859 (Artem Iurin).
- Fix logical error in `StorageS3Queue` "Cannot create a persistent node in /processed since it already exists". #70984 (Kseniia Sumarokova).
- Fixed named sessions not being closed and hanging on forever under certain circumstances. #70998 (Márcio Martins).
- Fix a bug where the _row_exists column was not considered in the rebuild option of projection lightweight delete. #71089 (Shichao Jin).
- Fix AT_* is out of range problem when running on Oracle Linux UEK 6.10. #71109 (Örjan Fors).
- Fix wrong value in system.query_metric_log due to unexpected race condition. #71124 (Pablo Marcos).
- Fix mismatched aggregate function name of quantileExactWeightedInterpolated. The bug was introduced in https://github.com/ClickHouse/ClickHouse/pull/69619. cc @Algunenano. #71168 (李扬).
- Fix bad_weak_ptr exception with Dynamic in functions comparison. #71183 (Pavel Kruglov).
- Check that a 7z file being read is on the local machine. #71184 (Daniil Ivanik).
- Fix ignoring format settings in Native format via HTTP and Async Inserts. #71193 (Pavel Kruglov).
- SELECT queries run with setting use_query_cache = 1 are no longer rejected if the name of a system table appears as a literal, e.g. SELECT * FROM users WHERE name = 'system.metrics' SETTINGS use_query_cache = true; now works. #71254 (Robert Schulze).
- Fix bug of memory usage increase if enable_filesystem_cache=1, but disk in storage configuration did not have any cache configuration. #71261 (Kseniia Sumarokova).
- Fix possible "Cannot read all data" errors during deserialization of LowCardinality dictionary from Dynamic column. #71299 (Pavel Kruglov).
- Fix incomplete cleanup of parallel output format in the client. #71304 (Raúl Marín).
- Added missing unescaping in named collections. Without fix clickhouse-server can't start. #71308 (MikhailBurdukov).
- Fix async inserts with empty blocks via native protocol. #71312 (Anton Popov).
- Fix inconsistent AST formatting when granting wrong wildcard grants #71309. #71332 (pufit).
- Add try/catch to data parts destructors to avoid std::terminate. #71364 (alesapin).
- Check suspicious and experimental types in JSON type hints. #71369 (Pavel Kruglov).
- Start memory worker thread on non-Linux OS too (fixes #71051). #71384 (Alexandre Snarskii).
- Fix error Invalid number of rows in Chunk with the Variant column. #71388 (Pavel Kruglov).
- Fix error column "attgenerated" does not exist for older PostgreSQL versions, fix #60651. #71396 (0xMihalich).
- To avoid spamming the server logs, failing authentication attempts are now logged at level DEBUG instead of ERROR. #71405 (Robert Schulze).
- Fix crash in mongodb table function when passing wrong arguments (e.g. NULL). #71426 (Vladimir Cherkasov).
- Fix crash with optimize_rewrite_array_exists_to_has. #71432 (Raúl Marín).
- Fixed the usage of setting max_insert_delayed_streams_for_parallel_write in inserts. Previously it worked incorrectly, which could lead to high memory usage in inserts that write data into several partitions. #71474 (Anton Popov).
- Fix possible error Argument for function must be constant (old analyzer) in case when arrayJoin can apparently appear in WHERE condition. Regression after https://github.com/ClickHouse/ClickHouse/pull/65414. #71476 (Nikolai Kochetov).
- Prevent crash in SortCursor with 0 columns (old analyzer). #71494 (Raúl Marín).
- Fix Date32 out of range caused by uninitialized ORC data. For more details, refer to https://github.com/apache/incubator-gluten/issues/7823. #71500 (李扬).
- Fix counting column size in wide part for Dynamic and JSON types. #71526 (Pavel Kruglov).
- Analyzer fix when query inside materialized view uses IN with CTE. Closes #65598. #71538 (Maksim Kita).
- Avoid crash when using a UDF in a constraint. #71541 (Raúl Marín).
- Return 0 or default char instead of throwing an error in bitShift functions in case of out of bounds. #71580 (Pablo Marcos).
- Fix server crashes while using materialized view with certain engines. #71593 (Pervakov Grigorii).
- Array join with a nested data structure, which contains an alias to a constant array was leading to a null pointer dereference. This closes #71677. #71678 (Alexey Milovidov).
- Fix LOGICAL_ERROR when doing ALTER with empty tuple. This fixes #71647. #71679 (Amos Bird).
- Don't transform constant set in predicates over partition columns in case of NOT IN operator. #71695 (Eduard Karacharov).
- Make the failure log message of the Docker init script clearer. #71734 (Андрей).
- Fix CAST from LowCardinality(Nullable) to Dynamic. Previously it could lead to error Bad cast from type DB::ColumnVector<int> to DB::ColumnNullable. #71742 (Pavel Kruglov).
- Fix exception for toDayOfWeek on WHERE condition with primary key of DateTime64 type. #71849 (Yakov Olkhovskiy).
- Fixed filling of defaults after parsing into sparse columns. #71854 (Anton Popov).
- Fix GROUPING function error when input is ALIAS on distributed table, close #68602. #71855 (Vladimir Cherkasov).
- Fix possible crash when using allow_experimental_join_condition, close #71693. #71857 (Vladimir Cherkasov).
- Fixed SELECT statements with a WITH TIES clause that might not return enough rows. #71886 (wxybear).
- Fix the TOO_LARGE_ARRAY_SIZE exception caused when a column of arrayWithConstant evaluation is mistaken to cross the array size limit. #71894 (Udi).
- clickhouse-benchmark reported wrong metrics for queries taking longer than one second. #71898 (Alexey Milovidov).
- Fix data race between the progress indicator and the progress table in clickhouse-client. This issue is visible when FROM INFILE is used. Intercept keystrokes during INSERT queries to toggle progress table display. #71901 (Julia Kartseva).
- Use auxiliary keepers for cluster autodiscovery. #71911 (Anton Ivashkin).
- Fix rows_processed column in system.s3/azure_queue_log broken in 24.6. Closes #69975. #71946 (Kseniia Sumarokova).
- Fixed a case when the s3/s3Cluster functions could return an incomplete result or throw an exception. It involved using a glob pattern in the S3 URI (like pattern/*) while an empty object exists with the key pattern/ (such objects are automatically created by the S3 Console). Also, the default value of the setting s3_skip_empty_files changed from false to true. #71947 (Nikita Taranov).
- Fix a crash in clickhouse-client syntax highlighting. Closes #71864. #71949 (Nikolay Degterinsky).
- Fix Illegal type error for MergeTree tables with binary monotonic function in ORDER BY when the first argument is constant. Fixes #71941. #71966 (Nikolai Kochetov).
- Allow only SELECT queries in EXPLAIN AST used inside subquery. Other types of queries lead to logical error: 'Bad cast from type DB::ASTCreateQuery to DB::ASTSelectWithUnionQuery' or Inconsistent AST formatting. #71982 (Pavel Kruglov).
- When inserting a record with clickhouse-client, the client reads column descriptions from the server. There was a bug where the descriptions were written in the wrong order; the correct order is [statistics, ttl, settings]. #71991 (Han Fei).
- Fix formatting of MOVE PARTITION ... TO TABLE ... alter commands when format_alter_commands_with_parentheses is enabled. #72080 (János Benjamin Antal).
- Fixes RIGHT / FULL joins in queries with parallel replicas. Now, RIGHT joins can be executed with parallel replicas (right table reading is distributed). FULL joins can't be parallelized among nodes and are executed locally. #71162 (Igor Nikonov).
- Fix the issue where ClickHouse in Docker containers printed "get_mempolicy: Operation not permitted" into stderr due to restricted syscalls. #70900 (filimonov).
- Fix the metadata_version record in ZooKeeper in restarting thread rather than in attach thread. #70297 (Miсhael Stetsyuk).
- This is a fix for "zero-copy" replication, which is unsupported and will be removed entirely. Don't delete a blob when there are nodes using it in ReplicatedMergeTree with zero-copy replication. #71186 (Antonio Andelic).
- This is a fix for "zero-copy" replication, which is unsupported and will be removed entirely. Acquiring zero-copy shared lock before moving a part to zero-copy disk to prevent possible data loss if Keeper is unavailable. #71845 (Aleksei Filatov).
ClickHouse release 24.10, 2024-10-31
Backward Incompatible Change
- Allow to write SETTINGS before FORMAT in a chain of queries with UNION when subqueries are inside parentheses. This closes #39712. Change the behavior when a query has the SETTINGS clause specified twice in a sequence. The closest SETTINGS clause will have a preference for the corresponding subquery. In the previous versions, the outermost SETTINGS clause could take a preference over the inner one. #68614 (Alexey Milovidov).
- Reordering of filter conditions from [PRE]WHERE clause is now allowed by default. It could be disabled by setting allow_reorder_prewhere_conditions to false. #70657 (Nikita Taranov).
- Remove the idxd-config library, which has an incompatible license. This also removes the experimental Intel DeflateQPL codec. #70987 (Alexey Milovidov).
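The two query-level changes above can be sketched as follows. This is an illustration only: the table hits and its columns are hypothetical, and max_threads is used merely as a convenient setting to vary; only allow_reorder_prewhere_conditions and the SETTINGS-before-FORMAT placement come from the entries above.

```sql
-- Hypothetical table, used only for illustration.
CREATE TABLE hits (a UInt32, b UInt32) ENGINE = MergeTree ORDER BY a;

-- Opt out of the new default and keep PREWHERE conditions in the written order.
SELECT count()
FROM hits
PREWHERE a = 1 AND b = 2
SETTINGS allow_reorder_prewhere_conditions = false;

-- SETTINGS written before FORMAT in a UNION chain of parenthesized subqueries.
-- Per the entry above, the closest SETTINGS clause takes preference for each subquery.
(SELECT 1 AS x SETTINGS max_threads = 1)
UNION ALL
(SELECT 2 AS x)
SETTINGS max_threads = 2
FORMAT TSV;
```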
New Feature
- Allow granting access to wildcard prefixes, e.g. GRANT SELECT ON db.table_prefix_* TO user. #65311 (pufit).
- If you press space bar during query runtime, the client will display a real-time table with detailed metrics. You can enable it globally with the new --progress-table option in clickhouse-client; a new --enable-progress-table-toggle is associated with the --progress-table option, and toggles the rendering of the progress table by pressing the control key (Space). #63689 (Maria Khristenko), #70423 (Julia Kartseva).
- Allow to cache read files for object storage table engines and data lakes using hash from ETag + file path as cache key. #70135 (Kseniia Sumarokova).
- Support creating a table with a query: CREATE TABLE ... CLONE AS .... It clones the source table's schema and then attaches all partitions to the newly created table. This feature is only supported with tables of the MergeTree family. Closes #65015. #69091 (tuanpach).
- Add a new system table, system.query_metric_log, which contains history of memory and metric values from table system.events for individual queries, periodically flushed to disk. #66532 (Pablo Marcos).
- A simple SELECT query can be written with implicit SELECT to enable calculator-style expressions, e.g., ch "1 + 2". This is controlled by a new setting, implicit_select. #68502 (Alexey Milovidov).
- Support the --copy mode for clickhouse local as a shortcut for format conversion #68503. #68583 (Denis Hananein).
- Add a builtin HTML page for visualizing merges which is available at the /merges path. #70821 (Alexey Milovidov).
- Add support for arrayUnion function. #68989 (Peter Nguyen).
- Allow parametrised SQL aliases. #50665 (Anton Kozlov).
- A new aggregate function quantileExactWeightedInterpolated, which is an interpolated version based on quantileExactWeighted. Some people may wonder why we need a new quantileExactWeightedInterpolated since we already have quantileExactInterpolatedWeighted. The reason is the new one is more accurate than the old one. This is for Spark compatibility. #69619 (李扬).
- A new function arrayElementOrNull. It returns NULL if the array index is out of range or a Map key is not found. #69646 (李扬).
- Allows users to specify regular expressions through new message_regexp and message_regexp_negative fields in the config.xml file to filter out logging. The logging is applied to the formatted un-colored text for the most intuitive developer experience. #69657 (Peter Nguyen).
- Added RIPEMD160 function, which computes the RIPEMD-160 cryptographic hash of a string. Example: SELECT HEX(RIPEMD160('The quick brown fox jumps over the lazy dog')) returns 37F332F68DB77BD9D7EDD4969571AD671CF9DD3B. #70087 (Dergousov Maxim).
- Support reading Iceberg tables on HDFS. #70268 (flynn).
- Support for CTE in the form of WITH ... INSERT, as previously we only supported INSERT ... WITH .... #70593 (Shichao Jin).
- MongoDB integration: support for all MongoDB types, support for WHERE and ORDER BY statements on MongoDB side, restriction for expressions unsupported by MongoDB. Note that the new integration is disabled by default; to use it, set <use_legacy_mongodb_integration> to false in the server config. #63279 (Kirill Nikiforov).
- A new function getSettingOrDefault, which returns the default value and avoids an exception if a custom setting is not found in the current profile. #69917 (Shankar).
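A few of the features listed above, sketched as queries. The database, table, user, and custom setting names (db, events, events_clone, target, user, custom_undefined_setting) are placeholders chosen for illustration and do not come from the changelog; the statements that reference tables assume those tables already exist.

```sql
-- Wildcard grant: covers every table in db whose name starts with table_prefix_.
GRANT SELECT ON db.table_prefix_* TO user;

-- Clone the schema of db.events and attach all of its partitions
-- to the new table (MergeTree family only, per the entry above).
CREATE TABLE db.events_clone CLONE AS db.events;

-- An out-of-range index yields NULL rather than an error.
SELECT arrayElementOrNull([10, 20, 30], 5);

-- CTE written before INSERT (previously only INSERT ... WITH ... was accepted).
WITH src AS (SELECT number FROM numbers(3))
INSERT INTO db.target SELECT number FROM src;

-- Falls back to the second argument when the custom setting is not defined.
SELECT getSettingOrDefault('custom_undefined_setting', 'fallback');
```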
Experimental feature
- Refreshable materialized views are production ready. #70550 (Michael Kolupaev). Refreshable materialized views are now supported in Replicated databases. #60669 (Michael Kolupaev).
- Parallel replicas are moved from experimental to beta. Reworked settings that control the behavior of parallel replicas algorithms. A quick recap: ClickHouse has four different algorithms for parallel reading involving multiple replicas, which is reflected in the setting parallel_replicas_mode; the default value for it is read_tasks. Additionally, the toggle-switch setting enable_parallel_replicas has been added. #63151 (Alexey Milovidov), (Nikita Mikhaylov).
- Support for the Dynamic type in most functions by executing them on internal types inside Dynamic. #69691 (Pavel Kruglov).
- Allow to read/write the JSON type as a binary string in RowBinary format under settings input_format_binary_read_json_as_string/output_format_binary_write_json_as_string. #70288 (Pavel Kruglov).
- Allow to serialize/deserialize JSON column as single String column in the Native format. For output use setting output_format_native_write_json_as_string. For input, use serialization version 1 before the column data. #70312 (Pavel Kruglov).
- Introduced a special (experimental) mode of a merge selector for MergeTree tables which makes it more aggressive for the partitions that are close to the limit by the number of parts. It is controlled by the merge_selector_use_blurry_base MergeTree-level setting. #70645 (Nikita Mikhaylov).
- Implement generic ser/de between Avro's Union and ClickHouse's Variant types. Resolves #69713. #69712 (Jiří Kozlovský).
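As a minimal sketch of the reworked parallel replicas settings named above, the two settings can be set for a session as shown. Only the setting names and the read_tasks value come from the entry; any cluster configuration that parallel replicas need in practice is not covered here.

```sql
-- Toggle-switch setting added in this release.
SET enable_parallel_replicas = 1;

-- Select the algorithm explicitly; read_tasks is the stated default.
SET parallel_replicas_mode = 'read_tasks';
```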
Performance Improvement
- Refactor IDisk and IObjectStorage for better performance. Tables from plain and plain_rewritable object storages will initialize faster. #68146 (Alexey Milovidov, Julia Kartseva). Do not call the LIST object storage API when determining if a file or directory exists on the plain rewritable disk, as it can be cost-inefficient. #70852 (Julia Kartseva). Reduce the number of object storage HEAD API requests in the plain_rewritable disk. #70915 (Julia Kartseva).
- Added an ability to parse data directly into sparse columns. #69828 (Anton Popov).
- Improved performance of parsing formats with a high number of missing values (e.g. JSONEachRow). #69875 (Anton Popov).
- Supports parallel reading of parquet row groups and prefetching of row groups in single-threaded mode. #69862 (LiuNeng).
- Support minmax index for pointInPolygon. #62085 (JackyWoo).
- Use bloom filters when reading Parquet files. #62966 (Arthur Passos).
- Lock-free parts rename, so that INSERT does not affect SELECT through the parts lock. Under normal circumstances with fsync_part_directory, QPS of SELECT with INSERT in parallel increased 2x; under heavy load the effect is even bigger. Note, this only includes ReplicatedMergeTree for now. #64955 (Azat Khuzhin).
- Respect ttl_only_drop_parts on materialize ttl; only read necessary columns to recalculate TTL and drop parts by replacing them with an empty one. #65488 (Andrey Zvonov).
- Optimized thread creation in the ThreadPool to minimize lock contention. Thread creation is now performed outside of the critical section to avoid delays in job scheduling and thread management under high load conditions. This leads to a much more responsive ClickHouse under heavy concurrent load. #68694 (filimonov).
- Enable reading LowCardinality string columns from ORC. #69481 (李扬).
- Use LowCardinality for ProfileEvents in system logs such as part_log, query_views_log, filesystem_cache_log. #70152 (Alexey Milovidov).
- Improve performance of fromUnixTimestamp/toUnixTimestamp functions. #71042 (kevinyhzou).
- Don't disable nonblocking read from page cache for the entire server when reading from a blocking I/O. This was leading to a poorer performance when a single filesystem (e.g., tmpfs) didn't support the preadv2 syscall while others do. #70299 (Antonio Andelic).
- ALTER TABLE .. REPLACE PARTITION doesn't wait anymore for mutations/merges that happen in other partitions. #59138 (Vasily Nemkov).
- Don't do validation when synchronizing ACL from Keeper, because validation already happens at creation. It shouldn't matter that much, but there are installations with tens of thousands of users or even more, and the unnecessary hash validation can take a long time to finish during server startup (it synchronizes everything from Keeper). #70644 (Raúl Marín).
Improvement
- CREATE TABLE AS will copy PRIMARY KEY, ORDER BY, and similar clauses (of MergeTree tables). #69739 (sakulali).
- Support 64-bit XID in Keeper. It can be enabled with the use_xid_64 configuration value. #69908 (Antonio Andelic).
- Command-line arguments for Bool settings are set to true when no value is provided for the argument (e.g. clickhouse-client --optimize_aggregation_in_order --query "SELECT 1"). #70459 (davidtsuk).
- Added user-level settings min_free_disk_bytes_to_perform_insert and min_free_disk_ratio_to_perform_insert to prevent insertions on disks that are almost full. #69755 (Marco Vilas Boas).
- Embedded documentation for settings will be strictly more detailed and complete than the documentation on the website. This is the first step before making the website documentation always auto-generated from the source code. This has long-standing implications: it will be guaranteed to have every setting; there is no chance of having obsolete default values; we can generate this documentation for each ClickHouse version; the documentation can be displayed by the server itself even without Internet access. Generate the docs on the website from the source code. #70289 (Alexey Milovidov).
- Allow empty needle in the function replace, matching the behavior of PostgreSQL. #69918 (zhanglistar).
- Allow empty needle in functions replaceRegexp*. #70053 (zhanglistar).
- Symbolic links for tables in the data/database_name/ directory are created for the actual paths to the table's data, depending on the storage policy, instead of the store/... directory on the default disk. #61777 (Kirill).
- While parsing an Enum field from JSON, a string containing an integer will be interpreted as the corresponding Enum element. This closes #65119. #66801 (scanhex12).
- Allow TRIM-ing LEADING or TRAILING empty string as a no-op. Closes #67792. #68455 (Peter Nguyen).
- Improve compatibility of cast(timestamp as String) with Spark. #69179 (Wenzheng Liu).
- Always use the new analyzer to calculate constant expressions when enable_analyzer is set to true. Support calculation of executable table function arguments without using SELECT query for constant expressions. #69292 (Dmitry Novik).
- Add a setting enable_secure_identifiers to disallow identifiers with special characters. #69411 (tuanpach).
- Add show_create_query_identifier_quoting_rule to define identifier quoting behavior in the SHOW CREATE TABLE query result. Possible values: user_display (quote when the identifier is a keyword); when_necessary (quote when the identifier is one of {"distinct", "all", "table"} and when it could lead to ambiguity: column names, dictionary attribute names); always (always quote identifiers). #69448 (tuanpach).
- Improve restoring of access entities' dependencies #69563 (Vitaly Baranov).
- If you run clickhouse-client or other CLI application, and it starts up slowly due to an overloaded server, and you start typing your query, such as SELECT, the previous versions will display the remainder of the terminal echo contents before printing the greetings message, such as SELECTClickHouse local version 24.10.1.1. instead of ClickHouse local version 24.10.1.1.. Now it is fixed. This closes #31696. #69856 (Alexey Milovidov).
- Add new column readonly_duration to the system.replicas table. Needed to be able to distinguish actual readonly replicas from sentinel ones in alerts. #69871 (Miсhael Stetsyuk).
- Change the type of the join_output_by_rowlist_perkey_rows_threshold setting to unsigned integer. #69886 (kevinyhzou).
- Enhance OpenTelemetry span logging to include query settings. #70011 (sharathks118).
- Add diagnostic info about higher-order array functions if lambda result type is unexpected. #70093 (ttanay).
- Keeper improvement: less locking during cluster changes. #70275 (Antonio Andelic).
- Add WITH IMPLICIT and FINAL keywords to the SHOW GRANTS command. Fix a minor bug with implicit grants: #70094. #70293 (pufit).
- Respect compatibility for MergeTree settings. The compatibility value is taken from the default profile on server startup, and default MergeTree settings are changed accordingly. Further changes of the compatibility setting do not affect MergeTree settings. #70322 (Nikolai Kochetov).
- Avoid spamming the logs with large HTTP response bodies in case of errors during inter-server communication. #70487 (Vladimir Cherkasov).
- Added a new setting max_parts_to_move to control the maximum number of parts that can be moved at once. #70520 (Vladimir Cherkasov).
- Limit the frequency of certain log messages. #70601 (Alexey Milovidov).
- CHECK TABLE with PART qualifier was incorrectly formatted in the client. #70660 (Alexey Milovidov).
- Support writing the column index and the offset index using parquet native writer. #70669 (LiuNeng).
- Support parsing DateTime64 for microsecond and timezone in joda syntax ("joda" is a popular Java library for date and time, and the "joda syntax" is that library's style). #70737 (kevinyhzou).
- Changed an approach to figure out if a cloud storage supports batch delete or not. #70786 (Vitaly Baranov).
- Support for Parquet page v2 in the native reader. #70807 (Arthur Passos).
- Added a check for whether a table has both storage_policy and disk set. Added a check for whether a new storage policy is compatible with the old one when using the disk setting. #70839 (Kirill).
- Add system.s3_queue_settings and system.azure_queue_settings. #70841 (Kseniia Sumarokova).
- Functions base58Encode and base58Decode now accept arguments of type FixedString. Example: SELECT base58Encode(toFixedString('plaintext', 9));. #70846 (Faizan Patel).
- Add the partition column to every entry type of the part log. Previously, it was set only for some entries. This closes #70819. #70848 (Alexey Milovidov).
- Add MergeStart and MutateStart events into system.part_log which helps with merges analysis and visualization. #70850 (Alexey Milovidov).
- Add a profile event about the number of merged source parts. It allows the monitoring of the fanout of the merge tree in production. #70908 (Alexey Milovidov).
- Background downloads to the filesystem cache were re-enabled. #70929 (Nikita Taranov).
- Add a new merge selector algorithm, named Trivial, for professional usage only. It is worse than the Simple merge selector. #70969 (Alexey Milovidov).
- Support for atomic CREATE OR REPLACE VIEW. #70536 (tuanpach).
- Added strict_once mode to aggregate function windowFunnel to avoid counting one event several times in case it matches multiple conditions, close #21835. #69738 (Vladimir Cherkasov).
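Some of the improvements above, sketched as queries. The inline event rows, the one-hour window, and the use of the default user are illustrative assumptions; the keywords, setting name, and mode name are taken from the entries above.

```sql
-- Trimming an empty string is accepted and leaves the input unchanged.
SELECT trim(LEADING '' FROM 'abc'), trim(TRAILING '' FROM 'abc');

-- New keywords for SHOW GRANTS.
SHOW GRANTS FOR default WITH IMPLICIT FINAL;

-- Always quote identifiers in SHOW CREATE TABLE output.
SET show_create_query_identifier_quoting_rule = 'always';
SHOW CREATE TABLE system.one;

-- strict_once mode: an event that matches several conditions is counted only once.
SELECT windowFunnel(3600, 'strict_once')(ts, event = 'view', event = 'buy') AS level
FROM
(
    SELECT toDateTime('2024-10-01 00:00:00') AS ts, 'view' AS event
    UNION ALL
    SELECT toDateTime('2024-10-01 00:10:00') AS ts, 'buy' AS event
);
```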
Bug Fix (user-visible misbehavior in an official stable release)
- Apply configuration updates in global context object. It fixes issues like #62308. #62944 (Amos Bird).
- Fix ReadSettings not using user-set values, because only defaults were used. #65625 (Kseniia Sumarokova).
- Fix type mismatch issue in sumMapFiltered when using signed arguments. #58408 (Chen768959).
- Fix toHour-like conversion functions' monotonicity when optional time zone argument is passed. #60264 (Amos Bird).
- Relax supportsPrewhere check for Merge tables. This fixes #61064. It was hardened unnecessarily in #60082. #61091 (Amos Bird).
- Fix use_concurrency_control setting handling for proper concurrent_threads_soft_limit_num limit enforcing. This enables concurrency control by default because previously it was broken. #61473 (Sergei Trifonov).
- Fix incorrect JOIN ON section optimization in case of IS NULL check under any other function (like NOT) that may lead to wrong results. Closes #67915. #68049 (Vladimir Cherkasov).
- Prevent ALTER queries that would make the CREATE query of tables invalid. #68574 (János Benjamin Antal).
- Fix inconsistent AST formatting for negate (-) and NOT functions with tuples and arrays. #68600 (Vladimir Cherkasov).
- Fix insertion of incomplete type into Dynamic during deserialization. It could lead to Parameter out of bound errors. #69291 (Pavel Kruglov).
- Zero-copy replication, which is experimental and should not be used in production: fix infinite loop after restore replica in the replicated merge tree with zero copy. #69293 (MikhailBurdukov).
- Restore the default value of processing_threads_num to the number of CPU cores in storage S3Queue. #69384 (Kseniia Sumarokova).
- Bypass try/catch flow when de/serializing nested repeated protobuf to nested columns (fixes #41971). #69556 (Eliot Hautefeuille).
- Fix crash during insertion into FixedString column in PostgreSQL engine. #69584 (Pavel Kruglov).
- Fix crash when executing create view t as (with recursive 42 as ttt select ttt);. #69676 (Han Fei).
- Fixed maxMapState throwing 'Bad get' if value type is DateTime64. #69787 (Michael Kolupaev).
- Fix getSubcolumn with LowCardinality columns by overriding useDefaultImplementationForLowCardinalityColumns to return true. #69831 (Miсhael Stetsyuk).
- Fix permanently blocked distributed sends if a DROP of distributed table failed. #69843 (Azat Khuzhin).
- Fix non-cancellable queries containing WITH FILL with NaN keys. This closes #69261. #69845 (Alexey Milovidov).
- Fix analyzer default with old compatibility value. #69895 (Raúl Marín).
- Don't check dependencies during CREATE OR REPLACE VIEW during DROP of old table. Previously CREATE OR REPLACE query failed when there are dependent tables of the recreated view. #69907 (Pavel Kruglov).
- Something for Decimal. Fixes #69730. #69978 (Arthur Passos).
- Now DEFINER/INVOKER will work with parameterized views. #69984 (pufit).
- Fix parsing for view's definers. #69985 (pufit).
- Fixed a bug when the timezone could change the result of the query with Date or Date32 arguments. #70036 (Yarik Briukhovetskyi).
- Fixes Block structure mismatch for queries with nested views and WHERE condition. Fixes #66209. #70054 (Nikolai Kochetov).
- Avoid reusing columns among different named tuples when evaluating tuple functions. This fixes #70022. #70103 (Amos Bird).
- Fix wrong LOGICAL_ERROR when replacing literals in ranges. #70122 (Pablo Marcos).
- Check for Nullable(Nothing) type during ALTER TABLE MODIFY COLUMN/QUERY to prevent tables with such data type. #70123 (Pavel Kruglov).
- Proper error message for illegal query JOIN ... ON *, close #68650. #70124 (Vladimir Cherkasov).
- Fix wrong result with skipping index. #70127 (Raúl Marín).
- Fix data race in ColumnObject/ColumnTuple decompress method that could lead to heap use after free. #70137 (Pavel Kruglov).
- Fix possible hang in ALTER COLUMN with Dynamic type. #70144 (Pavel Kruglov).
- Now ClickHouse will consider more errors as retriable and will not mark data parts as broken in case of such errors. #70145 (alesapin).
- Use correct max_types parameter during Dynamic type creation for JSON subcolumn. #70147 (Pavel Kruglov).
- Fix the password being displayed in system.query_log for users with bcrypt password authentication method. #70148 (Nikolay Degterinsky).
- Fix event counter for the native interface (InterfaceNativeSendBytes). #70153 (Yakov Olkhovskiy).
- Fix possible crash related to JSON columns. #70172 (Pavel Kruglov).
- Fix multiple issues with arrayMin and arrayMax. #70207 (Raúl Marín).
- Respect setting allow_simdjson in the JSON type parser. #70218 (Pavel Kruglov).
- Fix a null pointer dereference on creating a materialized view with two selects and an INTERSECT, e.g. CREATE MATERIALIZED VIEW v0 AS (SELECT 1) INTERSECT (SELECT 1);. #70264 (Konstantin Bogdanov).
- Don't modify global settings with startup scripts. Previously, changing a setting in a startup script would change it globally. #70310 (Antonio Andelic).
- Fix ALTER of Dynamic type with reducing max_types parameter that could lead to server crash. #70328 (Pavel Kruglov).
- Fix crash when using WITH FILL incorrectly. #70338 (Raúl Marín).
- Fix possible use-after-free in SYSTEM DROP FORMAT SCHEMA CACHE FOR Protobuf. #70358 (Azat Khuzhin).
- Fix crash during GROUP BY JSON sub-object subcolumn. #70374 (Pavel Kruglov).
- Don't prefetch parts for vertical merges if part has no rows. #70452 (Antonio Andelic).
- Fix crash in WHERE with lambda functions. #70464 (Raúl Marín).
- Fix table creation with CREATE ... AS table_function(...) with database Replicated and unavailable table function source on secondary replica. #70511 (Kseniia Sumarokova).
- Ignore all output on async insert with wait_for_async_insert=1. Closes #62644. #70530 (Konstantin Bogdanov).
- Ignore frozen_metadata.txt while traversing shadow directory from system.remote_data_paths. #70590 (Aleksei Filatov).
- Fix creation of stateful window functions on misaligned memory. #70631 (Raúl Marín).
- Fixed rare crashes in SELECT-s and merges after adding a column of Array type with non-empty default expression. #70695 (Anton Popov).
- Insert into table function s3 will respect query settings. #70696 (Vladimir Cherkasov).
- Fix infinite recursion when inferring a protobuf schema when skipping unsupported fields is enabled. #70697 (Raúl Marín).
- Disable enable_named_columns_in_function_tuple by default. #70833 (Raúl Marín).
- Fix S3Queue table engine setting processing_threads_num not being effective in case it was deduced from the number of cpu cores on the server. #70837 (Kseniia Sumarokova).
- Normalize named tuple arguments in aggregation states. This fixes #69732. #70853 (Amos Bird).
- Fix a logical error due to negative zeros in the two-level hash table. This closes #70973. #70979 (Alexey Milovidov).
- Fix limit by, limit with ties for distributed and parallel replicas. #70880 (Nikita Taranov).
ClickHouse release 24.9, 2024-09-26
Backward Incompatible Change
- Expressions like a[b].c are supported for named tuples, as well as named subscripts from arbitrary expressions, e.g., expr().name. This is useful for processing JSON. This closes #54965. In previous versions, an expression of form expr().name was parsed as tupleElement(expr(), name), and the query analyzer was searching for a column name rather than for the corresponding tuple element; while in the new version, it is changed to tupleElement(expr(), 'name'). In most cases, the previous version was not working, but it is possible to imagine a very unusual scenario when this change could lead to incompatibility: if you stored names of tuple elements in a column or an alias, that was named differently than the tuple element's name: SELECT 'b' AS a, CAST([tuple(123)] AS 'Array(Tuple(b UInt8))') AS t, t[1].a. It is very unlikely that you used such queries, but we still have to mark this change as potentially backward incompatible. #68435 (Alexey Milovidov).
- When the setting print_pretty_type_names is enabled, it will print Tuple data type in a pretty form in SHOW CREATE TABLE statements, formatQuery function, and in the interactive mode in clickhouse-client and clickhouse-local. In previous versions, this setting was only applied to DESCRIBE queries and toTypeName. This closes #65753. #68492 (Alexey Milovidov).
- Do not allow explicitly specifying UUID when creating a table in Replicated databases. Also, do not allow explicitly specifying Keeper path and replica name for *MergeTree tables in Replicated databases. It introduces a new setting database_replicated_allow_explicit_uuid and changes the type of database_replicated_allow_replicated_engine_arguments from Bool to UInt64. #66104 (Alexander Tokmakov).
New Feature
- Allow a user to have multiple authentication methods instead of only one. Allow authentication methods to be reset to most recently added method. If you want to run some instances on 24.8 and others on 24.9 for some time, it's better to set max_authentication_methods_per_user = 1 for that period to avoid potential incompatibilities. #65277 (Arthur Passos).
- Add support for
ATTACH PARTITION ALL FROM. #61987 (Kirill Nikiforov).
- Add the
input_format_json_empty_as_defaultsetting which, when enabled, treats empty fields in JSON inputs as default values. Closes #59339. #66782 (Alexis Arnaud).
- Added functions
overlayand
overlayUTF8which replace parts of a string by another string. Example:
SELECT overlay('Hello New York', 'Jersey', 11)returns
Hello New Jersey. #66933 (李扬).
- Add support for lightweight deletes in a partition:
DELETE FROM [db.]table [ON CLUSTER cluster] [IN PARTITION partition_expr] WHERE expr; #67805 (sunny).
- Implemented comparison for
Interval data type values of different domains (such as seconds and minutes), so they are now converted to the least supertype. #68057 (Yarik Briukhovetskyi).
- Add
create_if_not_existssetting to default to
IF NOT EXISTSbehavior during CREATE statements. #68164 (Peter Nguyen).
- Makes it possible to read
Icebergtables in Azure and locally. #68210 (Daniil Ivanik).
- Query cache entries can now be dropped by tag. For example, the query cache entry created by
SELECT 1 SETTINGS use_query_cache = true, query_cache_tag = 'abc'can now be dropped by
SYSTEM DROP QUERY CACHE TAG 'abc'. #68477 (Michał Tabaszewski).
- Add storage encryption for named collections. #68615 (Pablo Marcos).
- Add virtual column
_headersfor the
URLtable engine. Closes #65026. #68867 (flynn).
- Add
system.projectionstable to track available projections. #68901 (Jordi Villar).
- Add a new function
arrayZipUnaligned for Spark compatibility (it is named
arrays_zip in Spark), which accepts arrays of unequal length and is based on the original
arrayZip. #69030 (李扬).
- Added
cp/
mv commands for the Keeper client command-line application, which atomically copy/move a node. #69034 (Mikhail Artemenko).
- Adds argument
scale(default:
true) to function
arrayAUC, which allows skipping the normalization step (issue #69609). #69717 (gabrielmcg44).
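The `overlay` and `arrayZipUnaligned` entries above can be modelled in a few lines of Python. This is a sketch of the described semantics, not ClickHouse code. It assumes that, when no length is given, `overlay` replaces as many characters as the replacement contains, and that `arrayZipUnaligned` pads shorter arrays with NULL (`None` here); both points are assumptions that go beyond what the entries state.

```python
from itertools import zip_longest


def overlay(s, replacement, offset, length=None):
    # `offset` is 1-based, as in the changelog example.
    # Assumption: with no explicit length, len(replacement) characters
    # of the input are replaced.
    if length is None:
        length = len(replacement)
    start = offset - 1
    return s[:start] + replacement + s[start + length:]


def array_zip_unaligned(*arrays):
    # Assumption: shorter arrays are padded with NULL (None in this model).
    return list(zip_longest(*arrays, fillvalue=None))


print(overlay("Hello New York", "Jersey", 11))  # Hello New Jersey
print(array_zip_unaligned([1, 2, 3], ["a"]))
```

The first call reproduces the example from the entry; the second shows how arrays of different lengths could be zipped without raising an error.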
Experimental feature
- Adds a setting
input_format_try_infer_variantswhich allows
Varianttype to be inferred during schema inference for text formats when there is more than one possible type for column/array elements. #63798 (Shaun Struwig).
- Add aggregate functions
distinctDynamicTypes/
distinctJSONPaths/
distinctJSONPathsAndTypesfor better introspection of JSON column type content. #68463 (Kruglov Pavel).
- New algorithm to determine the unit of marks distribution between parallel replicas by a consistent hash. Different numbers of marks are chosen for different read patterns to improve performance. #68424 (Nikita Taranov).
- Previously, the algorithmic complexity of the part deduplication logic in parallel replica announcement handling was O(n^2), which could take noticeable time for tables with many parts (or partitions). This change makes the complexity O(n*log(n)). #69596 (Alexander Gololobov).
- Refreshable materialized view improvements: append mode (
... REFRESH EVERY 1 MINUTE APPEND ...) to add rows to existing table instead of overwriting the whole table, retries (disabled by default, configured in SETTINGS section of the query),
SYSTEM WAIT VIEW <name>query that waits for the currently running refresh, some fixes. #58934 (Michael Kolupaev).
- Added
min_maxas a new type of (experimental) statistics. It supports estimating range predicates over numeric columns, e.g.
x < 100. #67013 (JackyWoo).
- Improve castOrDefault from Variant/Dynamic columns so it works when inner types are not convertible at all. #67150 (Kruglov Pavel).
- Replication of subset of columns is now available through MaterializedPostgreSQL. Closes #33748. #69092 (Kruglov Kirill).
Performance Improvement
- Implemented reading of required files only for Hive partitioning. #68963 (Yarik Briukhovetskyi).
- Improve the JOIN performance by rearranging the right table by keys while the table keys are dense in the LEFT or INNER hash joins. #60341 (kevinyhzou).
- Improve ALL JOIN performance by appending the list of rows lazily. #63677 (kevinyhzou).
- Load filesystem cache metadata asynchronously during the boot process, in order to make restarts faster (controlled by setting
load_metadata_asynchronously). #65736 (Daniel Pozo Escalona).
- Functions
arrayand
mapwere optimized to process certain common cases much faster. #67707 (李扬).
- Trivial optimization of ORC string reading, especially when a column contains no NULLs. #67794 (李扬).
- Improved overall performance of merges by reducing the overhead of scheduling steps of merges. #68016 (Anton Popov).
- Speed up requests to S3 when a profile is not set, credentials are not set, and IMDS is not available (for example, when you are querying a public bucket on a machine outside of a cloud). This closes #52771. #68082 (Alexey Milovidov).
- Devirtualize format reader in
RowInputFormatWithNamesAndTypesfor some performance improvement. #68437 (李扬).
- Add the parallel merge for
uniqaggregate function when aggregating with a group by key to maximize the CPU utilization. #68441 (Jiebin Sun).
- Add the setting
output_format_orc_dictionary_key_size_threshold to allow the user to enable dictionary encoding for string columns in the
ORC output format. It helps reduce the output
ORC file size and improves reading performance significantly. #68591 (李扬).
- Introduce a new Keeper request, RemoveRecursive, which removes a node with all its subtree. #69332 (Mikhail Artemenko).
- Speed up insertion into a table with a vector similarity index by adding data to the vector index in parallel. #69493 (flynn).
- Reduce memory usage of inserts to JSON by using an adaptive write buffer size. Many files created by a JSON column in a wide part contain a small amount of data, and it doesn't make sense to allocate a 1MB buffer for them. #69272 (Kruglov Pavel).
- Avoid returning a thread to the concurrent hash join thread pool, to avoid queries excessively spawning threads. #69406 (Duc Canh Le).
Improvement
- CREATE TABLE AS now copies PRIMARY KEY, ORDER BY, and similar clauses. Currently, this is supported only for the MergeTree family of table engines. #69076 (sakulali).
- Hardened parts of the codebase related to parsing of small entities. The following (minor) bugs were found and fixed: - if a
DeltaLake table is partitioned by Bool, the partition value is always interpreted as false; -
ExternalDistributed table was using only a single shard in the provided addresses; - the value of
max_threads setting and similar were printed as
'auto(N)'instead of
auto(N). #52503 (Alexey Milovidov).
- Use cgroup-specific metrics for CPU usage accounting instead of system-wide metrics. #62003 (Nikita Taranov).
- IO scheduling for remote S3 disks is now done on the level of HTTP socket streams (instead of the whole S3 requests) to resolve
bandwidth_limitthrottling issues. #65182 (Sergei Trifonov).
- Functions
upperUTF8and
lowerUTF8were previously only able to uppercase / lowercase Cyrillic characters. This limitation is now removed and characters in arbitrary languages are uppercased/lowercased. Example:
SELECT upperUTF8('Süden')now returns
SÜDEN. #65761 (李扬).
- When a lightweight delete happens on a table with projection(s), users could previously choose either to throw an exception (the default) or to drop the projection. Now there is a third option: perform the lightweight delete and then rebuild the projection(s). #66169 (jsc0218).
- Two options (
dns_allow_resolve_names_to_ipv4and
dns_allow_resolve_names_to_ipv6) have been added to allow blocking connections by IP family. #66895 (MikhailBurdukov).
- Make ignoring Ctrl-Z configurable (ignore_shell_suspend) in clickhouse-client. #67134 (Azat Khuzhin).
- Improve UTF-8 validation in JSON output formats. Ensures that valid JSON is generated in the case of certain byte sequences in the result data. #67938 (mwoenker).
- Added profile events for merges and mutations for better introspection. #68015 (Anton Popov).
- ODBC: get http_max_tries from the server configuration. #68128 (Rodolphe Dugé de Bernonville).
- Add wildcard support for user identification in X.509 SubjectAltName extension. #68236 (Marco Vilas Boas).
- Improve schema inference of date times. Now
DateTime64 is used only when the date time has a fractional part; otherwise regular DateTime is used. Inference of Date/DateTime is stricter now, especially when
date_time_input_format='best_effort'to avoid inferring date times from strings in corner cases. #68382 (Kruglov Pavel).
- Delete the old named collections code from dictionaries and replace it with the new implementation, which allows using DDL-created named collections in dictionaries. Closes #60936, closes #36890. #68412 (Kseniia Sumarokova).
- Use HTTP/1.1 instead of HTTP/1.0 (set by default) for external HTTP authenticators. #68456 (Aleksei Filatov).
- Added a new set of metrics for thread pool introspection, providing deeper insights into thread pool performance and behavior. #68674 (filimonov).
- Support query parameters in async inserts with format
Values. #68741 (Anton Popov).
- Support
Date32on
dateTruncand
toStartOfInterval. #68874 (LiuNeng).
- Add
plan_step_nameand
plan_step_descriptioncolumns to
system.processors_profile_log. #68954 (Alexander Gololobov).
- Support for the Spanish language in the embedded dictionaries. #69035 (Vasily Okunev).
- Add CPU arch to the short fault information message. #69037 (Konstantin Bogdanov).
- Queries will fail faster if a new Keeper connection cannot be established during retries. #69148 (Raúl Marín).
- Update Database Factory so it would be possible for user defined database engines to have arguments, settings and table overrides (similar to StorageFactory). #69201 (NikBarykin).
- Restore mode that replaces all external table engines and functions to the
Nullengine (
restore_replace_external_engines_to_null,
restore_replace_external_table_functions_to_nullsettings) was failing if table had SETTINGS. Now it removes settings from table definition in this case and allows to restore such tables. #69253 (Ilya Yatsishin).
- CLICKHOUSE_PASSWORD is correctly escaped for XML in clickhouse image's entrypoint. #69301 (aohoyd).
- Allow empty arguments for
arrayZip/
arrayZipUnaligned, as concat did in https://github.com/ClickHouse/ClickHouse/pull/65887. It is for spark compatibility in Gluten CH Backend. #69576 (李扬).
- Support more advanced SSL options for Keeper's internal communication (e.g. private keys with passphrase). #69582 (Antonio Andelic).
- Index analysis can take noticeable time for big tables with many parts or partitions. This change should enable killing a heavy query at that stage. #69606 (Alexander Gololobov).
- Masking sensitive info in
gcstable function. #69611 (Vitaly Baranov).
- Rebuild projection for merges that reduce number of rows. #62364 (cangyin).
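The date-time schema inference entry in the list above (#68382) can be sketched as follows. This is a hypothetical Python model, not ClickHouse's inference code: the function name `infer_datetime_type` and the single `YYYY-MM-DD hh:mm:ss[.fraction]` pattern are assumptions made for illustration, and the real inference handles more formats and settings.

```python
import re

# Hypothetical pattern: 'YYYY-MM-DD hh:mm:ss' with an optional fractional part.
_DATETIME = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(\.\d+)?$")


def infer_datetime_type(value):
    match = _DATETIME.match(value)
    if match is None:
        # Strict inference: anything that is not clearly a date time stays a String.
        return "String"
    # DateTime64 is chosen only when a fractional part is present.
    return "DateTime64" if match.group(1) else "DateTime"


print(infer_datetime_type("2024-09-26 10:00:00"))
print(infer_datetime_type("2024-09-26 10:00:00.123"))
print(infer_datetime_type("26th of September"))
```

The point of the sketch is the decision rule: the fractional part alone decides between `DateTime` and `DateTime64`.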
Bug Fix (user-visible misbehavior in an official stable release)
- Fix attaching table when pg dbname contains "-" in the experimental and unsupported MaterializedPostgreSQL engine. #62730 (takakawa).
- Fixed error on generated columns in the experimental and totally unsupported MaterializedPostgreSQL engine when adnum ordering is broken #63161. Fixed error on id column with nextval expression as default in the experimental and totally unsupported MaterializedPostgreSQL when there are generated columns in table. Fixed error on dropping publication with symbols except [a-z1-9-]. #67664 (Kruglov Kirill).
- Storage Join to support Nullable columns in the left table, close #61247. #66926 (vdimir).
- Fix incorrect query results with parallel replicas (and with distributed queries as well) when the
IN operator contains a conversion to Decimal(). The bug was introduced with the new analyzer. #67234 (Igor Nikonov).
- Fix the problem that alter modify order by causes inconsistent metadata. #67436 (iceFireser).
- Fix the upper bound of the function
fromModifiedJulianDay. It was supposed to be
9999-12-31but was mistakenly set to
9999-01-01. #67583 (PHO).
- Fix when the index is not at the beginning of the tuple during
INquery. #67626 (Yarik Briukhovetskyi).
- Fix expiration in
RoleCache. #67748 (Vitaly Baranov).
- Fix window view missing blocks due to slow flush to view. #67983 (Raúl Marín).
- Fix MSan issue caused by incorrect date format. #68105 (JackyWoo).
- Fixed crash in Parquet filtering when data types in the file substantially differ from requested types (e.g.
... FROM file('a.parquet', Parquet, 'x String'), but the file has
x Int64). Without this fix, use
input_format_parquet_filter_push_down = 0as a workaround. #68131 (Michael Kolupaev).
- Fix crash in
lag/
leadwhich is introduced in #67091. #68262 (lgbo).
- Try to fix a PostgreSQL crash when a query is cancelled. #68288 (Kseniia Sumarokova).
- After https://github.com/ClickHouse/ClickHouse/pull/61984
schema_inference_make_columns_nullable=0still can make columns
Nullablein Parquet/Arrow formats. The change was backward incompatible and users noticed the changes in the behaviour. This PR makes
schema_inference_make_columns_nullable=0to work as before (no Nullable columns will be inferred) and introduces new value
autofor this setting that will make columns
Nullableonly if data has information about nullability. #68298 (Kruglov Pavel).
- Fixes #50868. Small DateTime64 constant values returned by a nested subquery inside a distributed query were wrongly transformed to Nulls, thus causing errors and possible incorrect query results. #68323 (Shankar).
- Fix missing sync replica mode in query
SYSTEM SYNC REPLICA. #68326 (Duc Canh Le).
- Fix bug in key condition. #68354 (Han Fei).
- Fix crash on drop or rename a role that is used in LDAP external user directory. #68355 (Andrey Zvonov).
- Fix Progress column value of system.view_refreshes greater than 1 #68377. #68378 (megao).
- Process regexp flags correctly. #68389 (Han Fei).
- PostgreSQL-style cast operator (
::) works correctly even for SQL-style hex and binary string literals (e.g.,
SELECT x'414243'::String). This closes #68324. #68482 (Alexey Milovidov).
- Minor patch for https://github.com/ClickHouse/ClickHouse/pull/68131. #68494 (Chang chen).
- Fix #68239 SAMPLE n where n is an integer. #68499 (Denis Hananein).
- Fix a bug in mann-whitney-utest when the sizes of the two distributions are not equal. #68556 (Han Fei).
- Fix a failure to start replication of ReplicatedMergeTree after an unexpected restart, caused by abnormal handling of a covered-by-broken part. #68584 (baolin).
- Fix
LOGICAL_ERRORs when functions
sipHash64Keyed,
sipHash128Keyed, or
sipHash128ReferenceKeyedare applied to empty arrays or tuples. #68630 (Robert Schulze).
- Fix the full-text index, which could filter out wrong columns when indexing multiple columns because it didn't reset row_id between different columns. The reproduction procedure is in tests/queries/0_stateless/03228_full_text_with_multi_col.sql. #68644 (siyuan).
- Fix invalid character '\t' and '\n' in replica_name when creating a Replicated table, which causes incorrect parsing of 'source replica' in LogEntry. Mentioned in issue #68640. #68645 (Zhigao Hong).
- Added back virtual columns
_tableand
_databaseto distributed tables. They were available until version 24.3. #68672 (Anton Popov).
- Fix possible error
Size of permutation (0) is less than required (...)during Variant column permutation. #68681 (Kruglov Pavel).
- Fix possible error
DB::Exception: Block structure mismatch in joined block stream: different columns:with new JSON column. #68686 (Kruglov Pavel).
- Fix issue with materialized constant keys when hashing maps with arrays as keys in functions
sipHash(64/128)Keyed. #68731 (Salvatore Mesoraca).
- Make
ColumnsDescription::toStringformat each column using the same
IAST::FormatState object. This results in uniform columns metadata being written to disk and ZooKeeper. #68733 (Miсhael Stetsyuk).
- Fix merging of aggregated data for grouping sets. #68744 (Nikolai Kochetov).
- Fix logical error, when we create a replicated merge tree, alter a column and then execute modify statistics. #68820 (Han Fei).
- Fix resolving dynamic subcolumns from subqueries in analyzer. #68824 (Kruglov Pavel).
- Fix complex types metadata parsing in DeltaLake. Closes #68739. #68836 (Kseniia Sumarokova).
- Fixed asynchronous inserts in case when metadata of table is changed (by
ALTER ADD/MODIFY COLUMNqueries) after insert but before flush to the table. #68837 (Anton Popov).
- Fix unexpected exception when passing empty tuple in array. This fixes #68618. #68848 (Amos Bird).
- Fix parsing pure metadata mutations commands. #68935 (János Benjamin Antal).
- Fix possible wrong result during anyHeavy state merge. #68950 (Raúl Marín).
- Fixed writing to Materialized Views with enabled setting
optimize_functions_to_subcolumns. #68951 (Anton Popov).
- Don't use the serializations cache in const Dynamic column methods. It could lead to use of an uninitialized value or even a race condition during aggregations. #68953 (Kruglov Pavel).
- Fix parsing error when null should be inserted as default in some cases during JSON type parsing. #68955 (Kruglov Pavel).
- Fix
Content-Encodingnot sent in some compressed responses. #64802. #68975 (Konstantin Bogdanov).
- There were cases when a path was concatenated incorrectly and had the
// part in it. This is now solved by using path normalization. #69066 (Yarik Briukhovetskyi).
- Fix logical error when we have empty async insert. #69080 (Han Fei).
- Fixed data race of progress indication in clickhouse-client during query canceling. #69081 (Sergei Trifonov).
- Fix a bug that the vector similarity index (currently experimental) was not utilized when used with cosine distance as distance function. #69090 (flynn).
- This change addresses an issue where attempting to create a Replicated database again after a server failure during the initial creation process could result in error. #69102 (Miсhael Stetsyuk).
- Don't infer Bool type from String in CSV when
input_format_csv_try_infer_numbers_from_strings = 1because we don't allow reading bool values from strings. #69109 (Kruglov Pavel).
- Fix explain ast insert queries parsing errors on client when
--multiqueryis enabled. #69123 (wxybear).
- UNION clause in subqueries wasn't handled correctly in queries with parallel replicas and led to LOGICAL_ERROR
Duplicate announcement received for replica. #69146 (Igor Nikonov).
- Fix propagating the structure argument in s3Cluster. Previously the
DEFAULTexpression of the column could be lost when sending the query to the replicas in s3Cluster. #69147 (Kruglov Pavel).
- Respect format settings in Values format during conversion from expression to the destination type. #69149 (Kruglov Pavel).
- Fix
clickhouse-client --queries-file for readonly users (previously it failed with
Cannot modify 'log_comment' setting in readonly mode). #69175 (Azat Khuzhin).
- Fix data race in clickhouse-client when it's piped to a process that terminated early. #69186 (vdimir).
- Fix incorrect results of uniq and GROUP BY for JSON/Dynamic types. #69203 (Kruglov Pavel).
- Fix the INFILE format detection for asynchronous inserts. If the format is not explicitly defined in the FORMAT clause, it can be detected from the INFILE file extension. #69237 (Julia Kartseva).
- After this issue there are quite a few table replicas in production such that their
metadata_versionnode value is both equal to
0and is different from the respective table's
metadatanode version. This leads to
alterqueries failing on such replicas. #69274 (Miсhael Stetsyuk).
- Mark Dynamic type as not safe primary key type to avoid issues with Fields. #69311 (Kruglov Pavel).
- Improve restoring of access entities' dependencies. #69346 (Vitaly Baranov).
- Fix undefined behavior when all connection attempts fail getting a connection for insertions. #69390 (Pablo Marcos).
- Close #69135. Reusing joined data for a
cross join cannot happen in ClickHouse at present, but it is still better to keep
have_compressed in
reuseJoinedData. #69404 (lgbo).
- Make
materialize()function return full column when parameter is a sparse column. #69429 (Alexander Gololobov).
- Fixed a
LOGICAL_ERRORwith function
sqidDecode(#69450). #69451 (Robert Schulze).
- Quick fix for s3queue problem on 24.6 or create query with database replicated. #69454 (Kseniia Sumarokova).
- Fixed case when memory consumption was too high because of the squashing in
INSERT INTO ... SELECTor
CREATE TABLE AS SELECTqueries. #69469 (Yarik Briukhovetskyi).
- Statements
SHOW COLUMNSand
SHOW INDEXnow work properly if the table has dots in its name. #69514 (Salvatore Mesoraca).
- Usage of the query cache for queries with an overflow mode != 'throw' is now disallowed. This prevents situations where potentially truncated and incorrect query results could be stored in the query cache. (issue #67476). #69549 (Robert Schulze).
- Keep original order of conditions during move to prewhere. Previously the order could change and it could lead to failing queries when the order is important. #69560 (Kruglov Pavel).
- Fix Keeper multi-request preprocessing after ZNOAUTH error. #69627 (Antonio Andelic).
- Fix METADATA_MISMATCH that might have happened due to TTL with a WHERE clause in DatabaseReplicated when creating a new replica. #69736 (Nikolay Degterinsky).
- Fix
StorageS3(Azure)Queuesettings
tracked_file_ttl_sec. We wrote it to keeper with key
tracked_file_ttl_sec, but read as
tracked_files_ttl_sec, which was a typo. #69742 (Kseniia Sumarokova).
- Use tryConvertFieldToType in getHyperrectangleForRowGroup. #69745 (Miсhael Stetsyuk).
- Revert "Fix prewhere without columns and without adaptive index granularity (almost w/o anything)". Due to the reverted changes some errors might happen when reading data parts produced by old CH releases (presumably 2021 or older). #68897 (Alexander Gololobov).
ClickHouse release 24.8 LTS, 2024-08-20
Backward Incompatible Change
- clickhouse-client and
clickhouse-local now default to multi-query mode (instead of single-query mode). As an example,
clickhouse-client -q "SELECT 1; SELECT 2"now works, whereas users previously had to add
--multiquery(or
-n). The
--multiquery/-nswitch became obsolete. INSERT queries in multi-query statements are treated specially based on their FORMAT clause: If the FORMAT is
VALUES(the most common case), the end of the INSERT statement is represented by a trailing semicolon
;at the end of the query. For all other FORMATs (e.g.
CSVor
JSONEachRow), the end of the INSERT statement is represented by two newlines
\n\nat the end of the query. #63898 (FFish).
- In previous versions, it was possible to use an alternative syntax for
LowCardinalitydata types by appending
WithDictionaryto the name of the data type. It was an initial working implementation, and it was never documented or exposed to the public. Now, it is deprecated. If you have used this syntax, you have to ALTER your tables and rename the data types to
LowCardinality. #66842 (Alexey Milovidov).
- Fix logical errors with storage
Bufferused with distributed destination table. It's a backward incompatible change: queries using
Bufferwith a distributed destination table may stop working if the table appears more than once in the query (e.g., in a self-join). #67015 (vdimir).
- In previous versions, calling functions for random distributions based on the Gamma function (such as Chi-Squared, Student, Fisher) with negative arguments close to zero led to a long computation or an infinite loop. In the new version, calling these functions with zero or negative arguments will produce an exception. This closes #67297. #67326 (Alexey Milovidov).
- The system table
text_logis enabled by default. This is fully compatible with previous versions, but you may notice subtly increased disk usage on the local disk (this system table takes a tiny amount of disk space). #67428 (Alexey Milovidov).
- In previous versions,
arrayWithConstant could be slow if asked to generate very large arrays. In the new version, it is limited to 1 GB per array. This closes #32754. #67741 (Alexey Milovidov).
- Fix REPLACE modifier formatting (forbid omitting brackets). #67774 (Azat Khuzhin).
- Backported in #68349: Reimplement
Dynamictype. Now when the limit of dynamic data types is reached new types are not cast to String but stored in a special data structure in binary format with binary encoded data type. Now any type ever inserted into
Dynamiccolumn can be read from it as subcolumn. #68132 (Kruglov Pavel).
New Feature
- Added a new
MergeTreesetting
deduplicate_merge_projection_modeto control the projections during merges (for specific engines) and
OPTIMIZE DEDUPLICATEquery. Supported options:
throw(throw an exception in case the projection is not fully supported for *MergeTree engine),
drop(remove projection during merge if it can't be merged itself consistently) and
rebuild(rebuild projection from scratch, which is a heavy operation). #66672 (jsc0218).
- Add
_etagvirtual column for S3 table engine. Fixes #65312. #65386 (skyoct).
- Added a tagging (namespace) mechanism for the query cache. The same queries with different tags are considered different by the query cache. Example:
SELECT 1 SETTINGS use_query_cache = 1, query_cache_tag = 'abc'and
SELECT 1 SETTINGS use_query_cache = 1, query_cache_tag = 'def'now create different query cache entries. #68235 (sakulali).
- Support more variants of JOIN strictness (
LEFT/RIGHT SEMI/ANTI/ANY JOIN) with inequality conditions which involve columns from both left and right table. e.g.
t1.y < t2.y(see the setting
allow_experimental_join_condition). #64281 (lgbo).
- Interpret Hive-style partitioning for different engines (
File,
URL,
S3,
AzureBlobStorage,
HDFS). Hive-style partitioning organizes data into partitioned sub-directories, making it efficient to query and manage large datasets. Currently, it only creates virtual columns with the appropriate name and data. The follow-up PR will introduce the appropriate data filtering (performance speedup). #65997 (Yarik Briukhovetskyi).
- Add function
printffor Spark compatibility (but you can use the existing
formatfunction). #66257 (李扬).
- Add options
restore_replace_external_engines_to_nulland
restore_replace_external_table_functions_to_null to replace external engines and table functions with the
Null engine, which can be useful for testing. It should work for RESTORE and explicit table creation. #66536 (Ilya Yatsishin).
- Added support for reading
MULTILINESTRINGgeometry in
WKTformat using function
readWKTLineString. #67647 (Jacob Reckhard).
- Add a new table function
fuzzQuery. This function allows the modification of a given query string with random variations. Example:
SELECT query FROM fuzzQuery('SELECT 1') LIMIT 5;. #67655 (pufit).
- Add a query
ALTER TABLE ... DROP DETACHED PARTITION ALLto drop all detached partitions. #67885 (Duc Canh Le).
- Add the
rows_before_aggregation_at_leaststatistic to the query response when a new setting,
rows_before_aggregationis enabled. This statistic represents the number of rows read before aggregation. In the context of a distributed query, when using the
group byor
maxaggregation function without a
limit,
rows_before_aggregation_at_leastcan reflect the number of rows hit by the query. #66084 (morning-color).
- Support
OPTIMIZEquery on
Jointables to reduce their memory footprint. #67883 (Duc Canh Le).
- Allow running a query instantly in play if you add
&run=1 to the URL. #66457 (Aleksandr Musorin).
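The Hive-style partitioning entry in the list above says that virtual columns are created from the partitioned sub-directories. A minimal Python sketch of that idea follows. The function name `hive_partition_columns` and the sample path are hypothetical, values are kept as plain strings, and nothing here reflects how ClickHouse actually parses paths or types the columns.

```python
def hive_partition_columns(path):
    """Extract key=value directory segments of a path as column name/value pairs."""
    columns = {}
    # The last segment is the file name, so only directory segments are inspected.
    for segment in path.split("/")[:-1]:
        key, separator, value = segment.partition("=")
        if separator and key:
            columns[key] = value
    return columns


print(hive_partition_columns("data/date=2024-08-20/country=NL/part-0.parquet"))
```

For the sample path, the model yields one column per `key=value` directory (`date` and `country`), while the plain `data` directory and the file name contribute nothing.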
Experimental Feature
- Implement a new
JSONdata type. #66444 (Kruglov Pavel).
- Add the new
TimeSeriestable engine. #64183 (Vitaly Baranov).
- Add new experimental
Kafkastorage engine to store offsets in Keeper instead of relying on committing them to Kafka. It makes the commit to ClickHouse tables atomic with regard to consumption from the queue. #57625 (János Benjamin Antal).
- Use adaptive read task size calculation method (adaptive meaning it depends on read column sizes) for parallel replicas. #60377 (Nikita Taranov).
- Added statistics type
count_min(count-min sketches) which provide selectivity estimations for equality predicates like
col = 'val'. Supported data types are string, date, datetime and numeric types. #65521 (JackyWoo).
Performance Improvement
- Setting
optimize_functions_to_subcolumnsis enabled by default. #68053 (Anton Popov).
- Store the
plain_rewritabledisk directory metadata in
__metalayout, separately from the merge tree data in the object storage. Move the
plain_rewritabledisk to a flat directory structure. #65751 (Julia Kartseva).
- Improve columns squashing (an operation happening in INSERT queries) for
String/
Array/
Map/
Variant/
Dynamictypes by reserving required memory in advance for all subcolumns. #67043 (Kruglov Pavel).
- Speed up
SYSTEM FLUSH LOGSand flush logs on shutdown. #67472 (Sema Checherinda).
- Improved overall performance of merges by reducing the overhead of the scheduling steps of merges. #68016 (Anton Popov).
- Speed up tables removal for
DROP DATABASEquery, increased the default value for
database_catalog_drop_table_concurrencyto 16. #67228 (Nikita Mikhaylov).
- Avoid allocating too much capacity for array column while writing ORC. Performance speeds up 15% for an Array column. #67879 (李扬).
- Speed up mutations for non-replicated MergeTree significantly #66911 #66909 (Alexey Milovidov).
Improvement
- Setting
allow_experimental_analyzeris renamed to
enable_analyzer. The old name is preserved in a form of an alias. This signifies that Analyzer is no longer in beta and is fully promoted to production. #66438 (Nikita Mikhaylov).
- Improve schema inference of date times. Now DateTime64 is used only when the date time has a fractional part; otherwise regular DateTime is used. Inference of Date/DateTime is stricter now, especially when
date_time_input_format='best_effort'to avoid inferring date times from strings in corner cases. #68382 (Kruglov Pavel).
- ClickHouse server now supports new setting
max_keep_alive_requests. For keep-alive HTTP connections to the server it works in tandem with
keep_alive_timeout: if the idle timeout has not expired but more than
max_keep_alive_requests requests have already been made through the given connection, it will be closed by the server. #61793 (Nikita Taranov).
- Various improvements in the advanced dashboard. This closes #67697. This closes #63407. This closes #51129. This closes #61204. #67701 (Alexey Milovidov).
- Do not require a grant for REMOTE when creating a Distributed table: a grant for the Distributed engine is enough. #65419 (jsc0218).
- Do not pass logs for keeper explicitly in the Docker image to allow overriding. #65564 (Azat Khuzhin).
- Introduced
use_same_password_for_base_backupsettings for
BACKUPand
RESTOREqueries, allowing to create and restore incremental backups to/from password protected archives. #66214 (Samuele).
- Ignore
async_load_databasesfor
ATTACHquery (previously it was possible for ATTACH to return before the tables had been attached). #66240 (Azat Khuzhin).
- Added logs and metrics for rejected connections (where there are not enough resources). #66410 (Alexander Tokmakov).
- Support proper
UUIDtype for MongoDB engine. #66671 (Azat Khuzhin).
- Add replication lag and recovery time metrics. #66703 (Miсhael Stetsyuk).
- Add
DiskS3NoSuchKeyErrorsmetric. #66704 (Miсhael Stetsyuk).
- Ensure the
COMMENTclause works for all table engines. #66832 (Joe Lynch).
- Function
mapFromArraysnow accepts
Map(K, V)as first argument, for example:
SELECT mapFromArrays(map('a', 4, 'b', 4), ['aa', 'bb'])now works and returns
{('a',4):'aa',('b',4):'bb'}. Also, if the 1st argument is an Array, it can now also be of type
Array(Nullable(T))or
Array(LowCardinality(Nullable(T)))as long as the actual array values are not
NULL. #67103 (李扬).
- Read configuration for
clickhouse-localfrom
~/.clickhouse-local. #67135 (Azat Khuzhin).
- Rename setting
input_format_orc_read_use_writer_time_zoneto
input_format_orc_reader_timezoneand allow the user to set the reader timezone. #67175 (kevinyhzou).
- Decrease level of the
Socket is not connectederror when HTTP connection immediately reset by peer after connecting, close #34218. #67177 (vdimir).
- Add ability to load dashboards for
system.dashboards from config (once set, they override the default dashboards preset). #67232 (Azat Khuzhin).
- Window functions in SQL are traditionally written in snake case. ClickHouse uses camelCase, so new aliases denseRank() and percentRank() have been created. They can be called exactly like the original dense_rank() and percent_rank() functions. Both the snake case and camelCase syntaxes remain usable. A new test for each of the functions has been added as well. This closes #67042. #67334 (Peter Nguyen).
- Autodetect the configuration file format if the extension is not .xml, .yml or .yaml. If the file begins with < it is treated as XML, otherwise as YAML. This is useful when providing a configuration file from a pipe: clickhouse-server --config-file <(echo "hello: world"). #67391 (sakulali).
- Functions parseDateTime and parseDateTimeInJodaSyntax now treat their format parameter as optional. If it is not specified, format strings %Y-%m-%d %H:%i:%s and yyyy-MM-dd HH:mm:ss are assumed. Example: SELECT parseDateTime('2021-01-04 23:12:34') now returns DateTime value 2021-01-04 23:12:34 (previously, this threw an exception). #67399 (Robert Schulze).
- Automatically retry Keeper requests in KeeperMap if they happen because of timeout or connection loss. #67448 (Antonio Andelic).
- Add
-no-pieto Aarch64 Linux builds to allow proper introspection and symbolizing of stacktraces after a ClickHouse restart. #67916 (filimonov).
- Added profile events for merges and mutations for better introspection. #68015 (Anton Popov).
- Remove unnecessary logs for non-replicated
MergeTree. #68238 (Daniil Ivanik).
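The configuration-format autodetection rule listed above (#67391) can be pictured with a short Python sketch. It only illustrates the stated rule: a known extension wins, otherwise a leading < suggests XML and anything else is treated as YAML. The helper name guess_config_format and the skipping of leading whitespace are assumptions of this sketch, not ClickHouse's actual code.

```python
import os


def guess_config_format(path: str, content: str) -> str:
    """Guess 'xml' or 'yaml' for a config file (hypothetical helper)."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".xml":
        return "xml"
    if ext in (".yml", ".yaml"):
        return "yaml"
    # Unknown extension (for example a pipe such as /dev/fd/63):
    # fall back to looking at the content itself.
    # Skipping leading whitespace is an assumption of this sketch.
    return "xml" if content.lstrip().startswith("<") else "yaml"


print(guess_config_format("/dev/fd/63", "hello: world"))
print(guess_config_format("/dev/fd/63", "<clickhouse></clickhouse>"))
```

With a process substitution such as <(echo "hello: world") the path carries no extension, so only the content-based branch can decide the format.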
Build/Testing/Packaging Improvement
- The integration tests flaky check will now run each test case multiple times to find more issues in tests and make them more reliable. It uses the pytest-repeat library to run a test case multiple times in the same environment. To pass, a test case must clean up tables and other entities at the end. Repeating works much faster than several pytest runs, as it starts the necessary containers only once. #66986 (Ilya Yatsishin).
- Unblock the usage of CLion with ClickHouse. In previous versions, CLion froze for a minute on every keypress. This closes #66994. #66995 (Alexey Milovidov).
- getauxval: avoid a crash under a sanitizer re-exec due to high ASLR entropy in newer Linux kernels. #67081 (Raúl Marín).
- Some parts of client code are extracted to a single file and highest possible level optimization is applied to them even for debug builds. This closes: #65745. #67215 (Nikita Mikhaylov).
Bug Fix
- Only relevant to the experimental Variant data type. Fix crash with Variant + AggregateFunction type. #67122 (Kruglov Pavel).
- Fix crash in DistributedAsyncInsert when connection is empty. #67219 (Pablo Marcos).
- Fix crash of
uniqand
uniqThetawith
tuple()argument. Closes #67303. #67306 (flynn).
- Fixes #66026. Avoid unresolved table function arguments traversal in
ReplaceTableNodeToDummyVisitor. #67522 (Dmitry Novik).
- Fix potential stack overflow in
JSONMergePatchfunction. Renamed this function from
jsonMergePatchto
JSONMergePatchbecause the previous name was wrong. The previous name is still kept for compatibility. Improved diagnostic of errors in the function. This closes #67304. #67756 (Alexey Milovidov).
- Fixed a NULL pointer dereference, triggered by a specially crafted query, that crashed the server via hopEnd, hopStart, tumbleEnd, and tumbleStart. #68098 (Salvatore Mesoraca).
- Fixed
Not-ready Setin some system tables when filtering using subqueries. #66018 (Michael Kolupaev).
- Fixed reading of subcolumns after
ALTER ADD COLUMNquery. #66243 (Anton Popov).
- Fix boolean literals in query sent to external database (for engines like
PostgreSQL). #66282 (vdimir).
- Fix formatting of query with aliased JOIN ON expression, e.g.
... JOIN t2 ON (x = y) AS e ORDER BY xshould be formatted as
... JOIN t2 ON ((x = y) AS e) ORDER BY x. #66312 (vdimir).
- Fix cluster() for inter-server secret (preserve initial user as before). #66364 (Azat Khuzhin).
- Fix possible runtime error while converting Array field with nulls to Array(Variant). #66727 (Kruglov Pavel).
- Fix for occasional deadlock in Context::getDDLWorker. #66843 (Alexander Gololobov).
- Fix creating KeeperMap table after an incomplete drop. #66865 (Antonio Andelic).
- Fix broken part error while restoring to a
s3_plain_rewritabledisk. #66881 (Vitaly Baranov).
- In rare cases ClickHouse could consider parts as broken because of some unexpected projections on disk. Now it's fixed. #66898 (alesapin).
- Fix invalid format detection in schema inference that could lead to logical error Format doesn't support schema inference. #66899 (Kruglov Pavel).
- Fix possible deadlock on query cancel with parallel replicas. #66905 (Nikita Taranov).
- Forbid create as select even when database_replicated_allow_heavy_create is set. It was unconditionally forbidden in 23.12 and accidentally allowed under the setting in unreleased 24.7. #66980 (vdimir).
- Reading from the
numberscould wrongly throw an exception when the
max_rows_to_readlimit was set. This closes #66992. #66996 (Alexey Milovidov).
- Add proper type conversion to lagInFrame and leadInFrame window functions - fixes msan test. #67091 (Yakov Olkhovskiy).
- TRUNCATE DATABASE used to stop replication as if it was a DROP DATABASE query, it's fixed. #67129 (Alexander Tokmakov).
- Use a separate client context in
clickhouse-local. #67133 (Vitaly Baranov).
- Fix error Cannot convert column because it is non constant in source stream but must be constant in result. for a query that reads from the Merge table over the Distributed table with one shard. #67146 (Nikolai Kochetov).
- Correct behavior of
ORDER BY allwith disabled
enable_order_by_alland parallel replicas (distributed queries as well). #67153 (Igor Nikonov).
- Fix wrong usage of input_format_max_bytes_to_read_for_schema_inference in schema cache. #67157 (Kruglov Pavel).
- Fix a memory leak in count distinct when an exception is thrown during GROUP BY on a single nullable key. #67171 (Jet He).
- Fix an error in optimization which converts OUTER JOIN to INNER JOIN. This closes #67156. This closes #66447. The bug was introduced in https://github.com/ClickHouse/ClickHouse/pull/62907. #67178 (Maksim Kita).
- Fix error
Conversion from AggregateFunction(name, Type) to AggregateFunction(name, Nullable(Type)) is not supported. The bug was caused by the
optimize_rewrite_aggregate_function_with_ifoptimization. Fixes #67112. #67229 (Nikolai Kochetov).
- Fix hung query when using empty tuple as lhs of function IN. #67295 (Duc Canh Le).
- It was possible to create very deeply nested JSON data that triggered a stack overflow while skipping unknown fields. This closes #67292. #67324 (Alexey Milovidov).
- Fix attaching ReplicatedMergeTree table after exception during startup. #67360 (Antonio Andelic).
- Fix segfault caused by incorrectly detaching from thread group in
Aggregator. #67385 (Antonio Andelic).
- Fix one more case when a non-deterministic function is specified in PK. #67395 (Nikolai Kochetov).
- Fixed
bloom_filterindex breaking queries with mildly weird conditions like
(k=2)=(k=2)or
has([1,2,3], k). #67423 (Michael Kolupaev).
- Correctly parse file name/URI containing
::if it's not an archive. #67433 (Antonio Andelic).
- Fix wait for tasks in ~WriteBufferFromS3 in case WriteBuffer was cancelled. #67459 (Kseniia Sumarokova).
- Protect temporary part directories from removing during RESTORE. #67491 (Vitaly Baranov).
- Fix execution of nested short-circuit functions. #67520 (Kruglov Pavel).
- Fix
Logical error: Expected the argument №N of type T to have X rows, but it has 0. The error could happen in a remote query with constant expression in
GROUP BY(with a new analyzer). #67536 (Nikolai Kochetov).
- Fix join on tuple with NULLs: Some queries with the new analyzer and
NULLinside the tuple in the
JOIN ONsection returned incorrect results. #67538 (vdimir).
- Fix redundant reschedule of FileCache::freeSpaceRatioKeepingThreadFunc() in case of full non-evictable cache. #67540 (Kseniia Sumarokova).
- Fix inserting into stream like engines (Kafka, RabbitMQ, NATS) through HTTP interface. #67554 (János Benjamin Antal).
- Fix for function
toStartOfWeekwhich returned the wrong result with a small
DateTime64value. #67558 (Yarik Briukhovetskyi).
- Fix creation of view with recursive CTE. #67587 (Yakov Olkhovskiy).
- Fix
Logical error: 'file_offset_of_buffer_end <= read_until_position'in filesystem cache. Closes #57508. #67623 (Kseniia Sumarokova).
- Fixes #62282. Removed the call to
convertFieldToString()and added datatype specific serialization code. Parameterized view substitution was broken for multiple datatypes when parameter value was a function or expression returning datatype instance. #67654 (Shankar).
- Fix crash on percent_rank. The default frame type of percent_rank is changed to range unbounded preceding and unbounded following. The default window frame of IWindowFunction is now taken into account, so window functions without a window frame definition in SQL can be put into different WindowTransformers properly. #67661 (lgbo).
- Fix reloading SQL UDFs with UNION. Previously, restarting the server could make UDF invalid. #67665 (Antonio Andelic).
- Fix possible logical error "Unexpected return type from if" with experimental Variant type and enabled setting
use_variant_as_common_typein function if with Tuples and Maps. #67687 (Kruglov Pavel).
- Due to a bug in the Linux kernel, a query could hang in TimerDescriptor::drain. This closes #37686. #67702 (Alexey Milovidov).
- Fix completion of
RESTORE ON CLUSTERcommand. #67720 (Vitaly Baranov).
- Fix dictionary hang in case of CANNOT_SCHEDULE_TASK while loading. #67751 (Azat Khuzhin).
- Queries like
SELECT count() FROM t WHERE cast(c = 1 or c = 9999 AS Bool) SETTINGS use_skip_indexes=1with bloom filter indexes on
cnow work correctly. #67781 (jsc0218).
- Fix wrong aggregation result in some queries with aggregation without keys and filter, close #67419. #67804 (vdimir).
- Validate experimental/suspicious data types in ALTER ADD/MODIFY COLUMN. #67911 (Kruglov Pavel).
- Fix DateTime64 parsing after constant folding in distributed queries, close #66773. #67920 (vdimir).
- Fix wrong
count()result when there is non-deterministic function in predicate. #67922 (János Benjamin Antal).
- Fixed the calculation of the maximum thread soft limit in containerized environments where the usable CPU count is limited. #67963 (Robert Schulze).
- Now ClickHouse doesn't consider part as broken if projection doesn't exist on disk but exists in
checksums.txt. #68003 (alesapin).
- Fixed skipping of untouched parts in mutations with new analyzer. Previously with enabled analyzer data in part could be rewritten by mutation even if mutation doesn't affect this part according to predicate. #68052 (Anton Popov).
- Removes an incorrect optimization to remove sorting in subqueries that use
OFFSET. Fixes #67906. #68099 (Graham Campbell).
- Attempt to fix
Block structure mismatch in AggregatingStep stream: different typesfor aggregate projection optimization. #68107 (Nikolai Kochetov).
- Try to fix a PostgreSQL crash when a query is cancelled. #68288 (Kseniia Sumarokova).
- Fix missing sync replica mode in query
SYSTEM SYNC REPLICA. #68326 (Duc Canh Le).
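The stack overflow on deeply nested JSON (#67324) is an instance of a general hazard: a recursive "skip this value" routine uses one stack frame per nesting level. A common remedy, sketched below in Python, is to skip iteratively with an explicit depth counter and a hard limit. This is a generic illustration, not necessarily how ClickHouse fixed it; the function name skip_json_value and the default max_depth are assumptions.

```python
def skip_json_value(text: str, pos: int = 0, max_depth: int = 1000) -> int:
    """Return the index just past the JSON value that starts at text[pos]."""
    depth = 0
    in_string = False
    i = pos
    while i < len(text):
        ch = text[i]
        if in_string:
            if ch == "\\":
                i += 1                # skip the escaped character
            elif ch == '"':
                in_string = False
                if depth == 0:
                    return i + 1      # a top-level string ends here
        elif ch == '"':
            in_string = True
        elif ch in "[{":
            depth += 1
            if depth > max_depth:
                raise ValueError("JSON is nested too deeply")
        elif ch in "]}":
            if depth == 0:
                return i              # a scalar ended by the enclosing container
            depth -= 1
            if depth == 0:
                return i + 1          # the outermost container just closed
        elif ch == "," and depth == 0:
            return i                  # a scalar ended by the next field
        i += 1
    if depth > 0 or in_string:
        raise ValueError("truncated JSON value")
    return i
```

Because the nesting level lives in an integer rather than in call frames, an input with thousands of opening brackets produces a clean error instead of exhausting the stack.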
ClickHouse release 24.7, 2024-07-30
Backward Incompatible Change
- Forbid CREATE MATERIALIZED VIEW ... ENGINE Replicated*MergeTree POPULATE AS SELECT ... with Replicated databases. #63963 (vdimir).
- clickhouse-keeper-client will only accept paths in string literals, such as ls '/hello/world', not bare strings such as ls /hello/world. #65494 (Alexey Milovidov).
- Metric
KeeperOutstandingRequetswas renamed to
KeeperOutstandingRequests. #66206 (Robert Schulze).
- Remove
is_deterministicfield from the
system.functionstable. #66630 (Alexey Milovidov).
- Function
tuplewill now try to construct named tuples in query (controlled by
enable_named_columns_in_function_tuple). Introduce function
tupleNamesto extract names from tuples. #54881 (Amos Bird).
- Change how deduplication for Materialized Views works. Several cases are fixed:
  - On the destination table: data is split into 2 or more blocks, and those blocks are considered duplicates when they are inserted in parallel.
  - On the MV destination table: equal blocks are deduplicated, which happens when the MV often produces equal data for different input data because of aggregation.
  - On the MV destination table: equal blocks that come from different MVs are deduplicated.
  #61601 (Sema Checherinda).
- Functions bitShiftLeft and bitShiftRight return an error for out-of-bounds shift positions. #65838 (Pablo Marcos).
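The bitShiftLeft / bitShiftRight change above can be modelled in a few lines of Python. This is a sketch of the idea only: the helper name and the exact boundary are assumptions. Here a negative shift or a shift larger than the value's bit width is rejected, and a shift equal to the width simply yields 0.

```python
def bit_shift_left_checked(value: int, shift: int, width: int = 64) -> int:
    """Shift an unsigned `width`-bit value left, rejecting out-of-bounds shifts."""
    if shift < 0 or shift > width:  # boundary choice is an assumption of this sketch
        raise ValueError(f"shift {shift} is out of bounds for a {width}-bit value")
    # Mask the result so that bits shifted past the width are dropped.
    return (value << shift) & ((1 << width) - 1)


print(bit_shift_left_checked(1, 3, width=8))
```

Queries that relied on the old silent behaviour for oversized shift positions need an explicit guard after upgrading.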
New Feature
- Add
ASOF JOINsupport for
full_sorting_joinalgorithm. #55051 (vdimir).
- Support JWT authentication in
clickhouse-client(will be available only in ClickHouse Cloud). #62829 (Konstantin Bogdanov).
- Add SQL functions
changeYear,
changeMonth,
changeDay,
changeHour,
changeMinute,
changeSecond. For example,
SELECT changeMonth(toDate('2024-06-14'), 7)returns date
2024-07-14. #63186 (cucumber95).
- Introduce startup scripts, which allow the execution of preconfigured queries at the startup stage. #64889 (pufit).
- Support accept_invalid_certificate in client's config in order to allow for client to connect over secure TCP to a server running with self-signed certificate - can be used as a shorthand for corresponding
openSSLclient settings
verificationMode=none+
invalidCertificateHandler.name=AcceptCertificateHandler. #65238 (peacewalker122).
- Add system.error_log which contains history of error values from table system.errors, periodically flushed to disk. #65381 (Pablo Marcos).
- Add aggregate function groupConcat. It is about the same as arrayStringConcat(groupArray(column), ','). It can receive 2 parameters: a string delimiter and the number of elements to be processed. #65451 (Yarik Briukhovetskyi).
- Add AzureQueue storage. #65458 (Kseniia Sumarokova).
- Add a new setting to disable/enable writing page index into parquet files. #65475 (lgbo).
- Introduce
logger.console_log_levelserver config to control the log level to the console (if enabled). #65559 (Azat Khuzhin).
- Automatically append a wildcard
*to the end of a directory path with table function
file. #66019 (Zhidong (David) Guo).
- Add
--memory-usageoption to client in non-interactive mode. #66393 (vdimir).
- Make an interactive client for clickhouse-disks, add local disk from the local directory. #64446 (Daniil Ivanik).
- When a lightweight delete happens on a table with projection(s), users can choose to either throw an exception (the default) or drop the projection. #65594 (jsc0218).
- Add system tables with main information about all detached tables. #65400 (Konstantin Morozov).
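As a rough mental model of the groupConcat entry above, the aggregate behaves like joining the string forms of the collected values, optionally stopping after a given number of elements. The Python helper below is a hypothetical illustration of that description, not the ClickHouse implementation. In a real query the order of aggregated values is not guaranteed unless the input is ordered.

```python
from typing import Iterable, Optional


def group_concat(values: Iterable[object], delimiter: str,
                 limit: Optional[int] = None) -> str:
    """Join values with `delimiter`, using at most `limit` elements if given."""
    items = [str(v) for v in values]
    if limit is not None:
        items = items[:limit]  # only the first `limit` elements are processed
    return delimiter.join(items)


print(group_concat([1, 2, 3], ","))
print(group_concat(["a", "b", "c"], "/", limit=2))
```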
Experimental Feature
- Change binary serialization of the
Variantdata type: add
compactmode to avoid writing the same discriminator multiple times for granules with single variant or with only NULL values. Add MergeTree setting
use_compact_variant_discriminators_serializationthat is enabled by default. Note that Variant type is still experimental and backward-incompatible change in serialization is ok. #62774 (Kruglov Pavel).
- Support on-disk backend storage for clickhouse-keeper. #56626 (Han Fei).
- Refactor JSONExtract functions, support more types including experimental Dynamic type. #66046 (Kruglov Pavel).
- Support null map subcolumn for
Variantand
Dynamicsubcolumns. #66178 (Kruglov Pavel).
- Fix reading
Dynamicsubcolumns from altered
Memorytable. Previously if
max_typesparameter of a Dynamic type was changed in Memory table via alter, further subcolumns reading can return wrong result. #66066 (Kruglov Pavel).
- Add support for
cluster_for_parallel_replicaswhen using custom key parallel replicas. It allows you to use parallel replicas with custom key with MergeTree tables. #65453 (Antonio Andelic).
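The idea behind the compact discriminator mode for Variant can be shown with a toy encoder: when every row of a granule carries the same discriminator (a single variant, or only NULLs), it is enough to store that value once together with the row count. The tuple layout and the NULL_DISCRIMINATOR value below are assumptions made for illustration and do not describe the real on-disk format.

```python
from typing import List, Tuple, Union

NULL_DISCRIMINATOR = 255  # assumed marker for NULL rows in this sketch

Encoded = Union[Tuple[str, int, int], Tuple[str, List[int]]]


def encode_granule(discriminators: List[int]) -> Encoded:
    """Store a single (value, count) pair when all discriminators are equal."""
    if discriminators and all(d == discriminators[0] for d in discriminators):
        return ("compact", discriminators[0], len(discriminators))
    return ("plain", list(discriminators))


def decode_granule(encoded: Encoded) -> List[int]:
    """Expand either representation back to one discriminator per row."""
    if encoded[0] == "compact":
        _, value, count = encoded
        return [value] * count
    return list(encoded[1])


print(encode_granule([NULL_DISCRIMINATOR] * 4))
print(encode_granule([0, 1, 0, 2]))
```

Granules that mix variants gain nothing from this and keep the plain representation.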
Performance Improvement
- Replace int to string algorithm with a faster one (from a modified amdn/itoa to a modified jeaiii/itoa). #61661 (Raúl Marín).
- Sizes of hash tables created by join (
parallel_hashalgorithm) are collected and cached now. This information will be used to preallocate space in hash tables for subsequent query executions and save time on hash table resizes. #64553 (Nikita Taranov).
- Optimized queries with
ORDER BYprimary key and
WHEREthat have a condition with high selectivity by using buffering. It is controlled by setting
read_in_order_use_buffering(enabled by default) and can increase memory usage of query. #64607 (Anton Popov).
- Improve performance of loading
plain_rewritablemetadata. #65634 (Alexey Milovidov).
- Attaching tables on read-only disks will use fewer resources by not loading outdated parts. #65635 (Alexey Milovidov).
- Support minmax hyperrectangle for Set indices. #65676 (AntiTopQuark).
- Unload primary index of outdated parts to reduce total memory usage. #65852 (Anton Popov).
- Functions
replaceRegexpAlland
replaceRegexpOneare now significantly faster if the pattern is trivial, i.e. contains no metacharacters, pattern classes, flags, grouping characters etc. (Thanks to Taiyang Li). #66185 (Robert Schulze).
- S3 requests: reduce the retry time for queries and increase the retry count for backups. 8.5 minutes and 100 retries for queries, 1.2 hours and 1000 retries for backup restore. #65232 (Sema Checherinda).
- Support query plan LIMIT optimization. Support LIMIT pushdown for PostgreSQL storage and table function. #65454 (Maksim Kita).
- Improved ZooKeeper load balancing. The current session doesn't expire until the optimal nodes become available despite
fallback_session_lifetime. Added support for AZ-aware balancing. #65570 (Alexander Tokmakov).
- DatabaseCatalog drops tables faster by using up to database_catalog_drop_table_concurrency threads. #66065 (Sema Checherinda).
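The replaceRegexpAll / replaceRegexpOne speed-up above rests on a simple observation: a pattern with no regex metacharacters matches only itself, so a plain substring replacement gives the same result without running the regex engine. A minimal Python sketch of that fast path follows. The metacharacter set, the helper name, and the use of Python's re module (rather than the RE2 syntax ClickHouse uses) are assumptions of this illustration.

```python
import re

# Characters that give a pattern regex meaning (assumed set for this sketch).
_META = set(r"\.^$*+?()[]{}|")


def replace_all(haystack: str, pattern: str, replacement: str) -> str:
    """Replace every match of `pattern`, using a substring fast path when possible."""
    is_trivial = bool(pattern) and not (_META & set(pattern))
    # Backslashes in the replacement may be group references, so keep the
    # fast path for replacements that are plain text as well.
    if is_trivial and "\\" not in replacement:
        return haystack.replace(pattern, replacement)
    return re.sub(pattern, replacement, haystack)


print(replace_all("2024-07-30", "-", "/"))
print(replace_all("aaa b", "a+", "x"))
```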
Improvement
- The setting
optimize_trivial_insert_selectis disabled by default. In most cases, it should be beneficial. Nevertheless, if you are seeing slower INSERT SELECT or increased memory usage, you can enable it back or
SET compatibility = '24.6'. #58970 (Alexey Milovidov).
- Print stacktrace and diagnostic info if
clickhouse-clientor
clickhouse-localcrashes. #61109 (Alexander Tokmakov).
- The result of
SHOW INDEX | INDEXES | INDICES | KEYSwas previously sorted by the primary key column names. Since this was unintuitive, the result is now sorted by the position of the primary key columns within the primary key. #61131 (Robert Schulze).
- Support reading partitioned DeltaLake data. Infer the DeltaLake schema by reading metadata instead of data. #63201 (Kseniia Sumarokova).
- In composable protocols TLS layer accepted only
certificateFileand
privateKeyFileparameters. https://clickhouse.com/docs/operations/settings/composable-protocols. #63985 (Anton Ivashkin).
- Added profile event
SelectQueriesWithPrimaryKeyUsagewhich indicates how many SELECT queries use the primary key to evaluate the WHERE clause. #64492 (0x01f).
- StorageS3Queue related fixes and improvements. Deduce a default value of s3queue_processing_threads_num according to the number of physical CPU cores on the server (instead of the previous default value of 1). Set the default value of s3queue_loading_retries to 10. Fix a possible vague "Uncaught exception" in the exception column of system.s3queue. Do not increment the retry count on a MEMORY_LIMIT_EXCEEDED exception. Move the file commit to a stage after insertion into the table has fully finished, to avoid files being committed while not inserted. Add settings s3queue_max_processed_files_before_commit, s3queue_max_processed_rows_before_commit, s3queue_max_processed_bytes_before_commit, s3queue_max_processing_time_sec_before_commit to better control commit and flush time. #65046 (Kseniia Sumarokova).
- Support aliases in parametrized view function (only new analyzer). #65190 (Kseniia Sumarokova).
- Updated to mask account key in logs in azureBlobStorage. #65273 (SmitaRKulkarni).
- Partition pruning for
INpredicates when filter expression is a part of
PARTITION BYexpression. #65335 (Eduard Karacharov).
- arrayMin / arrayMax can now be applied to all data types that are comparable. #65455 (pn).
- Improved memory accounting for cgroups v2 to exclude the amount occupied by the page cache. #65470 (Nikita Taranov).
- Do not create format settings for each row when serializing chunks to insert to EmbeddedRocksDB table. #65474 (Duc Canh Le).
- Reduce the clickhouse-local prompt to just :). getFQDNOrHostName() takes too long on macOS, and we don't want a hostname in the prompt for clickhouse-local anyway. #65510 (Konstantin Bogdanov).
- Avoid printing a message from jemalloc about per-CPU arenas on low-end virtual machines. #65532 (Alexey Milovidov).
- Disable filesystem cache background download by default. It will be enabled back when we fix the issue with possible "Memory limit exceeded" because memory deallocation is done outside of query context (while buffer is allocated inside of query context) if we use background download threads. Plus we need to add a separate setting to define max size to download for background workers (currently it is limited by max_file_segment_size, which might be too big). #65534 (Kseniia Sumarokova).
- Add a new config option <config_reload_interval_ms>, which specifies how often ClickHouse reloads the config. #65545 (alesapin).
- Implement binary encoding for ClickHouse data types and add its specification in docs. Use it in Dynamic binary serialization, allow to use it in RowBinaryWithNamesAndTypes and Native formats under settings. #65546 (Kruglov Pavel).
- Server settings
compiled_expression_cache_sizeand
compiled_expression_cache_elements_sizeare now shown in
system.server_settings. #65584 (Robert Schulze).
- Add support for user identification based on x509 SubjectAltName extension. #65626 (Anton Kozlov).
- clickhouse-local will respect the max_server_memory_usage and max_server_memory_usage_to_ram_ratio from the configuration file. It will also set the max memory usage to 90% of the system memory by default, like clickhouse-server does. #65697 (Alexey Milovidov).
- Add a script to backup your files to ClickHouse. #65699 (Alexey Milovidov).
- PostgreSQL source to support query cancellations. #65722 (Maksim Kita).
- Make
allow_experimental_analyzerbe controlled by the initiator for distributed queries. This ensures compatibility and correctness during operations in mixed version clusters. #65777 (Nikita Mikhaylov).
- Respect cgroup CPU limit in Keeper. #65819 (Antonio Andelic).
- Allow calling the concat function with no arguments, for example :) select concat();. #65887 (李扬).
- Allow controlling named collections in
clickhouse-local. #65973 (Alexey Milovidov).
- Improve Azure-related profile events. #65999 (alesapin).
- Support ORC file read by writer's time zone. #66025 (kevinyhzou).
- Add settings to control connections to PostgreSQL. The setting
postgresql_connection_attempt_timeoutspecifies the value passed to
connect_timeoutparameter of connection URL. The setting
postgresql_connection_pool_retriesspecifies the number of retries to establish a connection to the PostgreSQL end-point. #66232 (Dmitry Novik).
- Reduce inaccuracy of
input_wait_elapsed_us/
elapsed_usin the
system.processors_profile_log. #66239 (Azat Khuzhin).
- Improve ProfileEvents for the filesystem cache. #66249 (zhukai).
- Add settings to ignore the
ON CLUSTERclause in queries for named collection management with the replicated storage. #66288 (MikhailBurdukov).
- Function
generateSnowflakeIDnow allows to specify a machine ID as a parameter to prevent collisions in large clusters. #66374 (ZAWA_ll).
- Disable suspending on
Ctrl+Zin interactive mode. This is a common trap and is not expected behavior for almost all users. I imagine only a few extreme power users could appreciate suspending terminal applications to the background, but I don't know any. #66511 (Alexey Milovidov).
- Add option for validating the primary key type in Dictionaries. Without this option for simple layouts any column type will be implicitly converted to UInt64. #66595 (MikhailBurdukov).
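The machine ID parameter of generateSnowflakeID mentioned above matters because a Snowflake-style ID packs a timestamp, a machine identifier, and a per-machine sequence number into one 64-bit integer. Two servers that generate IDs in the same millisecond stay distinct only if their machine IDs differ. The sketch below assumes the conventional layout of 41 timestamp bits, 10 machine bits, and 12 sequence bits; the helper names are hypothetical and the code is not ClickHouse's implementation.

```python
TIMESTAMP_BITS = 41
MACHINE_BITS = 10
SEQUENCE_BITS = 12


def make_snowflake_id(timestamp_ms: int, machine_id: int, sequence: int) -> int:
    """Pack the three fields into one integer (the top bit stays zero)."""
    ts = timestamp_ms & ((1 << TIMESTAMP_BITS) - 1)
    mid = machine_id & ((1 << MACHINE_BITS) - 1)   # only the low 10 bits are kept
    seq = sequence & ((1 << SEQUENCE_BITS) - 1)
    return (ts << (MACHINE_BITS + SEQUENCE_BITS)) | (mid << SEQUENCE_BITS) | seq


def split_snowflake_id(snowflake_id: int) -> tuple:
    """Recover (timestamp_ms, machine_id, sequence) from a packed ID."""
    seq = snowflake_id & ((1 << SEQUENCE_BITS) - 1)
    mid = (snowflake_id >> SEQUENCE_BITS) & ((1 << MACHINE_BITS) - 1)
    ts = snowflake_id >> (MACHINE_BITS + SEQUENCE_BITS)
    return ts, mid, seq


a = make_snowflake_id(1_700_000_000_000, machine_id=1, sequence=0)
b = make_snowflake_id(1_700_000_000_000, machine_id=2, sequence=0)
print(a != b, split_snowflake_id(a))
```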
Bug Fix (user-visible misbehavior in an official stable release)
- Check cyclic dependencies on CREATE/REPLACE/RENAME/EXCHANGE queries and throw an exception if there is a cyclic dependency. Previously such cyclic dependencies could lead to a deadlock during server startup. Also fix some bugs in dependencies creation. #65405 (Kruglov Pavel).
- Fix unexpected sizes of
LowCardinalitycolumns in function calls. #65298 (Raúl Marín).
- Fix crash in maxIntersections. #65689 (Raúl Marín).
- Fix the
VALID UNTILclause in the user definition resetting after a restart. #66409 (Nikolay Degterinsky).
- Fix the remaining time column in
SHOW MERGES. #66735 (Alexey Milovidov).
- Query was cancelled might have been printed twice in clickhouse-client. This behaviour is fixed. #66005 (Nikita Mikhaylov).
- Fixed crash while using
MaterializedMySQL(which is an unsupported, experimental feature) with TABLE OVERRIDE that maps MySQL NULL field into ClickHouse not NULL field. #54649 (Filipp Ozinov).
- Fix logical error when
PREWHEREexpression read no columns and table has no adaptive index granularity (very old table). #59173 (Alexander Gololobov).
- Fix bug with the cancellation buffer when canceling a query. #64478 (Sema Checherinda).
- Fix filling parts columns from metadata (when columns.txt does not exist). #64757 (Azat Khuzhin).
- Fix crash for
ALTER TABLE ... ON CLUSTER ... MODIFY SQL SECURITY. #64957 (pufit).
- Fix crash on destroying AccessControl: add explicit shutdown. #64993 (Vitaly Baranov).
- Eliminate injective function in argument of functions
uniq*recursively. This used to work correctly but was broken in the new analyzer. #65140 (Duc Canh Le).
- Fix unexpected projection name when query with CTE. #65267 (wudidapaopao).
- Require
dictGetprivilege when accessing dictionaries via direct query or the
Dictionarytable engine. #65359 (Joe Lynch).
- Fix user-specific S3 auth with incremental backups. #65481 (Antonio Andelic).
- Disable
non-intersecting-partsoptimization for queries with
FINALin case of
read-in-orderoptimization was enabled. This could lead to an incorrect query result. As a workaround, disable
do_not_merge_across_partitions_select_finaland
split_parts_ranges_into_intersecting_and_non_intersecting_finalbefore this fix is merged. #65505 (Nikolai Kochetov).
- Fix getting exception
Index out of bound for blob metadatain case all files from list batch were filtered out. #65523 (Kseniia Sumarokova).
- Fix NOT_FOUND_COLUMN_IN_BLOCK for deduplicate merge of projection. #65573 (Yakov Olkhovskiy).
- Fixed bug in MergeJoin. Column in sparse serialisation might be treated as a column of its nested type though the required conversion wasn't performed. #65632 (Nikita Taranov).
- Fixed a bug that compatibility level '23.4' was not properly applied. #65737 (cw5121).
- Fix odbc table with nullable fields. #65738 (Rodolphe Dugé de Bernonville).
- Fix data race in
TCPHandler, which could happen on fatal error. #65744 (Kseniia Sumarokova).
- Fix invalid exceptions in function
parseDateTimewith
%Fand
%Dplaceholders. #65768 (Antonio Andelic).
- For queries that read from
PostgreSQL, cancel the internal
PostgreSQLquery if the ClickHouse query is finished. Otherwise,
ClickHousequery cannot be canceled until the internal
PostgreSQLquery is finished. #65771 (Maksim Kita).
- Fix a bug in short circuit logic when old analyzer and dictGetOrDefault is used. #65802 (jsc0218).
- Fix a bug that caused EmbeddedRocksDB with TTL to write corrupted SST files. #65816 (Duc Canh Le).
- Functions
bitTest,
bitTestAll, and
bitTestAnynow return an error if the specified bit index is out-of-bounds #65818 (Pablo Marcos).
- Setting
join_any_take_last_rowis supported in any query with hash join. #65820 (vdimir).
- Better handling of join conditions involving IS NULL checks (for example, ON (a = b AND (a IS NOT NULL) AND (b IS NOT NULL) ) OR ( (a IS NULL) AND (b IS NULL) ) is rewritten to ON a <=> b). Fix an incorrect optimization when conditions other than IS NULL are present. #65835 (vdimir).
- Fix growing memory usage in S3Queue. #65839 (Kseniia Sumarokova).
- Fix tie handling in
arrayAUCto match sklearn. #65840 (gabrielmcg44).
- Fix possible issues with MySQL server protocol TLS connections. #65917 (Azat Khuzhin).
- Fix possible issues with MySQL client protocol TLS connections. #65938 (Azat Khuzhin).
- Fix handling of
SSL_ERROR_WANT_READ/
SSL_ERROR_WANT_WRITEwith zero timeout. #65941 (Azat Khuzhin).
- Add missing settings
input_format_csv_skip_first_lines/input_format_tsv_skip_first_lines/input_format_csv_try_infer_numbers_from_strings/input_format_csv_try_infer_strings_from_quoted_tuplesin schema inference cache because they can change the resulting schema. It prevents from incorrect result of schema inference with these settings changed. #65980 (Kruglov Pavel).
- Column _size in s3 engine and s3 table function denotes the size of a file inside the archive, not a size of the archive itself. #65993 (Daniil Ivanik).
- Fix resolving dynamic subcolumns in analyzer, avoid reading the whole column on dynamic subcolumn reading. #66004 (Kruglov Pavel).
- Fix config merging for from_env with replace overrides. #66034 (Azat Khuzhin).
- Fix a possible hanging in
GRPCServerduring shutdown. #66061 (Vitaly Baranov).
- Fixed several cases in function
haswith non-constant
LowCardinalityarguments. #66088 (Anton Popov).
- Fix for
groupArrayIntersect. It had incorrect behavior in the
merge()function. Also, fixed behavior in
deserialise()for numeric and general data. #66103 (Yarik Briukhovetskyi).
- Fixed buffer overflow bug in
unbin/
unheximplementation. #66106 (Nikita Taranov).
- Disable the
merge-filtersoptimization introduced in #64760. It may cause an exception if optimization merges two filter expressions and does not apply a short-circuit evaluation. #66126 (Nikolai Kochetov).
- Fixed the issue when the server failed to parse Avro files with negative block size arrays encoded, which is now allowed by the Avro specification. #66130 (Serge Klochkov).
- Fixed a bug in ZooKeeper client: a session could get stuck in unusable state after receiving a hardware error from ZooKeeper. For example, this might happen due to "soft memory limit" in ClickHouse Keeper. #66140 (Alexander Tokmakov).
- Fix issue in SumIfToCountIfVisitor and signed integers. #66146 (Raúl Marín).
- Fix rare case with missing data in the result of distributed query. #66174 (vdimir).
- Fix order of parsing metadata fields in StorageDeltaLake. #66211 (Kseniia Sumarokova).
- Don't throw
TIMEOUT_EXCEEDEDfor
none_only_activemode of
distributed_ddl_output_mode. #66218 (Alexander Tokmakov).
- Fix handling limit for
system.numbers_mtwhen no index can be used. #66231 (János Benjamin Antal).
- Fixed how the ClickHouse server detects the maximum number of usable CPU cores as specified by cgroups v2 if the server runs in a container such as Docker. In more detail, containers often run their process in the root cgroup which has an empty name. In that case, ClickHouse ignored the CPU limits set by cgroups v2. #66237 (filimonov).
- Fix the
Not-ready seterror when a subquery with
INis used in the constraint. #66261 (Nikolai Kochetov).
- Fix error reporting while copying to S3 or AzureBlobStorage. #66295 (Vitaly Baranov).
- Prevent watchdog from keeping descriptors of unlinked (rotated) log files. #66334 (Aleksei Filatov).
- Fix a bug where LogicalExpressionOptimizerPass lost the logical type of a constant. #66344 (pn).
- Fix
Column identifier is already registerederror with
group_by_use_nulls=trueand new analyzer. #66400 (Nikolai Kochetov).
- Fix a possible incorrect result for queries that join and filter a table with an external engine (like PostgreSQL), caused by overly aggressive filter pushdown. From now on, conditions from the WHERE section are not sent to the external database in the case of an outer join with an external table. #66402 (vdimir).
- Added missing column materialization for cross join. #66413 (lgbo).
- Fix
Cannot find columnerror for queries with constant expression in
GROUP BYkey and new analyzer enabled. #66433 (Nikolai Kochetov).
- Avoid possible logical error during import from Npy format in case of bad array nesting level, fix testing of other kinds of errors. #66461 (Yarik Briukhovetskyi).
- Fix wrong count() result when there is non-deterministic function in predicate. #66510 (Duc Canh Le).
- Correctly track memory for
Allocator::realloc. #66548 (Antonio Andelic).
- Fix reading of uninitialized memory when hashing empty tuples. #66562 (Alexey Milovidov).
- Fix an invalid result for queries with
WINDOW. This could happen when
PARTITIONcolumns have sparse serialization and window functions are executed in parallel. #66579 (Nikolai Kochetov).
- Fix removing named collections in local storage. #66599 (János Benjamin Antal).
- Fix
column_length not being updated in
ColumnTuple::insertManyFrom. #66626 (lgbo).
- Fix
Unknown identifierand
Column is not under aggregate functionerrors for queries with the expression
(column IS NULL). The bug was triggered by #65088, with the disabled analyzer only. #66654 (Nikolai Kochetov).
- Fix
Method getResultType is not supported for QUERY query nodeerror when scalar subquery was used as the first argument of IN (with new analyzer). #66655 (Nikolai Kochetov).
- Fix possible PARAMETER_OUT_OF_BOUND error during reading variant subcolumn. #66659 (Kruglov Pavel).
- Fix rare case of stuck merge after drop column. #66707 (Raúl Marín).
- Fix assertion
isUniqTypeswhen insert select from remote sources. #66722 (Sema Checherinda).
- Fix logical error in PrometheusRequestHandler. #66621 (Vitaly Baranov).
- Fix
indexHintfunction case found by fuzzer. #66286 (Anton Popov).
- Fix AST formatting of 'create table b empty as a'. #64951 (Michael Kolupaev).
ClickHouse release 24.6, 2024-07-01
Backward Incompatible Change
- Enable asynchronous load of databases and tables by default. See the
async_load_databases setting in config.xml. While this change is fully compatible, it can introduce a difference in behavior. When
async_load_databasesis false, as in the previous versions, the server will not accept connections until all tables are loaded. When
async_load_databasesis true, as in the new version, the server can accept connections before all the tables are loaded. If a query is made to a table that is not yet loaded, it will wait for the table's loading, which can take considerable time. It can change the behavior of the server if it is part of a large distributed system under a load balancer. In the first case, the load balancer can get a connection refusal and quickly failover to another server. In the second case, the load balancer can connect to a server that is still loading the tables, and the query will have a higher latency. Moreover, if many queries accumulate in the waiting state, it can lead to a "thundering herd" problem when they start processing simultaneously. This can make a difference only for highly loaded distributed backends. You can set the value of
async_load_databasesto false to avoid this problem. #57695 (Alexey Milovidov).
- Setting
replace_long_file_name_to_hashis enabled by default for
MergeTree tables. #64457 (Anton Popov). This setting is fully compatible, and no action is needed during upgrade. The new data format is supported by all versions starting from 23.9. After enabling this setting, you can no longer downgrade to version 23.8 or older.
- Some invalid queries will fail earlier during parsing. Note: disabled the support for inline KQL expressions (the experimental Kusto language) when they are put into a
kqltable function without a string literal, e.g.
kql(garbage | trash)instead of
kql('garbage | trash')or
kql($$garbage | trash$$). This feature was introduced unintentionally and should not exist. #61500 (Alexey Milovidov).
- Rework parallel processing in
Orderedmode of storage
S3Queue. This PR is backward incompatible for Ordered mode if you used settings
s3queue_processing_threads_numor
s3queue_total_shards_num. Setting
s3queue_total_shards_numis deleted, previously it was allowed to use only under
s3queue_allow_experimental_sharded_mode, which is now deprecated. A new setting is added -
s3queue_buckets. #64349 (Kseniia Sumarokova).
- New functions
snowflakeIDToDateTime,
snowflakeIDToDateTime64,
dateTimeToSnowflakeID, and
dateTime64ToSnowflakeIDwere added. Unlike the existing functions
snowflakeToDateTime,
snowflakeToDateTime64,
dateTimeToSnowflake, and
dateTime64ToSnowflake, the new functions are compatible with function
generateSnowflakeID, i.e. they accept the snowflake IDs generated by
generateSnowflakeIDand produce snowflake IDs of the same type as
generateSnowflakeID(i.e.
UInt64). Furthermore, the new functions default to the UNIX epoch (aka. 1970-01-01), just like
generateSnowflakeID. If necessary, a different epoch, e.g. Twitter's/X's epoch 2010-11-04 aka. 1288834974657 msec since UNIX epoch, can be passed. The old conversion functions are deprecated and will be removed after a transition period: to use them regardless, enable setting
allow_deprecated_snowflake_conversion_functions. #64948 (Robert Schulze).
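The epoch handling described in the Snowflake ID entry above can be illustrated with a small Python sketch. It is not ClickHouse's implementation: it assumes the common Snowflake layout in which the millisecond timestamp sits above 22 low bits (10 machine-ID bits plus 12 sequence bits), and the function names are hypothetical. Only the Twitter/X epoch value is taken from the entry itself.

```python
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
TWITTER_EPOCH_MS = 1288834974657  # value quoted in the changelog entry
TIMESTAMP_SHIFT = 22  # assumption: 10 machine-ID bits + 12 sequence bits below the timestamp


def snowflake_id_to_datetime(snowflake_id: int, epoch_ms: int = 0) -> datetime:
    """Recover the UTC timestamp stored in the upper bits of a Snowflake ID."""
    ms_since_epoch = snowflake_id >> TIMESTAMP_SHIFT
    return UNIX_EPOCH + timedelta(milliseconds=ms_since_epoch + epoch_ms)


def datetime_to_snowflake_id(dt: datetime, epoch_ms: int = 0) -> int:
    """Build the smallest Snowflake ID for a timestamp (machine ID and sequence are zero)."""
    ms_since_unix = (dt - UNIX_EPOCH) // timedelta(milliseconds=1)
    return (ms_since_unix - epoch_ms) << TIMESTAMP_SHIFT
```

With the default `epoch_ms=0` the conversion is relative to the UNIX epoch; passing `TWITTER_EPOCH_MS` to both functions shifts the stored timestamp to the Twitter/X epoch, and the round trip still returns the original time.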
New Feature
- Allow to store named collections in ClickHouse Keeper. #64574 (Kseniia Sumarokova).
- Support empty tuples. #55061 (Amos Bird).
- Add Hilbert Curve encode and decode functions. #60156 (Artem Mustafin).
- Add support for index analysis over
hilbertEncode. #64662 (Artem Mustafin).
- Added support for reading
LINESTRINGgeometry in the WKT format using function
readWKTLineString. #62519 (Nikita Mikhaylov).
- Allow to attach parts from a different disk. #63087 (Unalian).
- Added a new SQL function
generateSnowflakeIDfor generating Twitter-style Snowflake IDs. #63577 (Danila Puzov).
- Added
merge_workloadand
mutation_workloadsettings to regulate how resources are utilized and shared between merges, mutations and other workloads. #64061 (Sergei Trifonov).
- Add support for comparing
IPv4and
IPv6types using the
=operator. #64292 (Francisco J. Jurado Moreno).
- Support decimal arguments in binary math functions (pow, atan2, max2, min2, hypot). #64582 (Mikhail Gorshkov).
- Added SQL functions
parseReadableSize(along with
OrNulland
OrZerovariants). #64742 (Francisco J. Jurado Moreno).
- Add server settings
max_table_num_to_throwand
max_database_num_to_throwto limit the number of databases or tables on
CREATEqueries. #64781 (Xu Jia).
- Add
_time virtual column to file-like storages (s3/file/hdfs/url/azureBlobStorage). #64947 (Ilya Golshtein).
- Introduced new functions
base64URLEncode,
base64URLDecodeand
tryBase64URLDecode. #64991 (Mikhail Gorshkov).
- Add new function
editDistanceUTF8, which calculates the edit distance between two UTF8 strings. #65269 (LiuNeng).
- Add
http_response_headersconfiguration to support custom response headers in custom HTTP handlers. #63562 (Grigorii).
- Added a new table function
loopto support returning query results in an infinite loop. #63452 (Sariel). This is useful for testing.
- Introduced two additional columns in the
system.query_log:
used_privilegesand
missing_privileges.
used_privilegesis populated with the privileges that were checked during query execution, and
missing_privilegescontains required privileges that are missing. #64597 (Alexey Katsman).
- Added a setting
output_format_pretty_display_footer_column_nameswhich when enabled displays column names at the end of the table for long tables (50 rows by default), with the threshold value for minimum number of rows controlled by
output_format_pretty_display_footer_column_names_min_rows. #65144 (Shaun Struwig).
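To see why a UTF-8 aware edit distance such as the one added above matters, the following Python sketch compares a Levenshtein distance computed over raw UTF-8 bytes with one computed over code points. The helper names are hypothetical, and the byte-wise variant is only an assumption about how a non-UTF-8 function would treat its input; this is not ClickHouse code.

```python
def edit_distance(a, b) -> int:
    """Levenshtein distance between two sequences, using a rolling DP row."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def edit_distance_bytes(a: str, b: str) -> int:
    """Distance over the UTF-8 encoded bytes of each string."""
    return edit_distance(a.encode("utf-8"), b.encode("utf-8"))


def edit_distance_code_points(a: str, b: str) -> int:
    """Distance over Unicode code points."""
    return edit_distance(a, b)
```

For `"na\u00efve"` versus `"naive"`, the code-point distance is 1 (one substitution), while the byte-wise distance is 2, because the accented letter occupies two bytes in UTF-8.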
Experimental Feature
- Introduce statistics of type "number of distinct values". #59357 (Han Fei).
- Support statistics with ReplicatedMergeTree. #64934 (Han Fei).
- If "replica group" is configured for a
Replicateddatabase, automatically create a cluster that includes replicas from all groups. #64312 (Alexander Tokmakov).
- Add settings
parallel_replicas_custom_key_range_lowerand
parallel_replicas_custom_key_range_upper to control how parallel replicas with dynamic shards parallelize queries when using a range filter. #64604 (josh-hildred).
Performance Improvement
- Add the ability to reshuffle rows during insert to optimize for size without violating the order set by
PRIMARY KEY. It's controlled by the setting
optimize_row_order(off by default). #63578 (Igor Markelov).
- Add a native Parquet reader, which can read Parquet binary data directly into ClickHouse columns. It's controlled by the setting
input_format_parquet_use_native_reader(disabled by default). #60361 (ZhiHong Zhang).
- Support partial trivial count optimization when the query filter is able to select exact ranges from merge tree tables. #60463 (Amos Bird).
- Reduce max memory usage of multi-threaded
INSERTs by collecting chunks of multiple threads in a single transform. #61047 (Yarik Briukhovetskyi).
- Reduce the memory usage when using Azure object storage by using fixed memory allocation, avoiding the allocation of an extra buffer. #63160 (SmitaRKulkarni).
- Reduce the number of virtual function calls in
ColumnNullable::size. #60556 (HappenLee).
- Speed up
splitByRegexp when the regular expression argument is a single character. #62696 (Robert Schulze).
- Speed up aggregation by 8-bit and 16-bit keys by keeping track of the min and max keys used. This reduces the number of cells that need to be verified. #62746 (Jiebin Sun).
- Optimize operator IN when the left hand side is
LowCardinalityand the right is a set of constants. #64060 (Zhiguo Zhou).
- Use a thread pool to initialize and destroy hash tables inside
ConcurrentHashJoin. #64241 (Nikita Taranov).
- Optimized vertical merges in tables with sparse columns. #64311 (Anton Popov).
- Enabled prefetches of data from remote filesystem during vertical merges. It improves latency of vertical merges in tables with data stored on remote filesystem. #64314 (Anton Popov).
- Reduce redundant calls to
isDefaultof
ColumnSparse::filterto improve performance. #64426 (Jiebin Sun).
- Speed up
find_super_nodesand
find_big_familykeeper-client commands by making multiple asynchronous getChildren requests. #64628 (Alexander Gololobov).
- Improve function
least/
greatest for nullable numeric type arguments. #64668 (KevinyhZou).
- Allow merging two consecutive filtering steps of a query plan. This improves filter-push-down optimization if the filter condition can be pushed down from the parent step. #64760 (Nikolai Kochetov).
- Remove bad optimization in the vertical final implementation and re-enable vertical final algorithm by default. #64783 (Duc Canh Le).
- Remove ALIAS nodes from the filter expression. This slightly improves performance for queries with
PREWHERE(with the new analyzer). #64793 (Nikolai Kochetov).
- Re-enable OpenSSL session caching. #65111 (Robert Schulze).
- Added settings to disable materialization of skip indexes and statistics on inserts (
materialize_skip_indexes_on_insertand
materialize_statistics_on_insert). #64391 (Anton Popov).
- Use the allocated memory size to calculate the row group size and reduce the peak memory of the parquet writer in the single-threaded mode. #64424 (LiuNeng).
- Improve the sparse column iterator to reduce the number of calls to
size. #64497 (Jiebin Sun).
- Update condition to use server-side copy for backups to Azure blob storage. #64518 (SmitaRKulkarni).
- Optimized memory usage of vertical merges for tables with high number of skip indexes. #64580 (Anton Popov).
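The idea behind the 8-bit and 16-bit key aggregation speed-up listed above (tracking the smallest and largest key seen so that fewer cells need to be checked) can be sketched as follows. This is an illustrative Python model with hypothetical names, not the actual C++ implementation; it assumes a fixed 256-cell table indexed directly by an 8-bit key.

```python
class SmallKeyCounter:
    """Count occurrences of 8-bit keys in a fixed 256-cell table."""

    def __init__(self) -> None:
        self.cells = [0] * 256
        self.min_key = 255  # inverted range means "no keys seen yet"
        self.max_key = 0

    def add(self, key: int) -> None:
        self.cells[key] += 1
        self.min_key = min(self.min_key, key)
        self.max_key = max(self.max_key, key)

    def items(self):
        """Yield (key, count) pairs, scanning only the [min_key, max_key] window."""
        for key in range(self.min_key, self.max_key + 1):
            if self.cells[key]:
                yield key, self.cells[key]

    def cells_scanned(self) -> int:
        """Number of cells a result scan has to visit."""
        return max(0, self.max_key - self.min_key + 1)
```

If all keys fall between 10 and 12, producing the result visits 3 cells instead of all 256; the saving grows with the 65536-cell table used for 16-bit keys.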
Improvement
- SHOW CREATE TABLE executed on top of system tables now shows a handy comment, unique to each table, explaining why the table is needed. #63788 (Nikita Mikhaylov).
- The second argument (scale) of functions
round(),
roundBankers(),
floor(),
ceil()and
trunc()can now be non-const. #64798 (Mikhail Gorshkov).
- Hot reload storage policy for
Distributedtables when adding a new disk. #58285 (Duc Canh Le).
- Avoid possible deadlock during MergeTree index analysis when scheduling threads in a saturated service. #59427 (Sean Haynes).
- Several minor corner case fixes to S3 proxy support & tunneling. #63427 (Arthur Passos).
- Improve io_uring resubmit visibility. Rename profile event
IOUringSQEsResubmits->
IOUringSQEsResubmitsAsyncand add a new one
IOUringSQEsResubmitsSync. #63699 (Tomer Shafir).
- Added a new setting,
metadata_keep_free_space_bytesto keep free space on the metadata storage disk. #64128 (MikhailBurdukov).
- Add metrics to track the number of directories created and removed by the
plain_rewritablemetadata storage, and the number of entries in the local-to-remote in-memory map. #64175 (Julia Kartseva).
- The query cache now considers identical queries with different settings as different. This increases robustness in cases where different settings (e.g.
limitor
additional_table_filters) would affect the query result. #64205 (Robert Schulze).
- Support the non standard error code
QpsLimitExceededin object storage as a retryable error. #64225 (Sema Checherinda).
- Forbid converting a MergeTree table to replicated if the zookeeper path for this table already exists. #64244 (Kirill).
- Added a new setting
input_format_parquet_prefer_block_bytesto control the average output block bytes, and modified the default value of
input_format_parquet_max_block_sizeto 65409. #64427 (LiuNeng).
- Allow proxy to be bypassed for hosts specified in
no_proxyenv variable and ClickHouse proxy configuration. #63314 (Arthur Passos).
- Always start Keeper with sufficient amount of threads in global thread pool. #64444 (Duc Canh Le).
- Settings from the user's config don't affect merges and mutations for
MergeTreeon top of object storage. #64456 (alesapin).
- Support the non standard error code
TotalQpsLimitExceededin object storage as a retryable error. #64520 (Sema Checherinda).
- Updated Advanced Dashboard for both open-source and ClickHouse Cloud versions to include a chart for 'Maximum concurrent network connections'. #64610 (Thom O'Connor).
- Improve progress report on
zeros_mtand
generateRandom. #64804 (Raúl Marín).
- Add an asynchronous metric
jemalloc.profile.activeto show whether sampling is currently active. This is an activation mechanism in addition to prof.active; both must be active for the calling thread to sample. #64842 (Unalian).
- Remove mark of
allow_experimental_join_conditionas important. This mark may have prevented distributed queries in a mixed versions cluster from being executed successfully. #65008 (Nikita Mikhaylov).
- Added server Asynchronous metrics
DiskGetObjectThrottler*and
DiskPutObjectThrottler* reflecting the requests-per-second rate limit defined with
s3_max_get_rpsand
s3_max_put_rpsdisk settings and currently available number of requests that could be sent without hitting throttling limit on the disk. Metrics are defined for every disk that has a configured limit. #65050 (Sergei Trifonov).
- Initialize global trace collector for
Poco::ThreadPool(needed for Keeper, etc). #65239 (Kseniia Sumarokova).
- Add a validation when creating a user with
bcrypt_hash. #65242 (Raúl Marín).
- Add profile events for number of rows read during/after
PREWHERE. #64198 (Nikita Taranov).
- Print query in
EXPLAIN PLANwith parallel replicas. #64298 (vdimir).
- Rename
allow_deprecated_functionsto
allow_deprecated_error_prone_window_functions. #64358 (Raúl Marín).
- Respect
max_read_buffer_sizesetting for file descriptors as well in the
filetable function. #64532 (Azat Khuzhin).
- Disable transactions for unsupported storages even for materialized views. #64918 (alesapin).
- Forbid
QUALIFYclause in the old analyzer. The old analyzer ignored
QUALIFY, so it could lead to unexpected data removal in mutations. #65356 (Dmitry Novik).
Bug Fix (user-visible misbehavior in an official stable release)
- A bug in Apache ORC library was fixed: Fixed ORC statistics calculation, when writing, for unsigned types on all platforms and Int8 on ARM. #64563 (Michael Kolupaev).
- Restored the previous behaviour of how ClickHouse interprets Tuples in the CSV format. This change effectively reverts https://github.com/ClickHouse/ClickHouse/pull/60994 and makes it available only under a few settings:
output_format_csv_serialize_tuple_into_separate_columns,
input_format_csv_deserialize_separate_columns_into_tupleand
input_format_csv_try_infer_strings_from_quoted_tuples. #65170 (Nikita Mikhaylov).
- Fix a permission error where a user in a specific situation can escalate their privileges on the default database without necessary grants. #64769 (pufit).
- Fix crash with UniqInjectiveFunctionsEliminationPass and uniqCombined. #65188 (Raúl Marín).
- Fix a bug in ClickHouse Keeper that causes digest mismatch during closing session. #65198 (Aleksei Filatov).
- Use correct memory alignment for Distinct combinator. Previously, crash could happen because of invalid memory allocation when the combinator was used. #65379 (Antonio Andelic).
- Fix crash with
DISTINCTand window functions. #64767 (Igor Nikonov).
- Fixed 'set' skip index not working with IN and indexHint(). #62083 (Michael Kolupaev).
- Support executing function during assignment of parameterized view value. #63502 (SmitaRKulkarni).
- Fixed parquet memory tracking. #63584 (Michael Kolupaev).
- Fixed reading of columns of type
Tuple(Map(LowCardinality(String), String), ...). #63956 (Anton Popov).
- Fix a
Cyclic aliaseserror for cyclic aliases of different type (expression and function). #63993 (Nikolai Kochetov).
- Use a properly redefined context with the correct definer for each individual view in the query pipeline. #64079 (pufit).
- Fix analyzer: "Not found column" error is fixed when using INTERPOLATE. #64096 (Yakov Olkhovskiy).
- Fix creating backups to S3 buckets with different credentials from the disk containing the file. #64153 (Antonio Andelic).
- The query cache now considers two identical queries against different databases as different. The previous behavior could be used to bypass missing privileges to read from a table. #64199 (Robert Schulze).
- Fix possible abort on uncaught exception in ~WriteBufferFromFileDescriptor in StatusFile. #64206 (Kruglov Pavel).
- Fix
duplicate aliaserror for distributed queries with
ARRAY JOIN. #64226 (Nikolai Kochetov).
- Fix unexpected accurateCast from string to integer. #64255 (wudidapaopao).
- Fixed CNF simplification, in case any OR group contains mutually exclusive atoms. #64256 (Eduard Karacharov).
- Fix Query Tree size validation. #64377 (Dmitry Novik).
- Fix
Logical error: Bad castfor
Buffertable with
PREWHERE. #64388 (Nikolai Kochetov).
- Prevent recursive logging in
blob_storage_logwhen it's stored on object storage. #64393 (vdimir).
- Fixed
CREATE TABLE ASqueries for tables with default expressions. #64455 (Anton Popov).
- Fixed
optimize_read_in_orderbehaviour for ORDER BY ... NULLS FIRST / LAST on tables with nullable keys. #64483 (Eduard Karacharov).
- Fix the
Expression nodes list expected 1 projection namesand
Unknown expression or identifiererrors for queries with aliases to
GLOBAL IN. #64517 (Nikolai Kochetov).
- Fix an error
Cannot find columnin distributed queries with constant CTE in the
GROUP BYkey. #64519 (Nikolai Kochetov).
- Fix the crash loop when restoring from backup is blocked by creating an MV with a definer that hasn't been restored yet. #64595 (pufit).
- Fix the output of function
formatDateTimeInJodaSyntaxwhen a formatter generates an uneven number of characters and the last character is
0. For example,
SELECT formatDateTimeInJodaSyntax(toDate('2012-05-29'), 'D')now correctly returns
150instead of previously
15. #64614 (LiuNeng).
- Do not rewrite aggregation if
-Ifcombinator is already used. #64638 (Dmitry Novik).
- Fix type inference for float (in case of small buffer, i.e.
--max_read_buffer_size 1). #64641 (Azat Khuzhin).
- Fix bug which could lead to non-working TTLs with expressions. #64694 (alesapin).
- Fix removing the
WHEREand
PREWHEREexpressions, which are always true (for the new analyzer). #64695 (Nikolai Kochetov).
- Fixed excessive part elimination by token-based text indexes (
ngrambf,
full_text) when filtering by result of
startsWith,
endsWith,
match,
multiSearchAny. #64720 (Eduard Karacharov).
- Fixes incorrect behaviour of ANSI CSI escaping in the
UTF8::computeWidthfunction. #64756 (Shaun Struwig).
- Fix a case of incorrect removal of
ORDER BY/
LIMIT BYacross subqueries. #64766 (Raúl Marín).
- Fix (experimental) unequal join with subqueries for sets which are in the mixed join conditions. #64775 (lgbo).
- Fix crash in a local cache over
plain_rewritabledisk. #64778 (Julia Kartseva).
- Keeper fix: return correct value for
zk_latest_snapshot_sizein
mntrcommand. #64784 (Antonio Andelic).
- Fix
Cannot find columnin distributed query with
ARRAY JOINby
Nestedcolumn. Fixes #64755. #64801 (Nikolai Kochetov).
- Fix memory leak in slru cache policy. #64803 (Kseniia Sumarokova).
- Fixed possible incorrect memory tracking in several kinds of queries: queries that read any data from S3, queries via http protocol, asynchronous inserts. #64844 (Anton Popov).
- Fix the
Block structure mismatcherror for queries reading with
PREWHEREfrom the materialized view when the materialized view has columns of different types than the source table. Fixes #64611. #64855 (Nikolai Kochetov).
- Fix rare crash when table has TTL with subquery + database replicated + parallel replicas + analyzer. It's really rare, but please don't use TTLs with subqueries. #64858 (alesapin).
- Fix duplicating
Deleteevents in
blob_storage_login case of large batch to delete. #64924 (vdimir).
- Fixed
Session moved to another servererror from [Zoo]Keeper that might happen after server startup when the config has includes from [Zoo]Keeper. #64986 (Alexander Tokmakov).
- Fix
ALTER MODIFY COMMENTquery that was broken for parameterized VIEWs in https://github.com/ClickHouse/ClickHouse/pull/54211. #65031 (Nikolay Degterinsky).
- Fix
host_idin DatabaseReplicated when
cluster_secure_connectionparameter is enabled. Previously all the connections within the cluster created by DatabaseReplicated were not secure, even if the parameter was enabled. #65054 (Nikolay Degterinsky).
- Fixing the
Not-ready Seterror after the
PREWHEREoptimization for StorageMerge. #65057 (Nikolai Kochetov).
- Avoid writing to finalized buffer in File-like storages. #65063 (Kruglov Pavel).
- Fix possible infinite query duration in case of cyclic aliases. Fixes #64849. #65081 (Nikolai Kochetov).
- Fix the
Unknown expression identifiererror for remote queries with
INTERPOLATE (alias)(new analyzer). Fixes #64636. #65090 (Nikolai Kochetov).
- Fix pushing arithmetic operations out of aggregation. In the new analyzer, optimization was applied only once. #65104 (Dmitry Novik).
- Fix aggregate function name rewriting in the new analyzer. #65110 (Dmitry Novik).
- Respond with 5xx instead of 200 OK in case of receive timeout while reading (parts of) the request body from the client socket. #65118 (Julian Maicher).
- Fix possible crash for hedged requests. #65206 (Azat Khuzhin).
- Fix the bug in Hashed and Hashed_Array dictionary short circuit evaluation, which may read uninitialized number, leading to various errors. #65256 (jsc0218).
- Ensure that the type of the constant (the IN operator's second parameter) is always visible during the IN operator's type conversion process. Otherwise, losing type information may cause some conversions to fail, such as the conversion from DateTime to Date. This fixes #64487. #65315 (pn).
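The expected value in the formatDateTimeInJodaSyntax fix above can be checked independently. The Python sketch below computes the day of the year and renders it the way a run of Joda `D` letters is documented to behave, where the number of letters gives the minimum digit count. The helper name is hypothetical and the code does not call ClickHouse.

```python
from datetime import date


def format_day_of_year(d: date, min_digits: int = 1) -> str:
    """Day of year as a decimal string, zero-padded to at least min_digits digits."""
    return str(d.timetuple().tm_yday).zfill(min_digits)
```

For 2012-05-29 this gives `150` (2012 is a leap year), which matches the corrected output; the trailing `0` is part of the number and must not be dropped.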
Build/Testing/Packaging Improvement
- Add support for LLVM XRay. #64592 #64837 (Tomer Shafir).
- Unite s3/hdfs/azure storage implementations into a single class working with IObjectStorage. Same for *Cluster, data lakes and Queue storages. #59767 (Kseniia Sumarokova).
- Refactor data part writer to remove dependencies on MergeTreeData and DataPart. #63620 (Alexander Gololobov).
- Refactor
KeyConditionand key analysis to improve PartitionPruner and trivial count optimization. This is separated from #60463 . #61459 (Amos Bird).
- Introduce assertions to verify all functions are called with columns of the right size. #63723 (Raúl Marín).
- Make
networkservice be required when using the
rcinit script to start the ClickHouse server daemon. #60650 (Chun-Sheng, Li).
- Reduce the size of some slow tests. #64387 #64452 (Raúl Marín).
- Replay ZooKeeper logs using keeper-bench. #62481 (Antonio Andelic).
ClickHouse release 24.5, 2024-05-30
Backward Incompatible Change
- Renamed "inverted indexes" to "full-text indexes" which is a less technical / more user-friendly name. This also changes internal table metadata and breaks tables with existing (experimental) inverted indexes. Please make sure to drop such indexes before upgrade and re-create them after upgrade. #62884 (Robert Schulze).
- Usage of functions
neighbor,
runningAccumulate,
runningDifferenceStartingWithFirstValue,
runningDifference is deprecated (because they are error-prone). Proper window functions should be used instead. To re-enable them, set
allow_deprecated_error_prone_window_functions = 1or set
compatibility = '24.4'or lower. #63132 (Nikita Taranov).
- Queries from
system.columnswill work faster if there is a large number of columns, but many databases or tables are not granted for
SHOW TABLES. Note that in previous versions, if you grant
SHOW COLUMNSto individual columns without granting
SHOW TABLESto the corresponding tables, the
system.columnstable will show these columns, but in a new version, it will skip the table entirely. Remove trace log messages "Access granted" and "Access denied" that slowed down queries. #63439 (Alexey Milovidov).
New Feature
- Adds the
Formformat to read/write a single record in the
application/x-www-form-urlencodedformat. #60199 (Shaun Struwig).
- Added possibility to compress in CROSS JOIN. #60459 (p1rattttt).
- Added possibility to do
CROSS JOINin temporary files if the size exceeds limits. #63432 (p1rattttt).
- Support joins with inequality conditions that involve columns from both the left and the right table, e.g.
t1.y < t2.y. To enable,
SET allow_experimental_join_condition = 1. #60920 (lgbo).
- Maps can now have
Float32,
Float64,
Array(T),
Map(K, V)and
Tuple(T1, T2, ...)as keys. Closes #54537. #59318 (李扬).
- Introduce bulk loading to
EmbeddedRocksDB by creating and ingesting SST files instead of relying on the RocksDB built-in memtable. This helps to increase import speed, especially for long-running insert queries to StorageEmbeddedRocksDB tables. Also, introduce
EmbeddedRocksDBtable settings. #59163 #63324 (Duc Canh Le).
- User can now parse CRLF with TSV format using a setting
input_format_tsv_crlf_end_of_line. Closes #56257. #59747 (Shaun Struwig).
- A new setting
input_format_force_null_for_omitted_fieldsthat forces NULL values for omitted fields. #60887 (Constantine Peresypkin).
- Previously, the S3 storage and the s3 table function didn't support selecting from archive container files, such as tarballs, zip, 7z. Now they allow iterating over files inside archives in S3. #62259 (Daniil Ivanik).
- Support for conditional function
clamp. #62377 (skyoct).
- Add
NPyoutput format. #62430 (豪肥肥).
- Raw format as a synonym for
TSVRaw. #63394 (Unalian).
- Added a new SQL function
generateUUIDv7to generate version 7 UUIDs aka. timestamp-based UUIDs with random component. Also added a new function
UUIDToNumto extract bytes from a UUID and a new function
UUIDv7ToDateTimeto extract timestamp component from a UUID version 7. #62852 (Alexey Petrunyaka).
- On Linux and macOS, if the program has stdout redirected to a file with a compression extension, use the corresponding compression method instead of nothing (making it behave similarly to
INTO OUTFILE). #63662 (v01dXYZ).
- Change warning on high number of attached tables to differentiate tables, views and dictionaries. #64180 (Francisco J. Jurado Moreno).
- Provide support for
azureBlobStoragefunction in ClickHouse server to use Azure Workload identity to authenticate against Azure blob storage. If
use_workload_identityparameter is set in config, workload identity is used for authentication. #57881 (Vinay Suryadevara).
- Add TTL information in the
system.parts_columnstable. #63200 (litlig).
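The relationship between a version 7 UUID and its timestamp, which the generateUUIDv7 and UUIDv7ToDateTime entry above relies on, can be sketched in Python. The layout used here (48-bit UNIX millisecond timestamp in the top bits, then version and variant fields, with the remaining bits random) follows the UUID version 7 definition in RFC 9562. The function names are hypothetical, and whether ClickHouse fills the non-timestamp bits the same way is not claimed.

```python
import os
import uuid
from datetime import datetime, timedelta, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def make_uuid_v7(unix_ms: int) -> uuid.UUID:
    """Assemble a version 7 UUID from a millisecond timestamp and random bits."""
    value = (unix_ms & ((1 << 48) - 1)) << 80        # timestamp in the top 48 bits
    value |= int.from_bytes(os.urandom(10), "big")   # 80 random bits below it
    value = (value & ~(0xF << 76)) | (0x7 << 76)     # version field = 7
    value = (value & ~(0x3 << 62)) | (0x2 << 62)     # variant bits = 0b10
    return uuid.UUID(int=value)


def uuid_v7_to_datetime(u: uuid.UUID) -> datetime:
    """Extract the timestamp component from a version 7 UUID."""
    return UNIX_EPOCH + timedelta(milliseconds=u.int >> 80)
```

Because the timestamp occupies the most significant bits, UUIDs built this way sort by creation time at millisecond granularity, which is the main practical reason to prefer them over fully random UUIDs as keys.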
Experimental Features
- Implement
Dynamicdata type that allows to store values of any type inside it without knowing all of them in advance.
Dynamictype is available under a setting
allow_experimental_dynamic_type. Reference: #54864. #63058 (Kruglov Pavel).
- Allowed to create
MaterializedMySQLdatabase without connection to MySQL. #63397 (Kirill).
- Automatically mark a replica of Replicated database as lost and start recovery if some DDL task fails more than
max_retries_before_automatic_recovery(100 by default) times in a row with the same error. Also, fixed a bug that could cause skipping DDL entries when an exception is thrown during an early stage of entry execution. #63549 (Alexander Tokmakov).
- Account failed files in
s3queue_tracked_file_ttl_secand
s3queue_tracked_files_limit for
StorageS3Queue. #63638 (Kseniia Sumarokova).
Performance Improvement
- Less contention in filesystem cache (part 4). Allow to keep filesystem cache not filled to the limit by doing additional eviction in the background (controlled by
keep_free_space_size(elements)_ratio). This allows to release pressure from space reservation for queries (on
tryReservemethod). Also this is done in a lock free way as much as possible, e.g. should not block normal cache usage. #61250 (Kseniia Sumarokova).
- Skip merging of newly created projection blocks during
INSERT-s. #59405 (Nikita Taranov).
- Process string functions
...UTF8 'asciily' if input strings are all ASCII chars. Inspired by https://github.com/apache/doris/pull/29799. Overall speedup of 1.07x~1.62x. Note that peak memory usage also decreased in some cases. #61632 (李扬).
- Improved performance of selection (
{}) globs in StorageS3. #62120 (Andrey Zvonov).
- HostResolver held each IP address several times. If a remote host has several IPs and, for some reason (firewall rules, for example), access is allowed on some IPs and forbidden on others, then only the first record of a forbidden IP was marked as failed, and on each try these IPs had a chance to be chosen (and fail again). Even with that fixed, the DNS cache is dropped every 120 seconds, and the IPs can be chosen again. #62652 (Anton Ivashkin).
- Add a new configuration
prefer_merge_sort_block_bytesto control the memory usage and speed up sorting 2 times when merging when there are many columns. #62904 (LiuNeng).
- clickhouse-local will start faster. In previous versions, it mistakenly did not delete temporary directories. Now it does. This closes #62941. #63074 (Alexey Milovidov).
- Micro-optimizations for the new analyzer. #63429 (Raúl Marín).
- Index analysis will work if
DateTimeis compared to
DateTime64. This closes #63441. #63443 #63532 (Alexey Milovidov).
- Speed up indices of type
seta little (around 1.5 times) by removing garbage. #64098 (Alexey Milovidov).
- Remove copying data when writing to the filesystem cache. #63401 (Kseniia Sumarokova).
- Now backups with azure blob storage will use multicopy. #64116 (alesapin).
- Allow to use native copy for azure even with different containers. #64154 (alesapin).
- Finally enable native copy for azure. #64182 (alesapin).
Improvement
- Allow using
clickhouse-localand its shortcuts
clickhouseand
chwith a query or queries file as a positional argument. Examples:
ch "SELECT 1",
ch --param_test Hello "SELECT {test:String}",
ch query.sql. This closes #62361. #63081 (Alexey Milovidov).
- Enable plain_rewritable metadata for local and Azure (azure_blob_storage) object storages. #63365 (Julia Kartseva).
- Support English-style Unicode quotes, e.g. “Hello”, ‘world’. This is questionable in general but helpful when you type your query in a word processor, such as Google Docs. This closes #58634. #63381 (Alexey Milovidov).
- Allow trailing commas in the columns list in the INSERT query. For example,
INSERT INTO test (a, b, c, ) VALUES .... #63803 (Alexey Milovidov).
- Better exception messages for the
Regexpformat. #63804 (Alexey Milovidov).
- Allow trailing commas in the
Valuesformat. For example, this query is allowed:
INSERT INTO test (a, b, c) VALUES (4, 5, 6,);. #63810 (Alexey Milovidov).
- Make rabbitmq nack broken messages. Closes #45350. #60312 (Kseniia Sumarokova).
- Fix a crash in asynchronous stack unwinding (such as when using the sampling query profiler) while interpreting debug info. This closes #60460. #60468 (Alexey Milovidov).
- Use distinct messages for the S3 'no key' error in the disk and storage cases. #61108 (Sema Checherinda).
- The progress bar will work for trivial queries with LIMIT from
system.zeros,
system.zeros_mt (it already works for
system.numbers and
system.numbers_mt), and the
generateRandom table function. As a bonus, if the total number of records is greater than the
max_rows_to_read limit, it will throw an exception earlier. This closes #58183. #61823 (Alexey Milovidov).
- Support for "Merge Key" in YAML configurations (this is a weird feature of YAML, please never mind). #62685 (Azat Khuzhin).
- Enhance error message when non-deterministic function is used with Replicated source. #62896 (Grégoire Pineau).
- Fix interserver secret for Distributed over Distributed from
remote. #63013 (Azat Khuzhin).
- Support
include_from for YAML files. However, it is better to use
config.d. #63106 (Eduard Karacharov).
- Keep previous data in terminal after picking from skim suggestions. #63261 (FlameFactory).
- Width of fields (in Pretty formats or the
visibleWidthfunction) now correctly ignores ANSI escape sequences. #63270 (Shaun Struwig).
- Replace usages of the error code
NUMBER_OF_ARGUMENTS_DOESNT_MATCH with more accurate error codes when appropriate. #63406 (Yohann Jardin).
- os_user and
client_hostname are now correctly set up for queries for command line suggestions in clickhouse-client. This closes #63430. #63433 (Alexey Milovidov).
- Automatically correct
max_block_sizeto the default value if it is zero. #63587 (Antonio Andelic).
- Add a build_id ALIAS column to trace_log to facilitate auto renaming upon detecting binary changes. This is to address #52086. #63656 (Zimu Li).
- Enable truncate operation for object storage disks. #63693 (MikhailBurdukov).
- The loading of the keywords list is now dependent on the server revision and will be disabled for the old versions of ClickHouse server. CC @azat. #63786 (Nikita Mikhaylov).
- Clickhouse disks have to read server setting to obtain actual metadata format version. #63831 (Sema Checherinda).
- Disable pretty format restrictions (
output_format_pretty_max_rows/
output_format_pretty_max_value_width) when stdout is not TTY. #63942 (Azat Khuzhin).
- Exception handling now works when ClickHouse is used inside AWS Lambda. Author: Alexey Coolnev. #64014 (Alexey Milovidov).
- Throw
CANNOT_DECOMPRESS instead of
CORRUPTED_DATA on invalid compressed data passed via HTTP. #64036 (vdimir).
- A tip for a single large number in Pretty formats now works for Nullable and LowCardinality. This closes #61993. #64084 (Alexey Milovidov).
- Add metrics, logs, and thread names around parts filtering with indices. #64130 (Alexey Milovidov).
- Ignore
allow_suspicious_primary_key on
ATTACH and verify on
ALTER. #64202 (Azat Khuzhin).
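To make the item about field width and ANSI escape sequences concrete, here is a minimal Python sketch of the idea: escape sequences occupy bytes in the string but no columns on screen, so they are removed before measuring. This is a simplified model and not the ClickHouse implementation; the function name and the regular expression are assumptions, and the sketch counts one column per remaining character.

```python
import re

# Assumed pattern: CSI sequences of the form ESC [ <digits and semicolons> <letter>,
# which covers the common colour and style codes.
ANSI_CSI = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")


def visible_width(text: str) -> int:
    """Return the number of characters left after dropping ANSI CSI sequences."""
    return len(ANSI_CSI.sub("", text))


if __name__ == "__main__":
    colored = "\x1b[1;31merror\x1b[0m"
    # The raw string is longer than what a terminal would display.
    print(len(colored), visible_width(colored))
```

Without this correction, a colored value would be treated as wider than it appears, and the borders of a Pretty table would be misaligned.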
Build/Testing/Packaging Improvement
- ClickHouse is built with clang-18. A lot of new checks from clang-tidy-18 have been enabled. #60469 (Alexey Milovidov).
- Experimentally support loongarch64 as a new platform for ClickHouse. #63733 (qiangxuhui).
- The Dockerfile is reviewed by the docker official library in https://github.com/docker-library/official-images/pull/15846. #63400 (Mikhail f. Shiryaev).
- Information about every symbol in every translation unit will be collected in the CI database for every build in the CI. This closes #63494. #63495 (Alexey Milovidov).
- Update Apache Datasketches library. It resolves #63858. #63923 (Alexey Milovidov).
- Enable GRPC support for aarch64 linux while cross-compiling binary. #64072 (alesapin).
- Fix unwind on SIGSEGV on aarch64 (due to small stack for signal) #64058 (Azat Khuzhin).
Bug Fix
- Disabled
enable_vertical_final setting by default. This feature should not be used because it has a bug: #64543. #64544 (Alexander Tokmakov).
- Fix making backup when multiple shards are used #57684 (Vitaly Baranov).
- Fix passing projections/indexes/primary key from columns list from CREATE query into inner table of MV #59183 (Azat Khuzhin).
- Fix boundRatio incorrect merge #60532 (Tao Wang).
- Fix crash when calling some functions on const low-cardinality columns #61966 (Michael Kolupaev).
- Fix queries with FINAL giving wrong results when the table does not use adaptive granularity #62432 (Duc Canh Le).
- Improve detection of cgroups v2 support for memory controllers #62903 (Robert Schulze).
- Fix subsequent use of external tables in client #62964 (Azat Khuzhin).
- Fix crash with untuple and unresolved lambda #63131 (Raúl Marín).
- Fix premature server listen for connections #63181 (alesapin).
- Fix intersecting parts when restarting after a DROP PART command #63202 (Han Fei).
- Correctly load SQL security defaults during startup #63209 (pufit).
- JOIN filter push down filter join fix #63234 (Maksim Kita).
- Fix infinite loop in AzureObjectStorage::listObjects #63257 (Julia Kartseva).
- CROSS join ignore join_algorithm setting #63273 (vdimir).
- Fix finalize WriteBufferToFileSegment and StatusFile #63346 (vdimir).
- Fix logical error during SELECT query after ALTER in rare case #63353 (alesapin).
- Fix
X-ClickHouse-Timezone header with
session_timezone #63377 (Andrey Zvonov).
- Fix debug assert when using grouping WITH ROLLUP and LowCardinality types #63398 (Raúl Marín).
- Small fixes for group_by_use_nulls #63405 (vdimir).
- Fix backup/restore of projection part in case projection was removed from table metadata, but part still has projection #63426 (Kseniia Sumarokova).
- Fix mysql dictionary source #63481 (vdimir).
- Insert QueryFinish on AsyncInsertFlush with no data #63483 (Raúl Marín).
- Fix: empty used_dictionaries in system.query_log #63487 (Eduard Karacharov).
- Make
MergeTreePrefetchedReadPool safer #63513 (Antonio Andelic).
- Fix crash on exit with sentry enabled (due to openssl destroyed before sentry) #63548 (Azat Khuzhin).
- Fix Array and Map support with Keyed hashing #63628 (Salvatore Mesoraca).
- Fix filter pushdown for Parquet and maybe StorageMerge #63642 (Michael Kolupaev).
- Prevent conversion to Replicated if zookeeper path already exists #63670 (Kirill).
- Analyzer: views read only necessary columns #63688 (Maksim Kita).
- Analyzer: Forbid WINDOW redefinition #63694 (Dmitry Novik).
- flatten_nested was broken with the experimental Replicated database. #63695 (Nikolai Kochetov).
- Fix #63653 #63722 (Nikolai Kochetov).
- Allow cast from Array(Nothing) to Map(Nothing, Nothing) #63753 (Nikolai Kochetov).
- Fix ILLEGAL_COLUMN in partial_merge join #63755 (vdimir).
- Fix: remove redundant distinct with window functions #63776 (Igor Nikonov).
- Fix possible crash with SYSTEM UNLOAD PRIMARY KEY #63778 (Raúl Marín).
- Fix a query with duplicating cycling alias. #63791 (Nikolai Kochetov).
- Make
TokenIterator lazy as it should be #63801 (Alexey Milovidov).
- Add
endpoint_subpath S3 URI setting #63806 (Julia Kartseva).
- Fix deadlock in
ParallelReadBuffer #63814 (Antonio Andelic).
- JOIN filter push down equivalent columns fix #63819 (Maksim Kita).
- Remove data from all disks after DROP with Lazy database. #63848 (MikhailBurdukov).
- Fix incorrect result when reading from MV with parallel replicas and new analyzer #63861 (Nikita Taranov).
- Fixes in
find_super_nodes and
find_big_family commands of keeper-client #63862 (Alexander Gololobov).
- Update lambda execution name #63864 (Nikolai Kochetov).
- Fix SIGSEGV due to CPU/Real profiler #63865 (Azat Khuzhin).
- Fix
EXPLAIN CURRENT TRANSACTION query #63926 (Anton Popov).
- Fix analyzer: there's turtles all the way down... #63930 (Yakov Olkhovskiy).
- Allow certain ALTER TABLE commands for
plain_rewritable disk #63933 (Julia Kartseva).
- Recursive CTE distributed fix #63939 (Maksim Kita).
- Analyzer: Fix COLUMNS resolve #63962 (Dmitry Novik).
- LIMIT BY and skip_unused_shards with analyzer #63983 (Nikolai Kochetov).
- A fix for some trash (experimental Kusto) #63992 (Yong Wang).
- Deserialize untrusted binary inputs in a safer way #64024 (Robert Schulze).
- Fix query analysis for queries with the setting
final = 1 for Distributed tables over tables from families other than MergeTree. #64037 (Nikolai Kochetov).
- Add missing settings to recoverLostReplica #64040 (Raúl Marín).
- Fix SQL security access checks with analyzer #64079 (pufit).
- Fix analyzer: only interpolate expression should be used for DAG #64096 (Yakov Olkhovskiy).
- Fix azure backup writing multipart blocks by 1 MiB (read buffer size) instead of
max_upload_part_size (in non-native copy case) #64117 (Kseniia Sumarokova).
- Correctly fallback during backup copy #64153 (Antonio Andelic).
- Prevent LOGICAL_ERROR on CREATE TABLE as Materialized View #64174 (Raúl Marín).
- Query Cache: Consider identical queries against different databases as different #64199 (Robert Schulze).
- Ignore
text_log for Keeper #64218 (Antonio Andelic).
- Fix Logical error: Bad cast for Buffer table with prewhere. #64388 (Nikolai Kochetov).
ClickHouse release 24.4, 2024-04-30
Upgrade Notes
- clickhouse-odbc-bridge and
clickhouse-library-bridge are now separate packages. This closes #61677. #62114 (Alexey Milovidov).
- Don't allow setting max_parallel_replicas (for the experimental parallel reading from replicas) to
0, as it doesn't make sense. Closes #60140. #61201 (Kruglov Pavel).
- Remove support for the
INSERT WATCH query (part of the deprecated
LIVE VIEW feature). #62382 (Alexey Milovidov).
- Removed the
optimize_monotonous_functions_in_order_by setting. #63004 (Raúl Marín).
- Remove experimental tag from the
Replicated database engine. Now it is in Beta stage. #62937 (Justin de Guzman).
New Feature
- Support recursive CTEs. #62074 (Maksim Kita).
- Support
QUALIFY clause. Closes #47819. #62619 (Maksim Kita).
- Table engines are grantable now, and it won't affect existing users' behavior. #60117 (jsc0218).
- Added a rewritable S3 disk which supports INSERT operations and does not require locally stored metadata. #61116 (Julia Kartseva). The main use case is for system tables.
- The syntax highlighting while typing in the client will work on the syntax level (previously, it worked on the lexer level). #62123 (Alexey Milovidov).
- Support dropping multiple tables at the same time, like
DROP TABLE a, b, c;. #58705 (zhongyuankai).
- Modifying memory table settings through
ALTER MODIFY SETTING is now supported. Example:
ALTER TABLE memory MODIFY SETTING min_rows_to_keep = 100, max_rows_to_keep = 1000;. #62039 (zhongyuankai).
- Added
role query parameter to the HTTP interface. It works similarly to
SET ROLE x, applying the role before the statement is executed. This allows for overcoming the limitation of the HTTP interface, as multiple statements are not allowed, and it is not possible to send both
SET ROLE x and the statement itself at the same time. It is possible to set multiple roles that way, e.g.,
?role=x&role=y, which will be an equivalent of
SET ROLE x, y. #62669 (Serge Klochkov).
- Add
SYSTEM UNLOAD PRIMARY KEY to free up memory usage for a table's primary key. #62738 (Pablo Marcos).
- Added
value1,
value2, ...,
value10 columns to
system.text_log. These columns contain values that were used to format the message. #59619 (Alexey Katsman).
- Added persistent virtual column
_block_offset which stores the original row number in the block that was assigned at insert. Persistence of column
_block_offset can be enabled by the MergeTree setting
enable_block_offset_column. Added virtual column
_part_data_version which contains either the min block number or the mutation version of the part. Persistent virtual column
_block_number is not considered experimental anymore. #60676 (Anton Popov).
- Add a setting
input_format_json_throw_on_bad_escape_sequence, disabling it allows saving bad escape sequences in JSON input formats. #61889 (Kruglov Pavel).
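As an illustration of the role query parameter described in the list above, the following Python sketch builds a request URL that repeats the role parameter once per role. It assumes a server reachable at http://localhost:8123 and passes the statement in the query URL parameter; the helper name is hypothetical, and the sketch only constructs the URL without sending anything.

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, query: str, roles: list[str]) -> str:
    """Build an HTTP interface URL that applies `roles` before running `query`.

    Hypothetical helper: each role becomes its own role=... pair, which the
    changelog describes as equivalent to SET ROLE with all listed roles.
    """
    params = [("role", role) for role in roles] + [("query", query)]
    return f"{base_url}/?{urlencode(params)}"


if __name__ == "__main__":
    # Host and port are assumptions for the example.
    print(build_query_url("http://localhost:8123", "SELECT 1", ["x", "y"]))
```

Passing a list of pairs to urlencode, rather than a dict, is what allows the same key to appear more than once.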
Performance Improvement
- JOIN filter push down improvements using equivalent sets. #61216 (Maksim Kita).
- Convert OUTER JOIN to INNER JOIN optimization if the filter after JOIN always filters default values. Optimization can be controlled with setting
query_plan_convert_outer_join_to_inner_join, enabled by default. #62907 (Maksim Kita).
- Improvement for AWS S3: the client now sends the header 'Keep-Alive: timeout=X' to the server. If the client receives a response from the server with that header, it uses the value from the server. The client also avoids using a connection that is nearly expired, in order to prevent a connection close race. #62249 (Sema Checherinda).
- Reduce overhead of the mutations for SELECTs (v2). #60856 (Azat Khuzhin).
- More frequently invoked functions in PODArray are now force-inlined. #61144 (李扬).
- Speed up parsing of JSON by skipping the rest of the object when all required columns are read. #62210 (lgbo).
- Improve trivial insert select from files in file/s3/hdfs/url/... table functions. Add separate max_parsing_threads setting to control the number of threads used in parallel parsing. #62404 (Kruglov Pavel).
- Functions
to_utc_timestamp and
from_utc_timestamp are now about 2x faster. #62583 (KevinyhZou).
- Functions
parseDateTimeOrNull,
parseDateTimeOrZero,
parseDateTimeInJodaSyntaxOrNull and
parseDateTimeInJodaSyntaxOrZero now run significantly faster (10x - 1000x) when the input contains mostly non-parseable values. #62634 (LiuNeng).
- SELECTs against
system.query_cache are now noticeably faster when the query cache contains lots of entries (e.g. more than 100.000). #62671 (Robert Schulze).
- Less contention in filesystem cache (part 3): execute removal from filesystem without lock on space reservation attempt. #61163 (Kseniia Sumarokova).
- Speed up dynamic resize of filesystem cache. #61723 (Kseniia Sumarokova).
- Dictionary source with
INVALIDATE_QUERY is not reloaded twice on startup. #62050 (vdimir).
- Fix an issue where the primary index was not used when a redundant
= 1 or
= 0 was added after a boolean expression involving the primary key. For example, both
SELECT * FROM <table> WHERE <primary-key> IN (<value>) = 1 and
SELECT * FROM <table> WHERE <primary-key> NOT IN (<value>) = 0 performed a full table scan even though the primary index could be used. #62142 (josh-hildred).
- Return stream of chunks from
system.remote_data_paths instead of accumulating the whole result in one big chunk. This makes it possible to consume less memory, show intermediate progress, and cancel the query. #62613 (Alexander Gololobov).
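The Keep-Alive item in the list above describes three rules: read the timeout from the server's header, prefer the server's value over the client's own, and skip connections that are close to expiring. A minimal Python sketch of that logic follows. It is not the ClickHouse implementation; all function names and the one-second safety margin are assumptions chosen for illustration.

```python
import re
from typing import Optional


def parse_keep_alive_timeout(header_value: str) -> Optional[int]:
    """Extract the timeout in seconds from a Keep-Alive header value."""
    match = re.search(r"timeout\s*=\s*(\d+)", header_value)
    return int(match.group(1)) if match else None


def effective_timeout(client_timeout: int, server_header: Optional[str]) -> int:
    """Prefer the timeout announced by the server over the client's own."""
    if server_header is not None:
        server_timeout = parse_keep_alive_timeout(server_header)
        if server_timeout is not None:
            return server_timeout
    return client_timeout


def can_reuse(idle_seconds: float, timeout: int, safety_margin: float = 1.0) -> bool:
    """Reuse a pooled connection only if it is not about to expire.

    The margin (an assumed value) avoids racing with the server closing it.
    """
    return idle_seconds < timeout - safety_margin


if __name__ == "__main__":
    timeout = effective_timeout(10, "timeout=5, max=100")
    print(timeout, can_reuse(3.0, timeout), can_reuse(4.5, timeout))
```

A connection that has been idle for 4.5 seconds against a 5 second timeout is rejected here, since a request sent on it could arrive after the server has already closed its side.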
Experimental Feature
- Support parallel write buffer for Azure Blob Storage managed by setting
azure_allow_parallel_part_upload. #62534 (SmitaRKulkarni).
- Userspace page cache works with static web storage (
disk(type = web)) now. Use client setting
use_page_cache_for_disks_without_file_cache=1 to enable. #61911 (Michael Kolupaev).
- Don't treat Bool and number variants as suspicious in the
Variant type. #61999 (Kruglov Pavel).
- Implement better conversion from String to
Variant using parsing. #62005 (Kruglov Pavel).
- Support
Variant in JSONExtract functions. #62014 (Kruglov Pavel).
- Mark type
Variant as comparable so it can be used in primary key. #62693 (Kruglov Pavel).
Improvement
- For convenience,
SELECT * FROM numbers() will work in the same way as
SELECT * FROM system.numbers - without a limit. #61969 (YenchangChan).
- Introduce separate consumer/producer tags for the Kafka configuration. This avoids warnings from librdkafka (a bad C library with a lot of bugs) that consumer properties were specified for producer instances and vice versa (e.g.
Configuration property session.timeout.ms is a consumer property and will be ignored by this producer instance). Closes: #58983. #58956 (Aleksandr Musorin).
- Functions
date_diff and
age now calculate their result at nanosecond instead of microsecond precision. They now also offer
nanosecond (or
nanoseconds or
ns) as a possible value for the
unit parameter. #61409 (Austin Kothig).
- Added nanosecond, microsecond, and millisecond units for
date_trunc. #62335 (Misz606).
- Reload certificate chain during certificate reload. #61671 (Pervakov Grigorii).
- Try to prevent an error #60432 by not allowing a table to be attached if there is an active replica for that replica path. #61876 (Arthur Passos).
- Implement support for
input for
clickhouse-local. #61923 (Azat Khuzhin).
- Join table engine with strictness
ANY is consistent after reload. When several rows with the same key are inserted, the first one will have higher priority (before, it was chosen randomly upon table loading). Closes #51027. #61972 (vdimir).
- Automatically infer Nullable column types from Apache Arrow schema. #61984 (Maksim Kita).
- Allow to cancel parallel merge of aggregate states during aggregation. Example:
uniqExact. #61992 (Maksim Kita).
- Use
system.keywords to fill in the suggestions and also use them in all places internally. #62000 (Nikita Mikhaylov).
- OPTIMIZE FINAL for
ReplicatedMergeTree will now wait for currently active merges to finish and then reattempt to schedule a final merge. This will put it more in line with ordinary
MergeTree behaviour. #62067 (Nikita Taranov).
- When reading data from a Hive text file, the number of input fields was taken from the first line of the file, which sometimes does not match the number of columns defined in the Hive table. For example, if the Hive table is defined with 3 columns, like
test_tbl(a Int32, b Int32, c Int32), but the first line of the text file has only 2 fields, the input fields were resized to 2; if the next line of the text file has 3 fields, the third field could not be read and was set to the default value 0, which is not right. #62086 (KevinyhZou).
- CREATE AS copies the table's comment. #62117 (Pablo Marcos).
- Add query progress to table zookeeper. #62152 (JackyWoo).
- Add ability to turn on trace collector (Real and CPU) server-wide. #62189 (alesapin).
- Added setting
lightweight_deletes_sync (default value: 2 - wait all replicas synchronously). It is similar to setting
mutations_sync but affects only behaviour of lightweight deletes. #62195 (Anton Popov).
- Distinguish booleans and integers while parsing values for custom settings:
SET custom_a = true; SET custom_b = 1;. #62206 (Vitaly Baranov).
- Support S3 access through AWS Private Link Interface endpoints. Closes #60021, #31074 and #53761. #62208 (Arthur Passos).
- Do not create a directory for UDF in clickhouse-client if it does not exist. This closes #59597. #62366 (Alexey Milovidov).
- The query cache no longer caches results of queries against system tables (
system.*,
information_schema.*,
INFORMATION_SCHEMA.*). #62376 (Robert Schulze).
- MOVE PARTITION TO TABLE query can be delayed or can throw
TOO_MANY_PARTS exception to avoid exceeding limits on the part count. The same settings and limits are applied as for the
INSERT query (see
max_parts_in_total,
parts_to_delay_insert,
parts_to_throw_insert,
inactive_parts_to_throw_insert,
inactive_parts_to_delay_insert,
max_avg_part_size_for_too_many_parts,
min_delay_to_insert_ms and
max_delay_to_insert settings). #62420 (Sergei Trifonov).
- Changed the default installation directory on macOS from
/usr/bin to
/usr/local/bin. This is necessary because Apple's System Integrity Protection introduced with macOS El Capitan (2015) prevents writing into
/usr/bin, even with
sudo. #62489 (haohang).
- Make transform always return the first match. #62518 (Raúl Marín).
- Added the missing
hostname column to system table
blob_storage_log. #62456 (Jayme Bird).
- For consistency with other system tables,
system.backup_log now has a column
event_time. #62541 (Jayme Bird).
- Table
system.backup_log now has the "default" sorting key which is
event_date, event_time, the same as for other
_log table engines. #62667 (Nikita Mikhaylov).
- Avoid evaluating table DEFAULT expressions while executing
RESTORE. #62601 (Vitaly Baranov).
- S3 storage and backups also need the same default keep alive settings as s3 disk. #62648 (Sema Checherinda).
- Add librdkafka's (that infamous C library, which has a lot of bugs) client identifier to log messages to be able to differentiate log messages from different consumers of a single table. #62813 (János Benjamin Antal).
- Allow special macros
{uuid} and
{database} in a Replicated database ZooKeeper path. #62818 (Vitaly Baranov).
- Allow quota key with different auth scheme in HTTP requests. #62842 (Kseniia Sumarokova).
- Reduce the verbosity of command line argument
--help in
clickhouse client and
clickhouse local. The previous output is now generated by
--help --verbose. #62973 (Yarik Briukhovetskyi).
- log_bin_use_v1_row_events was removed in MySQL 8.3, and we adjust the experimental
MaterializedMySQL engine for it #60479. #63101 (Eugene Klimov). Author: Nikolay Yankin.
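The Hive text file item in the list above is easier to follow with a small model. The Python sketch below sizes every row by the number of columns in the table definition rather than by the first line of the file, so a short first line no longer truncates later rows. It is a simplified illustration and not the ClickHouse reader; the function name, the integer-only columns, and the default value 0 are assumptions taken from the example in the changelog entry.

```python
def parse_hive_text_rows(lines, num_columns, delimiter="\x01", default=0):
    """Parse delimited integer rows, sizing each row by the table schema.

    Short rows are padded with `default`; extra fields are dropped.
    Hypothetical helper for illustration only.
    """
    rows = []
    for line in lines:
        fields = [int(field) for field in line.split(delimiter)]
        # Pad to the schema width instead of trusting the first line's width.
        fields += [default] * (num_columns - len(fields))
        rows.append(fields[:num_columns])
    return rows


if __name__ == "__main__":
    # test_tbl(a Int32, b Int32, c Int32): the first line has only 2 fields.
    # A comma delimiter is used here for readability.
    print(parse_hive_text_rows(["1,2", "3,4,5"], num_columns=3, delimiter=","))
```

With the width fixed by the schema, the third field of the second line is read as 5 instead of being replaced by the default.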
Build/Testing/Packaging Improvement
- Vendor in Rust dependencies, so the Rust code (that we use for minor features for hype and lulz) can be built in a sane way, similarly to C++. #62297 (Raúl Marín).
- ClickHouse now uses OpenSSL 3.2 instead of BoringSSL. #59870 (Robert Schulze). Note that OpenSSL has generally worse engineering culture (such as non-zero number of sanitizer reports, that we had to patch, a complex build system with generated files, etc.) but has better compatibility.
- Ignore DROP queries in stress test with 1/2 probability, use TRUNCATE instead of ignoring DROP in upgrade check for Memory/JOIN tables. #61476 (Kruglov Pavel).
- Remove from the Keeper Docker image the volumes at /etc/clickhouse-keeper and /var/log/clickhouse-keeper. #61683 (Tristan).
- Add tests for all issues which are no longer relevant with Analyzer being enabled by default. Closes: #55794 Closes: #49472 Closes: #44414 Closes: #13843 Closes: #55803 Closes: #48308 Closes: #45535 Closes: #44365 Closes: #44153 Closes: #42399 Closes: #27115 Closes: #23162 Closes: #15395 Closes: #15411 Closes: #14978 Closes: #17319 Closes: #11813 Closes: #13210 Closes: #23053 Closes: #37729 Closes: #32639 Closes: #9954 Closes: #41964 Closes: #54317 Closes: #7520 Closes: #36973 Closes: #40955 Closes: #19687 Closes: #23104 Closes: #21584 Closes: #23344 Closes: #22627 Closes: #10276 Closes: #19687 Closes: #4567 Closes: #17710 Closes: #11068 Closes: #24395 Closes: #23416 Closes: #23162 Closes: #25655 Closes: #11757 Closes: #6571 Closes: #4432 Closes: #8259 Closes: #9233 Closes: #14699 Closes: #27068 Closes: #28687 Closes: #28777 Closes: #29734 Closes: #61238 Closes: #33825 Closes: #35608 Closes: #29838 Closes: #35652 Closes: #36189 Closes: #39634 Closes: #47432 Closes: #54910 Closes: #57321 Closes: #59154 Closes: #61014 Closes: #61950 Closes: #55647 Closes: #61947. #62185 (Nikita Mikhaylov).
- Add more tests from issues which are no longer relevant or fixed by analyzer. Closes: #58985 Closes: #59549 Closes: #36963 Closes: #39453 Closes: #56521 Closes: #47552 Closes: #56503 Closes: #59101 Closes: #50271 Closes: #54954 Closes: #56466 Closes: #11000 Closes: #10894 Closes: https://github.com/ClickHouse/ClickHouse/issues/448 Closes: #8030 Closes: #32139 Closes: #47288 Closes: #50705 Closes: #54511 Closes: #55466 Closes: #58500 Closes: #39923 Closes: #39855 Closes: #4596 Closes: #47422 Closes: #33000 Closes: #14739 Closes: #44039 Closes: #8547 Closes: #22923 Closes: #23865 Closes: #29748 Closes: #4222. #62457 (Nikita Mikhaylov).
- Fixed build errors when OpenSSL is linked dynamically (note: this is generally unsupported and only required for IBM's s390x platforms). #62888 (Harry Lee).
Bug Fix (user-visible misbehavior in an official stable release)
- Fix logical-error when undoing quorum insert transaction. #61953 (Han Fei).
- Fix parser error when using COUNT(*) with FILTER clause #61357 (Duc Canh Le).
- Fix logical error in
group_by_use_nulls + grouping sets + analyzer + materialize/constant #61567 (Kruglov Pavel).
- Cancel merges before removing moved parts #61610 (János Benjamin Antal).
- Fix abort in Apache Arrow #61720 (Kruglov Pavel).
- Search for
convert_to_replicated flag at the correct path corresponding to the specific disk #61769 (Kirill).
- Fix possible connections data-race for distributed_foreground_insert/distributed_background_insert_batch #61867 (Azat Khuzhin).
- Mark CANNOT_PARSE_ESCAPE_SEQUENCE error as parse error to be able to skip it in row input formats #61883 (Kruglov Pavel).
- Fix writing exception message in output format in HTTP when http_wait_end_of_query is used #61951 (Kruglov Pavel).
- Proper fix for LowCardinality together with JSONExtract functions #61957 (Nikita Mikhaylov).
- Crash in Engine Merge if Row Policy does not have expression #61971 (Ilya Golshtein).
- Fix WriteBufferAzureBlobStorage destructor uncaught exception #61988 (SmitaRKulkarni).
- Fix CREATE TABLE without columns definition for ReplicatedMergeTree #62040 (Azat Khuzhin).
- Fix optimize_skip_unused_shards_rewrite_in for composite sharding key #62047 (Azat Khuzhin).
- ReadWriteBufferFromHTTP set right header host when redirected #62068 (Sema Checherinda).
- Fix external table cannot parse data type Bool #62115 (Duc Canh Le).
- Analyzer: Fix query parameter resolution #62186 (Dmitry Novik).
- Fix restoring parts while readonly #62207 (Vitaly Baranov).
- Fix crash in index definition containing SQL UDF #62225 (vdimir).
- Fixing NULL random seed for generateRandom with analyzer. #62248 (Nikolai Kochetov).
- Correctly handle const columns in Distinct Transform #62250 (Antonio Andelic).
- Fix Parts Splitter for queries with the FINAL modifier #62268 (Nikita Taranov).
- Analyzer: Fix alias to parametrized view resolution #62274 (Dmitry Novik).
- Analyzer: Fix name resolution from parent scopes #62281 (Dmitry Novik).
- Fix argMax with nullable non native numeric column #62285 (Raúl Marín).
- Fix BACKUP and RESTORE of a materialized view in Ordinary database #62295 (Vitaly Baranov).
- Fix data race on scalars in Context #62305 (Kruglov Pavel).
- Fix primary key in materialized view #62319 (Murat Khairulin).
- Do not build multithread insert pipeline for tables without support #62333 (vdimir).
- Fix analyzer with positional arguments in distributed query #62362 (flynn).
- Fix filter pushdown from additional_table_filters in Merge engine in analyzer #62398 (Kruglov Pavel).
- Fix GLOBAL IN table queries with analyzer. #62409 (Nikolai Kochetov).
- Respect settings truncate_on_insert/create_new_file_on_insert in s3/hdfs/azure engines during partitioned write #62425 (Kruglov Pavel).
- Fix backup restore path for AzureBlobStorage #62447 (SmitaRKulkarni).
- Fix SimpleSquashingChunksTransform #62451 (Nikita Taranov).
- Fix capture of nested lambda. #62462 (Nikolai Kochetov).
- Avoid crash when reading protobuf with recursive types #62506 (Raúl Marín).
- Fix a bug when moving a partition from a table to itself #62524 (helifu).
- Fix scalar subquery in LIMIT #62567 (Nikolai Kochetov).
- Fix segfault in the experimental and unsupported Hive engine, which we don't like anyway #62578 (Nikolay Degterinsky).
- Fix memory leak in groupArraySorted #62597 (Antonio Andelic).
- Fix crash in largestTriangleThreeBuckets #62646 (Raúl Marín).
- Fix tumble[Start,End] and hop[Start,End] for bigger resolutions #62705 (Jordi Villar).
- Fix argMin/argMax combinator state #62708 (Raúl Marín).
- Fix temporary data in cache failing because of cache lock contention optimization #62715 (Kseniia Sumarokova).
- Fix crash in function
mergeTreeIndex #62762 (Anton Popov).
- fix: update: nested materialized columns: size check fixes #62773 (Eliot Hautefeuille).
- Fix FINAL modifier is not respected in CTE with analyzer #62811 (Duc Canh Le).
- Fix crash in function
formatRow with
JSON format and HTTP interface #62840 (Anton Popov).
- Azure: fix building final url from endpoint object #62850 (Daniel Pozo Escalona).
- Fix GCD codec #62853 (Nikita Taranov).
- Fix LowCardinality(Nullable) key in hyperrectangle #62866 (Amos Bird).
- Fix fromUnixtimestamp in joda syntax when the input value is beyond UInt32 #62901 (KevinyhZou).
- Disable optimize_rewrite_aggregate_function_with_if for sum(nullable) #62912 (Raúl Marín).
- Fix PREWHERE for StorageBuffer with different source table column types. #62916 (Nikolai Kochetov).
- Fix temporary data in cache incorrectly processing failure of cache key directory creation #62925 (Kseniia Sumarokova).
- gRPC: fix crash on IPv6 peer connection #62978 (Konstantin Bogdanov).
- Fix possible CHECKSUM_DOESNT_MATCH (and others) during replicated fetches #62987 (Azat Khuzhin).
- Fix terminate with uncaught exception in temporary data in cache #62998 (Kseniia Sumarokova).
- Fix optimize_rewrite_aggregate_function_with_if implicit cast #62999 (Raúl Marín).
- Fix unhandled exception in ~RestorerFromBackup #63040 (Vitaly Baranov).
- Do not remove server constants from GROUP BY key for secondary query. #63047 (Nikolai Kochetov).
- Fix incorrect judgement of monotonicity of function abs #63097 (Duc Canh Le).
- Set server name for SSL handshake in MongoDB engine #63122 (Alexander Gololobov).
- Use user specified db instead of "config" for MongoDB wire protocol version check #63126 (Alexander Gololobov).
ClickHouse release 24.3 LTS, 2024-03-27
Upgrade Notes
- The setting
allow_experimental_analyzer is enabled by default and it switches the query analysis to a new implementation, which has better compatibility and feature completeness. The feature "analyzer" is considered beta instead of experimental. You can restore the old behavior by setting the
compatibility to
24.2 or disabling the
allow_experimental_analyzer setting. Watch the video on YouTube.
- ClickHouse allows arbitrary binary data in the String data type, which is typically UTF-8. Parquet/ORC/Arrow Strings only support UTF-8. That's why you can choose which Arrow data type to use for the ClickHouse String data type - String or Binary. This is controlled by the settings
output_format_parquet_string_as_string,
output_format_orc_string_as_string,
output_format_arrow_string_as_string. While Binary would be more correct and compatible, using String by default will correspond to user expectations in most cases. Parquet/ORC/Arrow supports many compression methods, including lz4 and zstd. ClickHouse supports each and every compression method. Some inferior tools lack support for the faster
lz4 compression method, that's why we set
zstd by default. This is controlled by the settings
output_format_parquet_compression_method,
output_format_orc_compression_method, and
output_format_arrow_compression_method. We changed the default to
zstd for Parquet and ORC, but not Arrow (it is emphasized for low-level usages). #61817 (Alexey Milovidov).
- In the new ClickHouse version, the functions
geoDistance,
greatCircleDistance, and
greatCircleAngle will use 64-bit double precision floating point data type for internal calculations and return type if all the arguments are Float64. This closes #58476. In previous versions, the function always used Float32. You can switch to the old behavior by setting
geo_distance_returns_float64_on_float64_arguments to
false or setting
compatibility to
24.2 or earlier. #61848 (Alexey Milovidov). Co-authored with Geet Patel.
- The obsolete in-memory data parts have been deprecated since version 23.5 and have not been supported since version 23.10. Now the remaining code is removed. Continuation of #55186 and #45409. It is unlikely that you have used in-memory data parts because they were available only before version 23.5 and only when you enabled them manually by specifying the corresponding SETTINGS for a MergeTree table. To check if you have in-memory data parts, run the following query:
SELECT part_type, count() FROM system.parts GROUP BY part_type ORDER BY part_type. To disable the usage of in-memory data parts, do
ALTER TABLE ... MODIFY SETTING min_bytes_for_compact_part = DEFAULT, min_rows_for_compact_part = DEFAULT. Before upgrading from old ClickHouse releases, first check that you don't have in-memory data parts. If there are in-memory data parts, disable them first, then wait while there are no in-memory data parts and continue the upgrade. #61127 (Alexey Milovidov).
- Changed the column name from
duration_msto
duration_microsecondsin the
system.zookeepertable to reflect the reality that the duration is in the microsecond resolution. #60774 (Duc Canh Le).
- Reject incoming INSERT queries in case when query-level settings
async_insertand
deduplicate_blocks_in_dependent_materialized_viewsare enabled together. This behaviour is controlled by a setting
throw_if_deduplication_in_dependent_materialized_views_enabled_with_async_insertand enabled by default. This is a continuation of https://github.com/ClickHouse/ClickHouse/pull/59699 needed to unblock https://github.com/ClickHouse/ClickHouse/pull/59915. #60888 (Nikita Mikhaylov).
- Utility
clickhouse-copier is moved to a separate repository on GitHub: https://github.com/ClickHouse/copier. It is no longer included in the bundle but is still available as a separate download. This closes #60734, #60540, #60250, #52917, #51140, #47517, #47189, #46598, #40257, #36504, #35485, #33702, and #26702.
- To increase compatibility with MySQL, the compatibility alias
locatenow accepts arguments
(needle, haystack[, start_pos])by default. The previous behavior
(haystack, needle, [, start_pos])can be restored by setting
function_locate_has_mysql_compatible_argument_order = 0. #61092 (Robert Schulze).
- Forbid
SimpleAggregateFunctionin
ORDER BYof
MergeTreetables (like
AggregateFunctionis forbidden, but they are forbidden because they are not comparable) by default (use
allow_suspicious_primary_keyto allow them). #61399 (Azat Khuzhin).
- The
Ordinarydatabase engine is deprecated. You will receive a warning in clickhouse-client if your server is using it. This closes #52229. #56942 (shabroo).
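Two of the changes above can be checked with short queries. This is a minimal sketch rather than an official test: the string literals and coordinates are arbitrary illustration values, and only the function and setting names come from the entries above.

```sql
-- New default (MySQL-compatible order): locate(needle, haystack[, start_pos])
SELECT locate('lo', 'hello');

-- Previous order, restored per query: locate(haystack, needle[, start_pos])
SELECT locate('hello', 'lo')
SETTINGS function_locate_has_mysql_compatible_argument_order = 0;

-- With all-Float64 arguments the result type is expected to be Float64;
-- setting geo_distance_returns_float64_on_float64_arguments = 0 restores Float32.
SELECT toTypeName(
    geoDistance(toFloat64(37.6), toFloat64(55.7), toFloat64(30.3), toFloat64(59.9))
);
```

Under the behavior described above, the two `locate` queries should report the same position, since they search for the same needle in the same haystack.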
New Feature
- Support reading and writing backups as
tar(in addition to
zip). #59535 (josh-hildred).
- Implemented support for S3 Express buckets. #59965 (Nikita Taranov).
- Allow to attach parts from a different disk (using copy instead of hard link). #60112 (Unalian).
- Size-capped
Memorytables: controlled by their settings,
min_bytes_to_keep, max_bytes_to_keep, min_rows_to_keepand
max_rows_to_keep. #60612 (Jake Bamrah).
- Separate limits on number of waiting and executing queries. Added new server setting
max_waiting_queriesthat limits the number of queries waiting due to
async_load_databases. Existing limits on number of executing queries no longer count waiting queries. #61053 (Sergei Trifonov).
- Added a table
system.keywordswhich contains all the keywords from parser. Mostly needed and will be used for better fuzzing and syntax highlighting. #51808 (Nikita Mikhaylov).
- Add support for
ATTACH PARTITION ALL. #61107 (Kirill Nikiforov).
- Add a new function,
getClientHTTPHeader. This closes #54665. Co-authored with @lingtaolf. #61820 (Alexey Milovidov).
- Add
generate_seriesas a table function (compatibility alias for PostgreSQL to the existing
numbers function). This function generates a table containing an arithmetic progression of natural numbers. #59390 (divanik).
- A new mode for
topK/
topKWeighted that returns the count of values and its error. #54508 (UnamedRus).
- Added function
toMillisecondwhich returns the millisecond component for values of type
DateTimeor
DateTime64. #60281 (Shaun Struwig).
- Allow configuring HTTP redirect handlers for clickhouse-server. For example, you can make
/redirect to the Play UI. #60390 (Alexey Milovidov).
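A few of the new features listed above, sketched as queries. The table name `recent_events`, its columns, and the row thresholds are hypothetical and chosen only for illustration; the function and setting names are taken from the entries above.

```sql
-- Table function generating an arithmetic progression (PostgreSQL-style alias).
SELECT * FROM generate_series(1, 5);

-- Millisecond component of a DateTime64 value.
SELECT toMillisecond(toDateTime64('2024-03-26 12:34:56.789', 3));

-- Size-capped Memory table: old rows become eligible for removal
-- once the configured bounds are exceeded.
CREATE TABLE recent_events
(
    id UInt64,
    payload String
)
ENGINE = Memory
SETTINGS min_rows_to_keep = 1000, max_rows_to_keep = 10000;
```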
Performance Improvement
- Optimized function
dotProductto omit unnecessary and expensive memory copies. #60928 (Robert Schulze).
- 30x faster printing for 256-bit integers. #61100 (Raúl Marín).
- If the table's primary key contains mostly useless columns, don't keep them in memory. This is controlled by a new setting
primary_key_ratio_of_unique_prefix_values_to_skip_suffix_columnswith the value
0.9 by default, which means: for a composite primary key, if a column changes its value for at least 0.9 of all the times, the next columns after it will not be loaded. #60255 (Alexey Milovidov).
- Improve the performance of serialized aggregation methods when involving multiple
Nullablecolumns. #55809 (Amos Bird).
- Lazy builds JSON's output to improve performance of ALL JOIN. #58278 (LiuNeng).
- Make HTTP/HTTPS connections with external services, such as AWS S3 reusable for all use cases. Even when the response is 3xx or 4xx. #58845 (Sema Checherinda).
- Improvements to aggregate functions
argMin/
argMax/
any/
anyLast/
anyHeavy, as well as
ORDER BY {u8/u16/u32/u64/i8/i16/i32/i64} LIMIT 1 queries. #58640 (Raúl Marín).
- Trivial optimization for column's filter. Peak memory can be reduced to 44% of the original in some cases. #59698 (李扬).
- Execute
multiIffunction in a columnar fashion when the result type's underlying type is a number. #60384 (李扬).
- Faster (almost 2x) mutexes. #60823 (Azat Khuzhin).
- Drain multiple connections in parallel when a distributed query is finishing. #60845 (lizhuoyu5).
- Optimize data movement between columns of a Nullable number or a Nullable string, which improves some micro-benchmarks. #60846 (李扬).
- Operations with the filesystem cache will suffer less from the lock contention. #61066 (Alexey Milovidov).
- Optimize array join and other JOINs by preventing a wrong compiler's optimization. Close #61074. #61075 (李扬).
- If a query with a syntax error contained the
COLUMNSmatcher with a regular expression, the regular expression was compiled each time during the parser's backtracking, instead of being compiled once. This was a fundamental error. The compiled regexp was put to AST. But the letter A in AST means "abstract" which means it should not contain heavyweight objects. Parts of AST can be created and discarded during parsing, including a large number of backtracking. This leads to slowness on the parsing side and consequently allows DoS by a readonly user. But the main problem is that it prevents progress in fuzzers. #61543 (Alexey Milovidov).
- Add a new analyzer pass to optimize the IN operator for a single value. #61564 (LiuNeng).
- DNSResolver shuffles set of resolved IPs which is needed to uniformly utilize multiple endpoints of AWS S3. #60965 (Sema Checherinda).
Experimental Feature
- Support parallel reading for Azure blob storage. This improves the performance of the experimental Azure object storage. #61503 (SmitaRKulkarni).
- Add asynchronous WriteBuffer for Azure blob storage similar to S3. This improves the performance of the experimental Azure object storage. #59929 (SmitaRKulkarni).
- Use managed identity for backups IO when using Azure Blob Storage. Add a setting to prevent ClickHouse from attempting to create a non-existent container, which requires permissions at the storage account level. #61785 (Daniel Pozo Escalona).
- Add a setting
parallel_replicas_allow_in_with_subquery = 1 which allows subqueries for IN to work with parallel replicas. #60950 (Nikolai Kochetov).
- A change for the "zero-copy" replication: all zero copy locks related to a table have to be dropped when the table is dropped. The directory which contains these locks has to be removed also. #57575 (Sema Checherinda).
Improvement
- Use
MergeTreeas a default table engine. #60524 (Alexey Milovidov)
- Enable
output_format_pretty_row_numbersby default. It is better for usability. #61791 (Alexey Milovidov).
- In the previous version, some numbers in Pretty formats were not pretty enough. #61794 (Alexey Milovidov).
- A long value in Pretty formats won't be cut if it is the single value in the resultset, such as in the result of the
SHOW CREATE TABLEquery. #61795 (Alexey Milovidov).
- Similarly to
clickhouse-local,
clickhouse-clientwill accept the
--output-formatoption as a synonym to the
--formatoption. This closes #59848. #61797 (Alexey Milovidov).
- If stdout is a terminal and the output format is not specified,
clickhouse-clientand similar tools will use
PrettyCompactby default, similarly to the interactive mode.
clickhouse-clientand
clickhouse-localwill handle command line arguments for input and output formats in a unified fashion. This closes #61272. #61800 (Alexey Milovidov).
- Underscore digit groups in Pretty formats for better readability. This is controlled by a new setting,
output_format_pretty_highlight_digit_groups. #61802 (Alexey Milovidov).
- Add ability to override initial INSERT settings via
SYSTEM FLUSH DISTRIBUTED. #61832 (Azat Khuzhin).
- Enable processors profiling (time spent/in and out bytes for sorting, aggregation, ...) by default. #61096 (Azat Khuzhin).
- Support files without format extension in Filesystem database. #60795 (Kruglov Pavel).
- Make all format names case insensitive, like Tsv, or TSV, or tsv, or even rowbinary. #60420 (豪肥肥). I would appreciate it if you continue to write them correctly, e.g.,
JSON😇, not
Json🤮, but we don't mind if you spell it as you prefer.
- Added
none_only_activemode for
distributed_ddl_output_modesetting. #60340 (Alexander Tokmakov).
- The advanced dashboard has slightly better colors for multi-line graphs. #60391 (Alexey Milovidov).
- The Advanced dashboard now has controls always visible on scrolling. This allows you to add a new chart without scrolling up. #60692 (Alexey Milovidov).
- While running the
MODIFY COLUMNquery for materialized views, check the inner table's structure to ensure every column exists. #47427 (sunny).
- String types and Enums can be used in the same context, such as: arrays, UNION queries, conditional expressions. This closes #60726. #60727 (Alexey Milovidov).
- Allow declaring Enums in the structure of external data for query processing (this is an immediate temporary table that you can provide for your query). #57857 (Duc Canh Le).
- Consider lightweight deleted rows when selecting parts to merge, so the disk size of the resulting part will be estimated better. #58223 (Zhuo Qiu).
- Added comments for columns for more system tables. Continuation of https://github.com/ClickHouse/ClickHouse/pull/58356. #59016 (Nikita Mikhaylov).
- Now we can use virtual columns in PREWHERE. It's worthwhile for non-const virtual columns like
_part_offset. #59033 (Amos Bird). Improved overall usability of virtual columns. Now it is allowed to use virtual columns in
PREWHERE(it's worthwhile for non-const virtual columns like
_part_offset). Now a builtin documentation is available for virtual columns as a comment of column in
DESCRIBEquery with enabled setting
describe_include_virtual_columns. #60205 (Anton Popov).
- Instead of using a constant key, object storage now generates a key to determine whether it is capable of removing objects. #59495 (Sema Checherinda).
- Allow "local" as object storage type instead of "local_blob_storage". #60165 (Kseniia Sumarokova).
- Parallel flush of pending INSERT blocks of Distributed engine on
DETACH/server shutdown and
SYSTEM FLUSH DISTRIBUTED(Parallelism will work only if you have multi-disk policy for a table (like everything in the Distributed engine right now)). #60225 (Azat Khuzhin).
- Add a setting to force read-through cache for merges. #60308 (Kseniia Sumarokova).
- An improvement for the MySQL compatibility protocol. The issue #57598 mentions a variant behaviour regarding transaction handling. An issued COMMIT/ROLLBACK when no transaction is active is reported as an error contrary to MySQL behaviour. #60338 (PapaToemmsn).
- Function
substringnow has a new alias
byteSlice. #60494 (Robert Schulze).
- Renamed server setting
dns_cache_max_sizeto
dns_cache_max_entriesto reduce ambiguity. #60500 (Kirill Nikiforov).
- SHOW INDEX | INDEXES | INDICES | KEYS no longer sorts by the primary key columns (which was unintuitive). #60514 (Robert Schulze).
- Keeper improvement: abort during startup if an invalid snapshot is detected to avoid data loss. #60537 (Antonio Andelic).
- Update tzdata to 2024a. #60768 (Raúl Marín).
- Keeper improvement: support
leadership_expiry_msin Keeper's settings. #60806 (Brokenice0415).
- Always infer exponential numbers in JSON formats regardless of the setting
input_format_try_infer_exponent_floats. Add setting
input_format_json_use_string_type_for_ambiguous_paths_in_named_tuples_inference_from_objectsthat allows to use String type for ambiguous paths instead of an exception during named Tuples inference from JSON objects. #60808 (Kruglov Pavel).
- Add support for
START TRANSACTIONsyntax typically used in MySQL syntax, resolving https://github.com/ClickHouse/ClickHouse/discussions/60865. #60886 (Zach Naimon).
- Add a flag for the full-sorting merge join algorithm to treat null as biggest/smallest, so the behavior can be compatible with other SQL systems, like Apache Spark. #60896 (loudongfeng).
- Support detecting the output format by file extension in
clickhouse-clientand
clickhouse-local. #61036 (豪肥肥).
- Update the memory limit at runtime when the Linux cgroups value changes. #61049 (Han Fei).
- Add the function
toUInt128OrZero, which was missed by mistake (the mistake is related to https://github.com/ClickHouse/ClickHouse/pull/945). The compatibility aliases
FROM_UNIXTIMEand
DATE_FORMAT(they are not ClickHouse-native and only exist for MySQL compatibility) have been made case insensitive, as expected for SQL-compatibility aliases. #61114 (Alexey Milovidov).
- Improvements to access checks: allow revoking unpossessed rights if the target user doesn't have the rights being revoked either. Example:
GRANT SELECT ON *.* TO user1; REVOKE SELECT ON system.* FROM user1;. #61115 (pufit).
- Fix
has()function with
Nullablecolumn (fixes #60214). #61249 (Mikhail Koviazin).
- Now it's possible to specify the attribute
merge="true"in config substitutions for subtrees
<include from_zk="/path" merge="true">. If this attribute is specified, ClickHouse merges the subtree with the existing configuration; otherwise the default behavior is to append the new content to the configuration. #61299 (alesapin).
- Add async metrics for virtual memory mappings:
VMMaxMapCount&
VMNumMaps. Closes #60662. #61354 (Tuan Pham Anh).
- Use
temporary_files_codecsetting in all places where we create temporary data, for example external memory sorting and external memory GROUP BY. Before it worked only in
partial_mergeJOIN algorithm. #61456 (Maksim Kita).
- Add a new setting
max_parser_backtrackswhich allows to limit the complexity of query parsing. #61502 (Alexey Milovidov).
- Less contention during dynamic resize of the filesystem cache. #61524 (Kseniia Sumarokova).
- Disallow sharded mode of StorageS3 queue, because it will be rewritten. #61537 (Kseniia Sumarokova).
- Fixed typo: from
use_leagcy_max_levelto
use_legacy_max_level. #61545 (William Schoeffel).
- Remove some duplicate entries in
system.blob_storage_log. #61622 (YenchangChan).
- Added
current_userfunction as a compatibility alias for MySQL. #61770 (Yarik Briukhovetskyi).
- Fix inconsistent floating point aggregate function states in mixed x86-64 / ARM clusters #60610 (Harry Lee).
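Several of the smaller improvements above are easy to try from a client session. The queries below are an illustrative sketch; the literal values are arbitrary and only the function and format names come from the entries above.

```sql
-- Format names are matched case-insensitively.
SELECT 1 AS x FORMAT tsv;

-- byteSlice is an alias of substring.
SELECT byteSlice('ClickHouse', 1, 5) AS prefix;

-- Returns zero instead of throwing when the string cannot be parsed.
SELECT toUInt128OrZero('not a number') AS parsed;

-- MySQL-style compatibility alias.
SELECT current_user();
```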
Build/Testing/Packaging Improvement
- The real-time query profiler now works on AArch64. In previous versions, it worked only when a program didn't spend time inside a syscall. #60807 (Alexey Milovidov).
- ClickHouse version has been added to docker labels. Closes #54224. #60949 (Nikolay Monkov).
- Upgrade
prqlcto 0.11.3. #60616 (Maximilian Roos).
- Add generic query text fuzzer in
clickhouse-local. #61508 (Alexey Milovidov).
Bug Fix (user-visible misbehavior in an official stable release)
- Fix finished_mutations_to_keep=0 for MergeTree (as the docs say, 0 is to keep everything) #60031 (Azat Khuzhin).
- Something was wrong with the FINAL optimization, here is how the author describes it: "PartsSplitter invalid ranges for the same part". #60041 (Maksim Kita).
- Something was wrong with Apache Hive, which is experimental and not supported. #60262 (shanfengp).
- An improvement for experimental parallel replicas: force reanalysis if parallel replicas changed #60362 (Raúl Marín).
- Fix usage of plain metadata type with new disks configuration option #60396 (Kseniia Sumarokova).
- Try to fix logical error 'Cannot capture column because it has incompatible type' in mapContainsKeyLike #60451 (Kruglov Pavel).
- Avoid calculation of scalar subqueries for CREATE TABLE. #60464 (Nikolai Kochetov).
- Fix deadlock in parallel parsing when lots of rows are skipped due to errors #60516 (Kruglov Pavel).
- Something was wrong with experimental KQL (Kusto) support: fix
max_query_size_for_kql_compound_operator: #60534 (Yong Wang).
- Keeper fix: add timeouts when waiting for commit logs #60544 (Antonio Andelic).
- Don't output number tips for date types #60577 (Raúl Marín).
- Fix reading from MergeTree with non-deterministic functions in filter #60586 (Kruglov Pavel).
- Fix logical error on bad compatibility setting value type #60596 (Kruglov Pavel).
- fix(prql): Robust panic handler #60615 (Maximilian Roos).
- Fix
intDivfor decimal and date arguments #60672 (Yarik Briukhovetskyi).
- Fix: expand CTE in alter modify query #60682 (Yakov Olkhovskiy).
- Fix system.parts for non-Atomic/Ordinary database engine (i.e. Memory) #60689 (Azat Khuzhin).
- Fix "Invalid storage definition in metadata file" for parameterized views #60708 (Azat Khuzhin).
- Fix buffer overflow in CompressionCodecMultiple #60731 (Alexey Milovidov).
- Remove nonsense from SQL/JSON #60738 (Alexey Milovidov).
- Remove wrong assertion in aggregate function quantileGK #60740 (李扬).
- Fix insert-select + insert_deduplication_token bug by setting streams to 1 #60745 (Jordi Villar).
- Prevent setting custom metadata headers on unsupported multipart upload operations #60748 (Francisco J. Jurado Moreno).
- Fix toStartOfInterval #60763 (Andrey Zvonov).
- Fix crash in arrayEnumerateRanked #60764 (Raúl Marín).
- Fix crash when using input() in INSERT SELECT JOIN #60765 (Kruglov Pavel).
- Fix crash with different allow_experimental_analyzer value in subqueries #60770 (Dmitry Novik).
- Remove recursion when reading from S3 #60849 (Antonio Andelic).
- Fix possible stuck on error in HashedDictionaryParallelLoader #60926 (vdimir).
- Fix async RESTORE with Replicated database (experimental feature) #60934 (Antonio Andelic).
- Fix deadlock in async inserts to
Logtables via native protocol #61055 (Anton Popov).
- Fix lazy execution of default argument in dictGetOrDefault for RangeHashedDictionary #61196 (Kruglov Pavel).
- Fix multiple bugs in groupArraySorted #61203 (Raúl Marín).
- Fix Keeper reconfig for standalone binary #61233 (Antonio Andelic).
- Fix usage of session_token in S3 engine #61234 (Kruglov Pavel).
- Fix possible incorrect result of aggregate function
uniqExact#61257 (Anton Popov).
- Fix bugs in show database #61269 (Raúl Marín).
- Fix logical error in RabbitMQ storage with MATERIALIZED columns #61320 (vdimir).
- Fix CREATE OR REPLACE DICTIONARY #61356 (Vitaly Baranov).
- Fix ATTACH query with external ON CLUSTER #61365 (Nikolay Degterinsky).
- Fix consecutive keys optimization for nullable keys #61393 (Anton Popov).
- fix issue of actions dag split #61458 (Raúl Marín).
- Fix finishing a failed RESTORE #61466 (Vitaly Baranov).
- Disable async_insert_use_adaptive_busy_timeout correctly with compatibility settings #61468 (Raúl Marín).
- Allow queuing in restore pool #61475 (Nikita Taranov).
- Fix an inconsistency when reading system.parts using UUID. #61479 (Dan Wu).
- Fix ALTER QUERY MODIFY SQL SECURITY #61480 (pufit).
- Fix a crash in window view (experimental feature) #61526 (Alexey Milovidov).
- Fix
repeatwith non-native integers #61527 (Antonio Andelic).
- Fix client's
-sargument #61530 (Mikhail f. Shiryaev).
- Fix crash in arrayPartialReverseSort #61539 (Raúl Marín).
- Fix string search with const position #61547 (Antonio Andelic).
- Fix addDays causing an error when used with DateTime64 #61561 (Shuai li).
- Disallow LowCardinality input type for JSONExtract #61617 (Julia Kartseva).
- Fix
system.part_logfor async insert with deduplication #61620 (Antonio Andelic).
- Fix a
Non-ready setexception for system.parts. #61666 (Nikolai Kochetov).
- Fix actual_part_name for REPLACE_RANGE (
Entry actual part isn't empty yet) #61675 (Alexander Tokmakov).
- Fix a sanitizer report in
multiSearchAllPositionsCaseInsensitiveUTF8for incorrect UTF-8 #61749 (pufit).
- Fix an observation that the RANGE frame is not supported for Nullable columns. #61766 (YuanLiu).
ClickHouse release 24.2, 2024-02-29
Backward Incompatible Change
- Validate suspicious/experimental types in nested types. Previously we didn't validate such types (except JSON) in nested types like Array/Tuple/Map. #59385 (Kruglov Pavel).
- Add sanity check for number of threads and block sizes. #60138 (Raúl Marín).
- Don't infer floats in exponential notation by default. Add a setting
input_format_try_infer_exponent_floatsthat will restore previous behaviour (disabled by default). Closes #59476. #59500 (Kruglov Pavel).
- Allow alter operations to be surrounded by parentheses. The emission of parentheses can be controlled by the
format_alter_operations_with_parenthesesconfig. By default, in formatted queries the parentheses are emitted as we store the formatted alter operations in some places as metadata (e.g.: mutations). The new syntax clarifies some of the queries where alter operations end in a list. E.g.:
ALTER TABLE x MODIFY TTL date GROUP BY a, b, DROP COLUMN ccannot be parsed properly with the old syntax. In the new syntax the query
ALTER TABLE x (MODIFY TTL date GROUP BY a, b), (DROP COLUMN c) is obvious. Older versions are not able to read the new syntax, therefore using the new syntax might cause issues if newer and older versions of ClickHouse are mixed in a single cluster. #59532 (János Benjamin Antal).
- Fix for the materialized view security issue, which allowed a user to insert into a table without required grants for that. Fix validates that the user has permission to insert not only into a materialized view but also into all underlying tables. This means that some queries, which worked before, now can fail with
Not enough privileges. To address this problem, the release introduces a new feature of SQL security for views https://clickhouse.com/docs/sql-reference/statements/create/view#sql_security. #54901 #60439 (pufit).
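As a rough illustration of the SQL security feature mentioned in the last item, a view can run with its definer's grants so that readers need access only to the view. The user names `alice` and `bob`, the table `sales`, and the view name are hypothetical.

```sql
-- The view executes with the grants of its definer (alice), so bob
-- needs SELECT on the view only, not on the underlying table.
CREATE VIEW sales_summary
DEFINER = alice SQL SECURITY DEFINER
AS SELECT region, sum(amount) AS total
FROM sales
GROUP BY region;

GRANT SELECT ON sales_summary TO bob;
```

With `SQL SECURITY INVOKER` instead, the query would be checked against the grants of the user running it.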
New Feature
- Added new syntax which allows to specify definer user in View/Materialized View. This allows to execute selects/inserts from views without explicit grants for underlying tables. So, a View will encapsulate the grants. #54901 #60439 (pufit).
- Try to detect file format automatically during schema inference if it's unknown in
file/s3/hdfs/url/azureBlobStorageengines. Closes #50576. #59092 (Kruglov Pavel).
- Implement auto-adjustment for asynchronous insert timeouts. The following settings are introduced: async_insert_poll_timeout_ms, async_insert_use_adaptive_busy_timeout, async_insert_busy_timeout_min_ms, async_insert_busy_timeout_max_ms, async_insert_busy_timeout_increase_rate, async_insert_busy_timeout_decrease_rate. #58486 (Julia Kartseva).
- Allow to set up a quota for maximum sequential login failures. #54737 (Alexey Gerasimchuck).
- A new aggregate function
groupArrayIntersect. Follows up: #49862. #59598 (Yarik Briukhovetskyi).
- Backup & Restore support for
AzureBlobStorage. Resolves #50747. #56988 (SmitaRKulkarni).
- The user can now specify the template string directly in the query using
format_schema_rows_templateas an alternative to
format_template_row. Closes #31363. #59088 (Shaun Struwig).
- Implemented automatic conversion of merge tree tables of different kinds to replicated engine. Create empty
convert_to_replicatedfile in table's data directory (
/clickhouse/store/xxx/xxxyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/) and that table will be converted automatically on next server start. #57798 (Kirill).
- Added query
ALTER TABLE table FORGET PARTITION partitionthat removes ZooKeeper nodes, related to an empty partition. #59507 (Sergei Trifonov). This is an expert-level feature.
- Support JWT credentials file for the NATS table engine. #59543 (Nickolaj Jepsen).
- Implemented
system.dns_cachetable, which can be useful for debugging DNS issues. #59856 (Kirill Nikiforov).
- The codec
LZ4HCwill accept a new level 2, which is faster than the previous minimum level 3, at the expense of less compression. In previous versions,
LZ4HC(2)and less was the same as
LZ4HC(3). Author: Cyan4973. #60090 (Alexey Milovidov).
- Implemented
system.dns_cachetable, which can be useful for debugging DNS issues. New server setting dns_cache_max_size. #60257 (Kirill Nikiforov).
- Support single-argument version for the
mergetable function, as
merge(['db_name', ] 'tables_regexp'). #60372 (豪肥肥).
- Support negative positional arguments. Closes #57736. #58292 (flynn).
- Support specifying a set of permitted users for specific S3 settings in config using
userkey. #60144 (Antonio Andelic).
- Added table function
mergeTreeIndex. It represents the contents of index and marks files of
MergeTreetables. It can be used for introspection. Syntax:
mergeTreeIndex(database, table, [with_marks = true])where
database.tableis an existing table with
MergeTreeengine. #58140 (Anton Popov).
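The sketch below exercises three of the features from this list. Table names such as `events_2024` and `logs` are hypothetical; the syntax for `merge` and `mergeTreeIndex` follows the signatures quoted in the entries above.

```sql
-- Single-argument merge(): tables in the current database whose names match the regexp.
SELECT count() FROM merge('^events_');

-- Inspect the index and marks of an existing MergeTree table.
SELECT *
FROM mergeTreeIndex(currentDatabase(), 'events_2024', with_marks = true)
LIMIT 10;

-- LZ4HC level 2: faster than level 3, at the expense of less compression.
CREATE TABLE logs
(
    ts DateTime,
    message String CODEC(LZ4HC(2))
)
ENGINE = MergeTree
ORDER BY ts;
```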
Experimental Feature
- Added function
seriesOutliersDetectTukeyto detect outliers in series data using Tukey's fences algorithm. #58632 (Bhavna Jindal). Keep in mind that the behavior will be changed in the next patch release.
- Add function
variantTypethat returns Enum with variant type name for each row. #59398 (Kruglov Pavel).
- Support
LEFT JOIN,
ALL INNER JOIN, and simple subqueries for parallel replicas (only with analyzer). New setting
parallel_replicas_prefer_local_joinchooses local
JOINexecution (by default) vs
GLOBAL JOIN. All tables should exist on every replica from
cluster_for_parallel_replicas. New settings
min_external_table_block_size_rowsand
min_external_table_block_size_bytesare used to squash small blocks that are sent for temporary tables (only with analyzer). #58916 (Nikolai Kochetov).
- Allow concurrent table creation in the
Replicateddatabase during adding or recovering a new replica. #59277 (Konstantin Bogdanov).
- Implement comparison operator for
Variantvalues and proper Field inserting into
Variantcolumn. Don't allow creating
Variant type with similar variant types by default (allow under a setting
allow_suspicious_variant_types) Closes #59996. Closes #59850. #60198 (Kruglov Pavel).
- Disable parallel replicas JOIN with CTE (not analyzer) #59239 (Raúl Marín).
Performance Improvement
- Primary key will use less amount of memory. #60049 (Alexey Milovidov).
- Improve memory usage for primary key and some other operations. #60050 (Alexey Milovidov).
- The tables' primary keys will be loaded in memory lazily on first access. This is controlled by the new MergeTree setting
primary_key_lazy_load, which is on by default. This provides several advantages: - it will not be loaded for tables that are not used; - if there is not enough memory, an exception will be thrown on first use instead of at server startup. This provides several disadvantages: - the latency of loading the primary key will be paid on the first query rather than before accepting connections; this theoretically may introduce a thundering-herd problem. This closes #11188. #60093 (Alexey Milovidov).
- Vectorized distance functions used in vector search. #58866 (Robert Schulze).
- Vectorized function
dotProductwhich is useful for vector search. #60202 (Robert Schulze).
- Add short-circuit ability for
dictGetOrDefaultfunction. Closes #52098. #57767 (jsc0218).
- Keeper improvement: cache only a certain amount of logs in-memory controlled by
latest_logs_cache_size_thresholdand
commit_logs_cache_size_threshold. #59460 (Antonio Andelic).
- Keeper improvement: reduce size of data node even more. #59592 (Antonio Andelic).
- Continue optimizing branch miss of
iffunction when result type is
Float*/Decimal*/*Int*, follow up of https://github.com/ClickHouse/ClickHouse/pull/57885. #59148 (李扬).
- Optimize
iffunction when the input type is
Map, the speed-up is up to ~10x. #59413 (李扬).
- Improve performance of the
Int8type by implementing strict aliasing (we already have it for
UInt8and all other integer types). #59485 (Raúl Marín).
- Optimize performance of sum/avg conditionally for bigint and big decimal types by reducing branch miss. #59504 (李扬).
- Improve performance of SELECTs with active mutations. #59531 (Azat Khuzhin).
- Optimized function
isNotNullwith AVX2. #59621 (李扬).
- Improve ASOF JOIN performance for sorted or almost sorted data. #59731 (Maksim Kita).
- The previous default value of 1 MB for
async_insert_max_data_size appeared to be too small. The new default is 10 MiB. #59536 (Nikita Mikhaylov).
- Use multiple threads while reading the metadata of tables from a backup while executing the RESTORE command. #60040 (Vitaly Baranov).
- Now if
StorageBufferhas more than 1 shard (
num_layers> 1) background flush will happen simultaneously for all shards in multiple threads. #60111 (alesapin).
Improvement
- When the output format is a
Pretty format and a block consists of a single numeric value that exceeds one million, a readable number is printed to the right of the table. #60379 (rogeryk).
- Added settings
split_parts_ranges_into_intersecting_and_non_intersecting_finaland
split_intersecting_parts_ranges_into_layers_final. These settings are needed to disable optimizations for queries with
FINALand needed for debug only. #59705 (Maksim Kita). Actually not only for that - they can also lower memory usage at the expense of performance.
- Rename the setting
extract_kvp_max_pairs_per_rowto
extract_key_value_pairs_max_pairs_per_row. The issue (unnecessary abbreviation in the setting name) was introduced in https://github.com/ClickHouse/ClickHouse/pull/43606. Fix the documentation of this setting. #59683 (Alexey Milovidov). #59960 (jsc0218).
- Running
ALTER COLUMN MATERIALIZEon a column with
DEFAULTor
MATERIALIZEDexpression now precisely follows the semantics. #58023 (Duc Canh Le).
- Enabled an exponential backoff logic for errors during mutations. It will reduce the CPU usage, memory usage and log file sizes. #58036 (MikhailBurdukov).
- Add improvement to count the
InitialQueryProfile Event. #58195 (Unalian).
- Allow to define
volume_priorityin
storage_configuration. #58533 (Andrey Zvonov).
- Add support for the
Date32type in the
T64codec. #58738 (Hongbin Ma).
- Allow trailing commas in types with several items. #59119 (Aleksandr Musorin).
- Settings for the Distributed table engine can now be specified in the server configuration file (similar to MergeTree settings), e.g.
<distributed> <flush_on_detach>false</flush_on_detach> </distributed>. #59291 (Azat Khuzhin).
- Retry disconnects and expired sessions when reading
system.zookeeper. This is helpful when reading many rows from
system.zookeepertable especially in the presence of fault-injected disconnects. #59388 (Alexander Gololobov).
- Do not interpret numbers with leading zeroes as octals when
input_format_values_interpret_expressions=0. #59403 (Joanna Hulboj).
- At startup and whenever config files are changed, ClickHouse updates the hard memory limits of its total memory tracker. These limits are computed based on various server settings and cgroups limits (on Linux). Previously, setting
/sys/fs/cgroup/memory.max(for cgroups v2) was hard-coded. As a result, cgroup v2 memory limits configured for nested groups (hierarchies), e.g.
/sys/fs/cgroup/my/nested/group/memory.maxwere ignored. This is now fixed. The behavior of v1 memory limits remains unchanged. #59435 (Robert Schulze).
- New profile events added to observe the time spent on calculating PK/projections/secondary indices during
INSERT-s. #59436 (Nikita Taranov).
- Allow to define a starting point for S3Queue with Ordered mode at the creation using a setting
s3queue_last_processed_path. #59446 (Kseniia Sumarokova).
- Made comments for system tables also available in
system.tablesin
clickhouse-local. #59493 (Nikita Mikhaylov).
system.zookeepertable: previously the whole result was accumulated in memory and returned as one big chunk. This change should help to reduce memory consumption when reading many rows from
system.zookeeper, allow showing intermediate progress (how many rows have been read so far) and avoid hitting connection timeout when result set is big. #59545 (Alexander Gololobov).
- Now dashboard understands both compressed and uncompressed state of URL's #hash (backward compatibility). Continuation of #59124. #59548 (Amos Bird).
- Bumped Intel QPL (used by codec
DEFLATE_QPL) from v1.3.1 to v1.4.0. Also fixed a bug in the polling timeout mechanism: in some cases the timeout did not work properly, and when a timeout happened, IAA and the CPU could process the buffer concurrently. The code now makes sure the IAA codec status is not QPL_STS_BEING_PROCESSED before falling back to the software codec. #59551 (jasperzhu).
- Do not show a warning about the server version in ClickHouse Cloud because ClickHouse Cloud handles seamless upgrades automatically. #59657 (Alexey Milovidov).
- After self-extraction, the temporary binary is moved instead of copied. #59661 (Yakov Olkhovskiy).
- Fix stack unwinding on Apple macOS. This closes #53653. #59690 (Nikita Mikhaylov).
- Check for stack overflow in parsers even if the user misconfigured the
max_parser_depthsetting to a very high value. This closes #59622. #59697 (Alexey Milovidov). #60434
- Unify XML and SQL created named collection behaviour in Kafka storage. #59710 (Pervakov Grigorii).
- In case when
merge_max_block_size_bytesis small enough and tables contain wide rows (strings or tuples), background merges may get stuck in an endless loop. This behaviour is fixed. Follow-up for https://github.com/ClickHouse/ClickHouse/pull/59340. #59812 (Nikita Mikhaylov).
- Allow uuid in replica_path if CREATE TABLE explicitly has it. #59908 (Azat Khuzhin).
- Add column
metadata_versionof ReplicatedMergeTree table in
system.tablessystem table. #59942 (Maksim Kita).
- Keeper improvement: send only Keeper related metrics/events for Prometheus. #59945 (Antonio Andelic).
- The dashboard will display metrics across different ClickHouse versions even if the structure of system tables has changed after the upgrade. #59967 (Alexey Milovidov).
- Allow loading AZ info from a file. #59976 (Konstantin Bogdanov).
- Keeper improvement: add retries on failures for Disk related operations. #59980 (Antonio Andelic).
- Add new config setting
backups.remove_backup_files_after_failure:
<clickhouse> <backups> <remove_backup_files_after_failure>true</remove_backup_files_after_failure> </backups> </clickhouse>. #60002 (Vitaly Baranov).
- Copy S3 file GCP fallback to buffer copy in case GCP returned
Internal Errorwith
GATEWAY_TIMEOUTHTTP error code. #60164 (Maksim Kita).
- Short circuit execution for
ULIDStringToDateTime. #60211 (Juan Madurga).
- Added
query_idcolumn for tables
system.backupsand
system.backup_log. Added error stacktrace to
errorcolumn. #60220 (Maksim Kita).
- Connections through the MySQL port now automatically run with setting
prefer_column_name_to_alias = 1to support QuickSight out-of-the-box. Also, settings
mysql_map_string_to_text_in_show_columnsand
mysql_map_fixed_string_to_text_in_show_columnsare now enabled by default, affecting also only MySQL connections. This increases compatibility with more BI tools. #60365 (Robert Schulze).
- Fix a race condition in JavaScript code leading to duplicate charts on top of each other. #60392 (Alexey Milovidov).
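The cgroups v2 item in the list above (nested `memory.max` files being ignored) can be illustrated with a minimal Python sketch. It assumes that the effective limit is the smallest `memory.max` value found while walking from the nested group up to the cgroup root. The helper name `effective_memory_max` and its parameters are hypothetical, and this is not ClickHouse's actual implementation.

```python
import os
from typing import Optional


def effective_memory_max(cgroup_root: str, group: str) -> Optional[int]:
    """Return the tightest memory.max between `group` and the root, or None if unlimited.

    Assumption: the limit that applies to a process is the minimum over its
    own group and all ancestor groups.
    """
    limit: Optional[int] = None
    parts = [p for p in group.strip("/").split("/") if p]
    # Visit the nested group first, then each ancestor, ending at the root.
    for depth in range(len(parts), -1, -1):
        path = os.path.join(cgroup_root, *parts[:depth], "memory.max")
        try:
            with open(path) as f:
                raw = f.read().strip()
        except FileNotFoundError:
            continue  # this level has no memory controller file
        if raw == "max":  # cgroups v2 writes "max" for "no limit"
            continue
        value = int(raw)
        limit = value if limit is None else min(limit, value)
    return limit
```

With a hard-coded root path, only `<cgroup_root>/memory.max` would be consulted; the walk above also sees a limit set in, for example, `my/nested/group`.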
Build/Testing/Packaging Improvement
- Added builds and tests with coverage collection with introspection. Continuation of #56102. #58792 (Alexey Milovidov).
- Update the Rust toolchain in
corrosion-cmakewhen the CMake cross-compilation toolchain variable is set. #59309 (Aris Tritas).
- Add some fuzzing to ASTLiterals. #59383 (Raúl Marín).
- If you want to run initdb scripts every time the ClickHouse container starts, you should set the environment variable CLICKHOUSE_ALWAYS_RUN_INITDB_SCRIPTS. #59808 (Alexander Nikolaev).
- Remove the ability to disable generic clickhouse components (like server/client/...), but keep it for some that require extra libraries (like ODBC or keeper). #59857 (Azat Khuzhin).
- Query fuzzer will fuzz SETTINGS inside queries. #60087 (Alexey Milovidov).
- Add support for building ClickHouse with clang-19 (master). #60448 (Alexey Milovidov).
Bug Fix (user-visible misbehavior in an official stable release)
- Fix a "Non-ready set" error in TTL WHERE. #57430 (Nikolai Kochetov).
- Fix a bug in the
quantilesGKfunction #58216 (李扬).
- Fix a wrong behavior with
intDivfor Decimal arguments #59243 (Yarik Briukhovetskyi).
- Fix
translatewith FixedString input #59356 (Raúl Marín).
- Fix digest calculation in Keeper #59439 (Antonio Andelic).
- Fix stacktraces for binaries without debug symbols #59444 (Azat Khuzhin).
- Fix
ASTAlterCommand::formatImplin case of column specific settings… #59445 (János Benjamin Antal).
- Fix
SELECT * FROM [...] ORDER BY ALLwith Analyzer #59462 (zhongyuankai).
- Fix possible uncaught exception during distributed query cancellation #59487 (Azat Khuzhin).
- Make MAX use the same rules as permutation for complex types #59498 (Raúl Marín).
- Fix corner case when passing
update_insert_deduplication_token_in_dependent_materialized_views#59544 (Jordi Villar).
- Fix incorrect result of arrayElement / map on empty value #59594 (Raúl Marín).
- Fix crash in topK when merging empty states #59603 (Raúl Marín).
- Fix distributed table with a constant sharding key #59606 (Vitaly Baranov).
- Fix KQL issue found by WingFuzz #59626 (Yong Wang).
- Fix error "Read beyond last offset" for AsynchronousBoundedReadBuffer #59630 (Vitaly Baranov).
- Maintain function alias in RewriteSumFunctionWithSumAndCountVisitor #59658 (Raúl Marín).
- Fix query start time on non initial queries #59662 (Raúl Marín).
- Validate types of arguments for
minmaxskipping index #59733 (Anton Popov).
- Fix leftPad / rightPad function with FixedString input #59739 (Raúl Marín).
- Fix AST fuzzer issue in function
countMatches#59752 (Robert Schulze).
- RabbitMQ: fix having neither acked nor nacked messages #59775 (Kseniia Sumarokova).
- Fix StorageURL doing some of the query execution in single thread #59833 (Michael Kolupaev).
- S3Queue: fix uninitialized value #59897 (Kseniia Sumarokova).
- Fix parsing of partition expressions surrounded by parens #59901 (János Benjamin Antal).
- Fix crash in JSONColumnsWithMetadata format over HTTP #59925 (Kruglov Pavel).
- Do not rewrite sum to count if the return value differs in Analyzer #59926 (Azat Khuzhin).
- UniqExactSet read crash fix #59928 (Maksim Kita).
- ReplicatedMergeTree invalid metadata_version fix #59946 (Maksim Kita).
- Fix data race in
StorageDistributed#59987 (Nikita Taranov).
- Docker: run init scripts when option is enabled rather than disabled #59991 (jktng).
- Fix INSERT into
SQLitewith single quote (by escaping single quotes with a quote instead of backslash) #60015 (Azat Khuzhin).
- Fix several logical errors in
arrayFold#60022 (Raúl Marín).
- Fix optimize_uniq_to_count removing the column alias #60026 (Raúl Marín).
- Fix possible exception from S3Queue table on drop #60036 (Kseniia Sumarokova).
- Fix formatting of NOT with single literals #60042 (Raúl Marín).
- Use max_query_size from context in DDLLogEntry instead of hardcoded 4096 #60083 (Kruglov Pavel).
- Fix inconsistent formatting of queries containing tables named
table. Fix wrong formatting of queries with
UNION ALL,
INTERSECT, and
EXCEPTwhen their structure wasn't linear. This closes #52349. Fix wrong formatting of
SYSTEMqueries, including
SYSTEM ... DROP FILESYSTEM CACHE,
SYSTEM ... REFRESH/START/STOP/CANCEL/TEST VIEW,
SYSTEM ENABLE/DISABLE FAILPOINT. Fix formatting of parameterized DDL queries. Fix the formatting of the
DESCRIBE FILESYSTEM CACHEquery. Fix incorrect formatting of the
SET param_...(a query setting a parameter). Fix incorrect formatting of
CREATE INDEXqueries. Fix inconsistent formatting of
CREATE USERand similar queries. Fix inconsistent formatting of
CREATE SETTINGS PROFILE. Fix incorrect formatting of
ALTER ... MODIFY REFRESH. Fix inconsistent formatting of window functions if frame offsets were expressions. Fix inconsistent formatting of
RESPECT NULLSand
IGNORE NULLSif they were used after a function that implements an operator (such as
plus). Fix idiotic formatting of
SYSTEM SYNC REPLICA ... LIGHTWEIGHT FROM .... Fix inconsistent formatting of invalid queries with
GROUP BY GROUPING SETS ... WITH ROLLUP/CUBE/TOTALS. Fix inconsistent formatting of
GRANT CURRENT GRANTS. Fix inconsistent formatting of
CREATE TABLE (... COLLATE). Additionally, I fixed the incorrect formatting of
EXPLAINin subqueries (#60102). Fixed incorrect formatting of lambda functions (#60012). Added a check so there is no way to miss these abominations in the future. #60095 (Alexey Milovidov).
- Fix inconsistent formatting of explain in subqueries #60102 (Alexey Milovidov).
- Fix cosineDistance crash with Nullable #60150 (Raúl Marín).
- Allow casting of bools in string representation to true bools #60160 (Robert Schulze).
- Fix
system.s3queue_log#60166 (Kseniia Sumarokova).
- Fix arrayReduce with nullable aggregate function name #60188 (Raúl Marín).
- Hide sensitive info for
S3Queue#60233 (Kseniia Sumarokova).
- Fix http exception codes. #60252 (Austin Kothig).
- S3Queue: fix a bug (also fixes flaky test_storage_s3_queue/test.py::test_shards_distributed) #60282 (Kseniia Sumarokova).
- Fix use-of-uninitialized-value and invalid result in hashing functions with IPv6 #60359 (Kruglov Pavel).
- Fix OptimizeDateOrDateTimeConverterWithPreimageVisitor with null arguments #60453 (Raúl Marín).
- Fixed a minor bug that prevented distributed table queries sent from either KQL or PRQL dialect clients to be executed on replicas. #59674. #60470 (Alexey Milovidov) #59674 (Austin Kothig).
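The SQLite item in the list above mentions escaping single quotes with a quote instead of a backslash. A minimal sketch of that escaping rule, using Python's standard `sqlite3` module, is shown here. The helper names `quote_sqlite_string` and `insert_literal` and the table `t` are made up for illustration and are not part of ClickHouse.

```python
import sqlite3


def quote_sqlite_string(value: str) -> str:
    """Quote a string literal for SQLite by doubling every single quote."""
    return "'" + value.replace("'", "''") + "'"


def insert_literal(conn: sqlite3.Connection, value: str) -> None:
    # The statement is built by hand only to show the escaping rule; ordinary
    # application code should prefer parameter binding with "?" placeholders.
    conn.execute("INSERT INTO t VALUES (" + quote_sqlite_string(value) + ")")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (s TEXT)")
insert_literal(conn, "it's")
stored = conn.execute("SELECT s FROM t").fetchone()[0]
```

SQLite does not treat a backslash as an escape character inside string literals, which is why a backslash-escaped quote ends the literal early.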
ClickHouse release 24.1, 2024-01-30
Backward Incompatible Change
- The setting
print_pretty_type_namesis turned on by default. You can turn it off to keep the old behavior or
SET compatibility = '23.12'. #57726 (Alexey Milovidov).
- The MergeTree setting
clean_deleted_rowsis deprecated, it has no effect anymore. The
CLEANUPkeyword for
OPTIMIZEis not allowed by default (unless
allow_experimental_replacing_merge_with_cleanupis enabled). #58316 (Alexander Tokmakov).
- The function
reverseDNSQueryis no longer available. This closes #58368. #58369 (Alexey Milovidov).
- Enable various changes to improve the access control in the configuration file. These changes affect the behavior, and you can check the
config.xmlin the
access_control_improvementssection. In case you are not confident, keep the values in the configuration file as they were in the previous version. #58584 (Alexey Milovidov).
- Improve the operation of
sumMapFilteredwith NaN values. NaN values are now placed at the end (instead of randomly) and considered different from any values.
-0is now also treated as equal to
0; since 0 values are discarded,
-0values are discarded too. #58959 (Raúl Marín).
- The function
visibleWidthwill behave according to the docs. In previous versions, it simply counted code points after string serialization, like the
lengthUTF8function, but didn't consider zero-width and combining characters, full-width characters, tabs, and deletes. Now the behavior is changed accordingly. If you want to keep the old behavior, set
function_visible_width_behaviorto
0, or set
compatibilityto
23.12or lower. #59022 (Alexey Milovidov).
Kustodialect is disabled until these two bugs are fixed: #59037 and #59036. #59305 (Alexey Milovidov). Any attempt to use
Kustowill result in an exception.
- More efficient implementation of the
FINALmodifier no longer guarantees preserving the order even if
max_threads = 1. If you counted on the previous behavior, set
enable_vertical_finalto 0 or
compatibilityto
23.12.
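To make the `visibleWidth` change in the list above more concrete, here is a rough Python sketch contrasting a plain code point count with an approximate display width. It is a simplified approximation built on Python's `unicodedata` module, not ClickHouse's algorithm: it ignores tabs and delete characters, and the function names are hypothetical.

```python
import unicodedata


def code_point_count(text: str) -> int:
    """Count code points, similar in spirit to the old behaviour."""
    return len(text)


def approx_visible_width(text: str) -> int:
    """Approximate terminal width: skip zero-width marks, count wide characters as 2."""
    width = 0
    for ch in text:
        # Combining marks and format characters (e.g. zero-width space) take no columns.
        if unicodedata.combining(ch) or unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        # East Asian wide and full-width characters occupy two columns.
        width += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return width
```

For the string `"e\u0301"` (a letter followed by a combining accent), the code point count is 2 while the approximate visible width is 1.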
New Feature
- Implement Variant data type that represents a union of other data types. Type
Variant(T1, T2, ..., TN)means that each row of this type has a value of either type
T1or
T2or ... or
TNor none of them (
NULLvalue). Variant type is available under a setting
allow_experimental_variant_type. Reference: #54864. #58047 (Kruglov Pavel).
- Certain settings (currently
min_compress_block_sizeand
max_compress_block_size) can now be specified at column-level where they take precedence over the corresponding table-level setting. Example:
CREATE TABLE tab (col String SETTINGS (min_compress_block_size = 81920, max_compress_block_size = 163840)) ENGINE = MergeTree ORDER BY tuple();. #55201 (Duc Canh Le).
- Add
quantileDDaggregate function as well as the corresponding
quantilesDDand
medianDD. It is based on the DDSketch https://www.vldb.org/pvldb/vol12/p2195-masson.pdf. #56342 (Srikanth Chekuri).
- Allow to configure any kind of object storage with any kind of metadata type. #58357 (Kseniia Sumarokova).
- Added
null_status_on_timeout_only_activeand
throw_only_activemodes for
distributed_ddl_output_modethat allow to avoid waiting for inactive replicas. #58350 (Alexander Tokmakov).
- Add function
arrayShinglesto compute subarrays, e.g.
arrayShingles([1, 2, 3, 4, 5], 3)returns
[[1,2,3],[2,3,4],[3,4,5]]. #58396 (Zheng Miao).
- Added functions
punycodeEncode,
punycodeDecode,
idnaEncodeand
idnaDecodewhich are useful for translating international domain names to an ASCII representation according to the IDNA standard. #58454 (Robert Schulze).
- Added string similarity functions
damerauLevenshteinDistance,
jaroSimilarityand
jaroWinklerSimilarity. #58531 (Robert Schulze).
- Add two settings
output_format_compression_levelto change output compression level and
output_format_compression_zstd_window_logto explicitly set compression window size and enable long-range mode for zstd compression if output compression method is
zstd. Applied for
INTO OUTFILEand when writing to table functions
file,
url,
hdfs,
s3, and
azureBlobStorage. #58539 (Duc Canh Le).
- Automatically disable ANSI escape sequences in Pretty formats if the output is not a terminal. Add new
automode to setting
output_format_pretty_color. #58614 (Shaun Struwig).
- Added function
sqidDecodewhich decodes Sqids. #58544 (Robert Schulze).
- Allow to read Bool values into String in JSON input formats. It's done under a setting
input_format_json_read_bools_as_stringsthat is enabled by default. #58561 (Kruglov Pavel).
- Added function
seriesDecomposeSTLwhich decomposes a time series into a season, a trend and a residual component. #57078 (Bhavna Jindal).
- Introduced MySQL Binlog Client for MaterializedMySQL: One binlog connection for many databases. #57323 (Val Doroshchuk).
- Intel QuickAssist Technology (QAT) provides hardware-accelerated compression and cryptography. ClickHouse got a new compression codec
ZSTD_QATwhich utilizes QAT for zstd compression. The codec uses Intel's QATlib and Intel's QAT ZSTD Plugin. Right now, only compression can be accelerated in hardware (a software fallback kicks in if QAT could not be initialized); decompression always runs in software. #57509 (jasperzhu).
- Implement a new way of generating object storage keys for s3 disks. The format can now be defined in terms of
re2regex syntax with the
key_templateoption in the disk description. #57663 (Sema Checherinda).
- Table system.dropped_tables_parts contains parts of system.dropped_tables tables (dropped but not yet removed tables). #58038 (Yakov Olkhovskiy).
- Add settings
max_materialized_views_size_for_tableto limit the number of materialized views attached to a table. #58068 (zhongyuankai).
clickhouse-formatimprovements: support INSERT queries with
VALUES; support comments (use
--commentsto output them); support
--max_line_lengthoption to format only long queries in multiline. #58246 (vdimir).
- Attach all system tables in
clickhouse-local, including
system.parts. This closes #58312. #58359 (Alexey Milovidov).
- Support for
Enumdata types in function
transform. This closes #58241. #58360 (Alexey Milovidov).
- Add table
system.database_engines. #58390 (Bharat Nallan). Allow registering database engines independently in the codebase. #58365 (Bharat Nallan). Allow registering interpreters independently. #58443 (Bharat Nallan).
- Added
FROM <Replicas>modifier for
SYSTEM SYNC REPLICA LIGHTWEIGHTquery. The
FROMmodifier ensures we wait for fetches and drop-ranges only for the specified source replicas, as well as any replica not in zookeeper or with an empty source_replica. #58393 (Jayme Bird).
- Added setting
update_insert_deduplication_token_in_dependent_materialized_views. This setting allows to update insert deduplication token with table identifier during insert in dependent materialized views. Closes #59165. #59238 (Maksim Kita).
- Added statement
SYSTEM RELOAD ASYNCHRONOUS METRICSwhich updates the asynchronous metrics. Mostly useful for testing and development. #53710 (Robert Schulze).
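The `arrayShingles` entry in the list above gives one input and its result. The following Python sketch mimics that sliding-window behaviour. The function name `array_shingles` is hypothetical, and returning an empty list when the array is shorter than the window is an assumption of this sketch, not documented ClickHouse behaviour.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def array_shingles(arr: Sequence[T], length: int) -> List[List[T]]:
    """Return every run of `length` consecutive elements of `arr`, in order."""
    if length <= 0:
        raise ValueError("length must be positive")
    # The last window starts at index len(arr) - length; range() is empty
    # when the array is shorter than the window.
    return [list(arr[i:i + length]) for i in range(len(arr) - length + 1)]
```

Calling `array_shingles([1, 2, 3, 4, 5], 3)` yields `[[1, 2, 3], [2, 3, 4], [3, 4, 5]]`, matching the result quoted in the changelog entry.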
Performance Improvement
- Coordination for parallel replicas is rewritten for better parallelism and cache locality. It has been tested for linear scalability on hundreds of replicas. It also got support for reading in order. #57968 (Nikita Taranov).
- Replace the outgoing HTTP buffering with native ClickHouse buffers. Add byte-counting metrics for interfaces. #56064 (Yakov Olkhovskiy).
- Large aggregation states of
uniqExactwill be merged in parallel in distributed queries. #59009 (Nikita Taranov).
- Lower memory usage after reading from
MergeTreetables. #59290 (Anton Popov).
- Lower memory usage in vertical merges. #59340 (Anton Popov).
- Avoid huge memory consumption during Keeper startup for more cases. #58455 (Antonio Andelic).
- Keeper improvement: reduce Keeper's memory usage for stored nodes. #59002 (Antonio Andelic).
- More cache-friendly final implementation. Note on the behaviour change: previously queries with
FINALmodifier that read with a single stream (e.g.
max_threads = 1) produced sorted output without explicitly provided
ORDER BYclause. This is no longer guaranteed when
enable_vertical_final = true(and it is so by default). #54366 (Duc Canh Le).
- Bypass extra copying in
ReadBufferFromIStreamwhich is used, e.g., for reading from S3. #56961 (Nikita Taranov).
- Optimize array element function when input is Array(Map)/Array(Array(Num))/Array(Array(String))/Array(BigInt)/Array(Decimal). The previous implementations did more allocations than needed. The speedup is up to ~6x, especially when the input type is Array(Map). #56403 (李扬).
- Read column once while reading more than one subcolumn from it in compact parts. #57631 (Kruglov Pavel).
- Rewrite the AST of
sum(column + constant)function. This is available as an optimization pass for Analyzer #57853 (Jiebin Sun).
- The evaluation of function
matchnow utilizes skipping indices
ngrambf_v1and
tokenbf_v1. #57882 (凌涛).
- The evaluation of function
matchnow utilizes inverted indices. #58284 (凌涛).
- MergeTree
FINALdoes not compare rows from same non-L0 part. #58142 (Duc Canh Le).
- Speed up iota calls (filling array with consecutive numbers). #58271 (Raúl Marín).
- Speedup MIN/MAX for non-numeric types. #58334 (Raúl Marín).
- Optimize the combination of filters (like in multi-stage PREWHERE) with BMI2/SSE intrinsics #58800 (Zhiguo Zhou).
- Use one thread less in
clickhouse-local. #58968 (Alexey Milovidov).
- Improve the
multiIffunction performance when the type is Nullable. #57745 (KevinyhZou).
- Add
SYSTEM JEMALLOC PURGEfor purging unused jemalloc pages,
SYSTEM JEMALLOC [ ENABLE | DISABLE | FLUSH ] PROFILEfor controlling jemalloc profile if the profiler is enabled. Add jemalloc-related 4LW command in Keeper:
jmstfor dumping jemalloc stats,
jmfp,
jmep,
jmdpfor controlling jemalloc profile if the profiler is enabled. #58665 (Antonio Andelic).
- Lower memory consumption in backups to S3. #58962 (Vitaly Baranov).
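The `sum(column + constant)` rewrite in the list above rests on a simple identity: adding a constant to every row and summing equals summing the rows and adding the constant times the row count. The Python sketch below checks that identity on plain lists. The function names are hypothetical, and the sketch ignores NULL handling and floating-point rounding, which a real optimizer has to consider.

```python
from typing import Sequence


def sum_naive(column: Sequence[int], constant: int) -> int:
    """Evaluate sum(column + constant) row by row."""
    return sum(value + constant for value in column)


def sum_rewritten(column: Sequence[int], constant: int) -> int:
    """Evaluate the rewritten form: sum(column) + constant * count(column)."""
    return sum(column) + constant * len(column)
```

The rewritten form performs a single addition per row instead of two, and the multiplication happens once per aggregation state.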
Improvement
- Added comments (brief descriptions) to all columns of system tables. There are several reasons for this: (1) We use system tables a lot, and sometimes it can be very difficult for a developer to understand the purpose and the meaning of a particular column. (2) We change system tables a lot (adding new ones or modifying existing ones), and the documentation for them is always outdated. For example, take a look at the documentation page for
system.parts. It misses a lot of columns. (3) We would like to eventually generate documentation directly from ClickHouse. #58356 (Nikita Mikhaylov).
- Allow queries without aliases for subqueries for
PASTE JOIN. #58654 (Yarik Briukhovetskyi).
- Enable
MySQL/
MariaDBintegration on macOS. This closes #21191. #46316 (Alexey Milovidov) (Robert Schulze).
- Disable
max_rows_in_set_to_optimize_joinby default. #56396 (vdimir).
- Add
<host_name>config parameter that allows avoiding resolving hostnames in ON CLUSTER DDL queries and Replicated database engines. This mitigates the possibility of the queue being stuck in case of a change in cluster definition. Closes #57573. #57603 (Nikolay Degterinsky).
- Increase
load_metadata_threadsto 16 for the filesystem cache. It will make the server start up faster. #57732 (Alexey Milovidov).
- Add ability to throttle merges/mutations (
max_mutations_bandwidth_for_server/
max_merges_bandwidth_for_server). #57877 (Azat Khuzhin).
- Replaced undocumented (boolean) column
is_hot_reloadablein system table
system.server_settingsby (Enum8) column
changeable_without_restartwith possible values
No,
Yes,
IncreaseOnlyand
DecreaseOnly. Also documented the column. #58029 (skyoct).
- Cluster discovery supports setting username and password, close #58063. #58123 (vdimir).
- Support query parameters in
ALTER TABLE ... PART. #58297 (Azat Khuzhin).
- Create consumers for Kafka tables on the fly (but keep them for some period,
kafka_consumers_pool_ttl_ms, since last use). This should fix the problem with statistics for
system.kafka_consumers(which were not consumed when nobody read from the Kafka table, leading to a live memory leak and slow table detach). This PR also enables stats for
system.kafka_consumersby default again. #58310 (Azat Khuzhin).
sparkBaras an alias to
sparkbar. #58335 (凌涛).
- Avoid sending
ComposeObjectrequests after upload to
GCS. #58343 (Azat Khuzhin).
- Correctly handle keys with dot in the name in configurations XMLs. #58354 (Azat Khuzhin).
- Make function
formatreturn constant on constant arguments. This closes #58355. #58358 (Alexey Milovidov).
- Adding a setting
max_estimated_execution_timeto separate
max_execution_timeand
max_estimated_execution_time. #58402 (Zhang Yifan).
- Provide a hint when an invalid database engine name is used. #58444 (Bharat Nallan).
- Add settings for better control of indexes type in Arrow dictionary. Use signed integer type for indexes by default as Arrow recommends. Closes #57401. #58519 (Kruglov Pavel).
- Implement #58575 Support
CLICKHOUSE_PASSWORD_FILEenvironment variable when running the docker image. #58583 (Eyal Halpern Shalev).
- When executing some queries, which require a lot of streams for reading data, the error
"Paste JOIN requires sorted tables only"was previously thrown. Now the number of streams is resized to 1 in that case. #58608 (Yarik Briukhovetskyi).
- Better message for INVALID_IDENTIFIER error. #58703 (Yakov Olkhovskiy).
- Improved handling of signed numeric literals in normalizeQuery. #58710 (Salvatore Mesoraca).
- Support Point data type for MySQL. #58721 (Kseniia Sumarokova).
- When comparing a Float32 column and a const string, read the string as Float32 (instead of Float64). #58724 (Raúl Marín).
- Improve S3 compatibility, add ECloud EOS storage support. #58786 (xleoken).
- Allow
KILL QUERYto cancel backups / restores. This PR also makes running backups and restores visible in
system.processes. Also, there is a new setting in the server configuration now -
shutdown_wait_backups_and_restores(default=true) which makes the server either wait on shutdown for all running backups and restores to finish or just cancel them. #58804 (Vitaly Baranov).
- Avro format to support ZSTD codec. Closes #58735. #58805 (flynn).
- MySQL interface gained support for
net_write_timeoutand
net_read_timeoutsettings.
net_write_timeoutis translated into the native
send_timeoutClickHouse setting and, similarly,
net_read_timeoutinto
receive_timeout. Fixed an issue where it was possible to set MySQL
sql_select_limitsetting only if the entire statement was in upper case. #58835 (Serge Klochkov).
- A better exception message for a conflict when creating a dictionary and a table with the same name. #58841 (Yarik Briukhovetskyi).
- Make sure that for custom (created from SQL) disks either
filesystem_caches_path(a common directory prefix for all filesystem caches) or
custom_cached_disks_base_directory(a common directory prefix for only filesystem caches created from custom disks) is specified in server config.
custom_cached_disks_base_directoryhas higher priority for custom disks over
filesystem_caches_path, which is used if the former one is absent. Filesystem cache setting
pathmust lie inside that directory, otherwise an exception will be thrown, preventing the disk from being created. This does not affect disks that were created on an older version before the server was upgraded: in that case the exception is not thrown, so that the server can start successfully.
custom_cached_disks_base_directoryis added to default server config as
/var/lib/clickhouse/caches/. Closes #57825. #58869 (Kseniia Sumarokova).
- MySQL interface gained compatibility with
SHOW WARNINGS/
SHOW COUNT(*) WARNINGSqueries, though the returned result is always an empty set. #58929 (Serge Klochkov).
- Skip unavailable replicas when executing parallel distributed
INSERT SELECT. #58931 (Alexander Tokmakov).
- Display word-descriptive log level while enabling structured log formatting in json. #58936 (Tim Liou).
- MySQL interface gained support for
CAST(x AS SIGNED)and
CAST(x AS UNSIGNED)statements via data type aliases:
SIGNEDfor Int64, and
UNSIGNEDfor UInt64. This improves compatibility with BI tools such as Looker Studio. #58954 (Serge Klochkov).
- Change working directory to the data path in docker container. #58975 (cangyin).
- Added setting for Azure Blob Storage
azure_max_unexpected_write_error_retries, can also be set from config under azure section. #59001 (SmitaRKulkarni).
- Allow server to start with broken data lake table. Closes #58625. #59080 (Kseniia Sumarokova).
- Allow to ignore schema evolution in the
Icebergtable engine and read all data using schema specified by the user on table creation or latest schema parsed from metadata on table creation. This is done under a setting
iceberg_engine_ignore_schema_evolutionthat is disabled by default. Note that enabling this setting can lead to incorrect result as in case of evolved schema all data files will be read using the same schema. #59133 (Kruglov Pavel).
- Prohibit mutable operations (
INSERT/
ALTER/
OPTIMIZE/...) on read-only/write-once storages with a proper
TABLE_IS_READ_ONLYerror (to avoid leftovers). Avoid leaving left-overs on write-once disks (
format_version.txt) on
CREATE/
ATTACH. Ignore
DROPfor
ReplicatedMergeTree(so as for
MergeTree). Fix iterating over
s3_plain(
MetadataStorageFromPlainObjectStorage::iterateDirectory). Note read-only is
webdisk, and write-once is
s3_plain. #59170 (Azat Khuzhin).
- Fix bug in the experimental
_block_numbercolumn which could lead to logical error during complex combination of
ALTERs and
merges. Fixes #56202. Replaces #58601. #59295 (alesapin).
- Play UI understands when an exception is returned inside JSON. Adjustment for #52853. #59303 (Alexey Milovidov).
/binaryHTTP handler allows to specify user, host, and optionally, password in the query string. #59311 (Alexey Milovidov).
- Support backups for compressed in-memory tables. This closes #57893. #59315 (Alexey Milovidov).
- Support the
FORMATclause in
BACKUPand
RESTOREqueries. #59338 (Vitaly Baranov).
- Function
concatWithSeparatornow supports arbitrary argument types (instead of only
Stringand
FixedStringarguments). For example,
SELECT concatWithSeparator('.', 'number', 1)now returns
number.1. #59341 (Robert Schulze).
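The filesystem cache item in the list above describes two rules: `custom_cached_disks_base_directory` takes priority over `filesystem_caches_path` for custom disks, and the cache `path` must lie inside the chosen directory. A minimal Python sketch of those two checks follows. The function names are hypothetical, POSIX-style paths are assumed, and the real server-side validation may differ in detail.

```python
import os
from typing import Optional


def cache_base_directory(custom_cached_disks_base_directory: Optional[str],
                         filesystem_caches_path: Optional[str]) -> str:
    """Pick the base directory: the custom-disk setting wins when it is present."""
    base = custom_cached_disks_base_directory or filesystem_caches_path
    if not base:
        raise ValueError("neither base directory is configured")
    return os.path.normpath(base)


def check_cache_path(path: str, base: str) -> str:
    """Resolve `path` against `base` and reject anything that escapes it."""
    base = os.path.normpath(base)
    # A relative path is resolved under the base; an absolute one is kept as is.
    full = os.path.normpath(os.path.join(base, path))
    if os.path.commonpath([base, full]) != base:
        raise ValueError(f"cache path {full!r} is outside {base!r}")
    return full
```

With the default `/var/lib/clickhouse/caches/` as the base, a relative path such as `s3_cache` is accepted, while `/tmp/elsewhere` or `../escape` is rejected.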
Build/Testing/Packaging Improvement
- Improve aliases for clickhouse binary (now
ch/
clickhouseis
clickhouse-localor
clickhousedepending on the arguments) and add bash completion for new aliases. #58344 (Azat Khuzhin).
- Add settings changes check to CI to check that all settings changes are reflected in settings changes history. #58555 (Kruglov Pavel).
- Use tables directly attached from S3 in stateful tests. #58791 (Alexey Milovidov).
- Save the whole
fuzzer.logas an archive instead of the last 100k lines.
tail -n 100000often removes lines with table definitions. Example:. #58821 (Dmitry Novik).
- Enable Rust on macOS with Aarch64 (this will add fuzzy search in client with skim and the PRQL language, though I don't think there are people who host ClickHouse on darwin, so it is mostly for fuzzy search in client I would say). #59272 (Azat Khuzhin).
- Fix aggregation issue in mixed x86_64 and ARM clusters #59132 (Harry Lee).
Bug Fix (user-visible misbehavior in an official stable release)
- Add join keys conversion for nested LowCardinality #51550 (vdimir).
- Flatten only true Nested type if flatten_nested=1, not all Array(Tuple) #56132 (Kruglov Pavel).
- Fix a bug with projections and the
aggregate_functions_null_for_emptysetting during insertion. #56944 (Amos Bird).
- Fixed potential exception due to stale profile UUID #57263 (Vasily Nemkov).
- Fix working with read buffers in StreamingFormatExecutor #57438 (Kruglov Pavel).
- Ignore MVs with dropped target table during pushing to views #57520 (Kruglov Pavel).
- Eliminate possible race between ALTER_METADATA and MERGE_PARTS #57755 (Azat Khuzhin).
- Fix the expressions order bug in group by with rollup #57786 (Chen768959).
- A fix for the obsolete "zero-copy" replication feature: Fix lost blobs after dropping a replica with broken detached parts #58333 (Alexander Tokmakov).
- Allow users to work with symlinks in user_files_path #58447 (Duc Canh Le).
- Fix a crash when graphite table does not have an agg function #58453 (Duc Canh Le).
- Delay reading from StorageKafka to allow multiple reads in materialized views #58477 (János Benjamin Antal).
- Fix a stupid case of intersecting parts #58482 (Alexander Tokmakov).
- MergeTreePrefetchedReadPool disable for LIMIT only queries #58505 (Maksim Kita).
- Enable ordinary databases during restoration #58520 (Jihyuk Bok).
- Fix Apache Hive threadpool reading for ORC/Parquet/... #58537 (sunny).
- Hide credentials in
system.backup_log's
base_backup_namecolumn #58550 (Daniel Pozo Escalona).
toStartOfIntervalfor milli- and microsecond values rounding #58557 (Yarik Briukhovetskyi).
- Disable
max_joined_block_rowsin ConcurrentHashJoin #58595 (vdimir).
- Fix join using nullable in the old analyzer #58596 (vdimir).
makeDateTime64: Allow non-const fraction argument #58597 (Robert Schulze).
- Fix possible NULL dereference during symbolizing inline frames #58607 (Azat Khuzhin).
- Improve isolation of query cache entries under re-created users or role switches #58611 (Robert Schulze).
- Fix broken partition key analysis when doing projection optimization #58638 (Amos Bird).
- Query cache: Fix per-user quota #58731 (Robert Schulze).
- Fix stream partitioning in parallel window functions #58739 (Dmitry Novik).
- Fix double destroy call on exception throw in addBatchLookupTable8 #58745 (Raúl Marín).
- Don't process requests in Keeper during shutdown #58765 (Antonio Andelic).
- Fix a null pointer dereference in
SlabsPolygonIndex::find#58771 (Yarik Briukhovetskyi).
- Fix JSONExtract function for LowCardinality(Nullable) columns #58808 (vdimir).
- A fix for unexpected accumulation of memory usage while creating a huge number of tables by CREATE and DROP. #58831 (Maksim Kita).
- Multiple read file log storage in mv #58877 (János Benjamin Antal).
- Restriction for the access key id for s3. #58900 (MikhailBurdukov).
- Fix possible crash in clickhouse-local during loading suggestions #58907 (Kruglov Pavel).
- Fix crash when
indexHintis used #58911 (Dmitry Novik).
- Fix StorageURL forgetting headers on server restart #58933 (Michael Kolupaev).
- Analyzer: fix storage replacement with insertion block #58958 (Yakov Olkhovskiy).
- Fix seek in ReadBufferFromZipArchive #58966 (Michael Kolupaev).
- A fix for experimental inverted indices (don't use in production):
DROP INDEXof inverted index now removes all relevant files from persistence #59040 (mochi).
- Fix data race on query_factories_info #59049 (Kseniia Sumarokova).
- Disable "Too many redirects" error retry #59099 (skyoct).
- Fix not started database shutdown deadlock #59137 (Sergei Trifonov).
- Fix: LIMIT BY and LIMIT in distributed query #59153 (Igor Nikonov).
- Fix crash with nullable timezone for
toString#59190 (Yarik Briukhovetskyi).
- Fix abort in iceberg metadata on bad file paths #59275 (Kruglov Pavel).
- Fix architecture name in select of Rust target #59307 (p1rattttt).
- Fix a logical error about "not-ready set" for querying from
system.tableswith a subquery in the IN clause. #59351 (Nikolai Kochetov).