v25.8 Changelog for Cloud

Backward incompatible changes

JSON and data format changes

Storage and partitioning

  • Add support for hive partition style writes and refactor reads implementation (hive partition columns are no longer virtual). #76802 (Arthur Passos).
  • Enable MergeTree setting write_marks_for_substreams_in_compact_parts by default. It significantly improves the performance of reading subcolumns from newly created Compact parts. Servers running a version older than 25.5 won't be able to read the new Compact parts. #84171 (Pavel Kruglov).
  • Disallow RENAME COLUMN or DROP COLUMN involving explicitly listed columns to sum in SummingMergeTree. Closes #81836. #82821 (Alexey Milovidov).

Function enhancements

  • Introduce a new argument unexpected_quoting_character_strategy to the extractKeyValuePairs function that controls what happens when a quoting_character is unexpectedly found. For more details see the docs for extractKeyValuePairs. #80657 (Arthur Passos).
  • Previously, the function countMatches would stop counting at the first empty match even if the pattern accepts it. To overcome this issue, countMatches now continues execution by advancing by a single character when an empty match occurs. Users who would like to retain the old behavior can enable setting count_matches_stop_at_empty_match. #81676 (Elmi Ahmadov).
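
The countMatches change can be tried with a pattern that is able to match the empty string. This is a minimal sketch: the input string and pattern are made up for illustration, and no expected output is shown because it depends on the server version and the setting value.

```sql
-- '[0-9]*' can match the empty string between non-digit characters.
-- With the new default, counting continues past an empty match
-- by advancing a single character.
SELECT countMatches('a1b22c', '[0-9]*');

-- Opt back into the previous behavior (stop at the first empty match).
SELECT countMatches('a1b22c', '[0-9]*')
SETTINGS count_matches_stop_at_empty_match = 1;
```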

Data type improvements

  • Improve the precision of conversion from Decimal to Float32. Implement conversion from Decimal to BFloat16. Closes #82660. #82823 (Alexey Milovidov).

Performance and resource management

  • Previously, BACKUP queries, merges and mutations were not using server-wide throttlers for local (max_local_read_bandwidth_for_server and max_local_write_bandwidth_for_server) and remote (max_remote_read_network_bandwidth_for_server and max_remote_write_network_bandwidth_for_server) traffic, instead they were only throttled by dedicated server settings (max_backup_bandwidth_for_server, max_mutations_bandwidth_for_server and max_merges_bandwidth_for_server). Now, they use both types of throttlers simultaneously. #81753 (Sergei Trifonov).
  • Added a new setting cluster_function_process_archive_on_multiple_nodes which increases performance of processing archives in cluster functions when set to true (the default). Set it to false for compatibility and to avoid errors during upgrade to 25.7+ if you use cluster functions with archives on earlier versions. #82355 (Kseniia Sumarokova).
  • The previous concurrent_threads_scheduler default value was round_robin, which proved unfair in the presence of a high number of single-threaded queries (e.g., INSERTs). This change makes the safer fair_round_robin scheduler the default. #84747 (Sergei Trifonov).
  • Lazy materialization is now enabled only with the analyzer. This avoids maintaining it without the analyzer, where it can have some issues (for example, when using indexHint() in conditions). #83791 (Igor Nikonov).

Schema and SQL syntax

  • Forbid the creation of a table without insertable columns. #81835 (Pervakov Grigorii).
  • Require backticks around identifiers with dots in default expressions to prevent them from being parsed as compound identifiers. #83162 (Pervakov Grigorii).
  • Support PostgreSQL-style heredoc syntax: $tag$ string contents... $tag$, also known as dollar-quoted string literals. In previous versions, there were fewer restrictions on tags: they could contain arbitrary characters, including punctuation and whitespace. This introduces parsing ambiguity with identifiers that can also start with a dollar character. At the same time, PostgreSQL only allows word characters for tags. To resolve the problem, we now restrict heredoc tags only to contain word characters. Closes #84731. #84846 (Alexey Milovidov).
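
The two syntax rules above (word-character heredoc tags and backticks around dotted identifiers in default expressions) might look as follows. This is only a sketch: the table name, column names, and tag are hypothetical.

```sql
-- Dollar-quoted (heredoc) string literal; the tag "doc" contains only word characters.
SELECT $doc$A string with 'single quotes' inside$doc$ AS s;

-- An identifier that contains a dot is wrapped in backticks inside the
-- DEFAULT expression, so it is not parsed as a compound identifier.
CREATE TABLE example_defaults
(
    `a.b` UInt32,
    c UInt32 DEFAULT `a.b` + 1
)
ENGINE = MergeTree
ORDER BY tuple();
```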

Security and permissions

  • SYSTEM RESTART REPLICAS will only restart replicas in the databases where you have permission to SHOW TABLES. Previously the query led to the wakeup of tables in the Lazy database, even without access to that database, while these tables were being concurrently dropped. #83321 (Alexey Milovidov).
  • Functions azureBlobStorage, deltaLakeAzure, and icebergAzure have been updated to properly validate AZURE permissions. All cluster-variant functions (-Cluster functions) now verify permissions against their corresponding non-clustered counterparts. Additionally, the icebergLocal and deltaLakeLocal functions now enforce FILE permission checks. #84938 (Nikita Mikhaylov).

New features

Data types

  • Add new data types: Time ([H]HH:MM:SS) and Time64 ([H]HH:MM:SS[.fractional]), and some basic cast functions and functions to interact with other data types. Added settings for compatibility with a legacy function ToTime. #81217 (Yarik Briukhovetskyi).
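
A quick way to see the new types is to cast string literals that follow the patterns above. This is a sketch under assumptions: the precision argument 3 for Time64 is assumed to work like the one for DateTime64, and depending on the server version the types may first need to be enabled through a setting that is not shown here.

```sql
SELECT
    CAST('14:30:25' AS Time) AS t,            -- [H]HH:MM:SS
    CAST('14:30:25.123' AS Time64(3)) AS t64; -- [H]HH:MM:SS[.fractional]
```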

Functions

System tables

Iceberg and DeltaLake

  • Support complex types in iceberg schema evolution. #73714 (scanhex12).
  • Introduce Iceberg writes for insert queries. #82692 (scanhex12).
  • Support positional deletes for the Iceberg table engine. #83094 (Daniil Ivanik).
  • Read iceberg data files by field ids. Closes #83065. #83653 (scanhex12).
  • Iceberg writes for create. Closes #83927. #83983 (scanhex12).
  • Writes for Glue catalogs. #84136 (scanhex12).
  • Writes for Iceberg Rest catalogs. #84684 (scanhex12).
  • Merge all iceberg position delete files into data files. This reduces the number and size of Parquet files in iceberg storage. Syntax: OPTIMIZE TABLE table_name. #85250 (scanhex12).
  • Support DROP TABLE for iceberg (Removing from REST/Glue catalogs + removing metadata about the table). #85395 (scanhex12).
  • Support ALTER DELETE mutations for iceberg in merge-on-read format. #85549 (scanhex12).
  • Support writes into DeltaLake. Closes #79603. #85564 (Kseniia Sumarokova).
  • Write more iceberg statistics (column sizes, lower and upper bounds) in metadata (manifest entries) for min-max pruning. #85746 (scanhex12).
  • Support add/drop/modify columns in iceberg for simple types. #85769 (scanhex12).
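
Taken together, the Iceberg entries above suggest a workflow along these lines. This is a hedged sketch, not a verified recipe: iceberg_table and its columns are hypothetical, the table is assumed to already exist on the Iceberg table engine, and any experimental settings these statements may require are not shown.

```sql
-- Insert rows (Iceberg writes for insert queries).
INSERT INTO iceberg_table (id, name) VALUES (1, 'a'), (2, 'b');

-- Delete rows with an ALTER DELETE mutation (merge-on-read format).
ALTER TABLE iceberg_table DELETE WHERE id = 1;

-- Merge position delete files into data files.
OPTIMIZE TABLE iceberg_table;

-- Schema evolution for a simple type.
ALTER TABLE iceberg_table ADD COLUMN note Nullable(String);

-- Drop the table (removes it from the REST/Glue catalog and removes its metadata).
DROP TABLE iceberg_table;
```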

MergeTree and storage

  • All tables now support the _table virtual column, not only Merge type tables. #63665 (Xiaozhe Yu).
  • Add SZ3 as a lossy yet error-bounded compression codec for columns of type Float32 and Float64. #67161 (scanhex12).
  • Added support for lightweight updates for MergeTree-family tables. Lightweight updates can be used by a new syntax: UPDATE <table> SET col1 = val1, col2 = val2, ... WHERE <condition>. Added implementation of lightweight deletes via lightweight updates which can be enabled by setting lightweight_delete_mode = 'lightweight_update'. #82004 (Anton Popov).
  • Support _part_granule_offset virtual column in MergeTree-family tables. This column indicates the zero-based index of the granule/mark each row belongs to within its data part. This addresses #79572. #82341 (Amos Bird).
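
The lightweight update syntax and the new virtual columns can be sketched as follows. The table example_events and its columns are hypothetical, and the table or session settings that lightweight updates may require on a given version are not shown.

```sql
-- Lightweight update using the new UPDATE syntax.
UPDATE example_events SET status = 'done', retries = 0 WHERE id = 42;

-- Route lightweight deletes through lightweight updates.
SET lightweight_delete_mode = 'lightweight_update';
DELETE FROM example_events WHERE id = 43;

-- Virtual columns: source table name and the zero-based granule index
-- of each row within its data part.
SELECT _table, _part_granule_offset, id
FROM example_events
LIMIT 10;
```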

Protocol and client support

SQL and query features

Formats

  • Add format_schema_source setting which defines the source of format_schema. #80874 (Tuan Pham Anh).
  • Added Hash as a new output format. It calculates a single hash value for all columns and rows of the result. This is useful for calculating a "fingerprint" of the result, for example, in use cases where data transfer is a bottleneck. Example: SELECT arrayJoin(['abc', 'def']), 42 FORMAT Hash returns e5f9e676db098fdb9530d2059d8c23ef. #84607 (Robert Schulze).

Server configuration and workload management

  • Server setting cpu_slot_preemption enables preemptive CPU scheduling for workloads and ensures max-min fair allocation of CPU time among workloads. New workload settings for CPU throttling are added: max_cpus, max_cpu_share and max_burst_cpu_seconds. #80879 (Sergei Trifonov).
  • The workload setting max_waiting_queries is now supported. It can be used to limit the size of the query queue. If the limit is reached, all subsequent queries will be terminated with the SERVER_OVERLOADED error. #81250 (Oleg Doronin).
  • Drop TCP connection after configured number of queries or time threshold. Resolves #68000. #81472 (Kenny Sun).
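
The new workload settings could be applied like this. This is a sketch under assumptions: the workload names and limit values are hypothetical, and the CPU resource definition and the cpu_slot_preemption server setting needed for CPU scheduling are not shown.

```sql
-- Root workload and a child workload with CPU and queue limits.
CREATE WORKLOAD all;
CREATE WORKLOAD production IN all
SETTINGS max_cpus = 8, max_waiting_queries = 100;

-- Run a query in that workload. Once the waiting-query limit is reached,
-- further queries are expected to fail with SERVER_OVERLOADED.
SELECT count() FROM numbers(1000000)
SETTINGS workload = 'production';
```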

Cloud storage

  • Add extra_credentials to AzureBlobStorage to authenticate with client_id and tenant_id. #84235 (Pablo Marcos).

Keeper

Experimental features

Table engines and table functions

Text index improvements

  • Add functions searchAny and searchAll which are general purpose tools to search text indexes. #80641 (Elmi Ahmadov).
  • The text index now supports string tokenizer. #81752 (Elmi Ahmadov).
  • Changed the default index granularity value for text indexes to 64. This improves the expected performance for the average test query in internal benchmarks. #82162 (Jimmy Aguilar Mena).
  • The 256-bit bitmap stores the outgoing labels of a state ordered, but outgoing states are saved into disk in the order they appear in the hash table. Therefore, a label would point to a wrong next state while reading from disk. #82783 (Elmi Ahmadov).
  • Currently, FST tree is saved into disk uncompressed. This could lead to slow performance or higher I/O bandwidth while both writing and reading into/from disk. #83093 (Elmi Ahmadov).

Feature maturity updates

Performance improvements

Query execution and aggregation

  • Trivial optimization for -If aggregate function combinator. #78454 (李扬).
  • Added new logic (controlled by the setting enable_producing_buckets_out_of_order_in_aggregation, enabled by default) that allows sending some buckets out of order during memory-efficient aggregation. When some aggregation buckets take significantly longer to merge than others, it improves performance by allowing the initiator to merge buckets with higher bucket id-s in the meantime. The downside is potentially higher memory usage (shouldn't be significant). #80179 (Nikita Taranov).
  • Make the pipeline after the TOTALS step multithreaded. #80331 (UnamedRus).
  • When the aggregation query contains only a single COUNT() function on a NOT NULL column, the aggregation logic is fully inlined during hash table probing. This avoids allocating and maintaining any aggregation state, significantly reducing memory usage and CPU overhead. This partially addresses #81982. #82104 (Amos Bird).
  • Calculate the serialized key columnarly when grouping by multiple string or number columns. #83884 (李扬).
  • Implement addManyDefaults for -If combinators. #83870 (Raúl Marín).

JOIN optimizations

  • Add new setting min_joined_block_size_rows (analogous to min_joined_block_size_bytes; defaults to 65409) to control the minimum block size (in rows) for JOIN input and output blocks (if the join algorithm supports it). Small blocks will be squashed. #81886 (Nikita Taranov).
  • Performance of HashJoin is optimised by removing the additional loop over hash maps in the typical case of only one key column. Also, null_map and join_mask checks are eliminated when they're always true/false. #82308 (Nikita Taranov).
  • The optimizations for null_map and JoinMask from #82308 were applied to the case of JOIN with multiple disjuncts. Also, the KnownRowsHolder data structure was optimized. #83041 (Nikita Taranov).
  • Plain std::vector<std::atomic_bool> is used for join flags to avoid calculating a hash on each access to flags. #83043 (Nikita Taranov).
  • Process max_joined_block_rows outside of hash JOIN main loop. Slightly better performance for ALL JOIN. #83216 (Nikolai Kochetov).
  • Don't pre-allocate memory for result columns beforehand when HashJoin uses lazy output mode. It is suboptimal, especially when the number of matches is low. Moreover, we know the exact amount of matches after joining is done, so we can preallocate more precisely. #83304 (Nikita Taranov).
  • All LEFT/INNER JOINs will be automatically converted to RightAny if the right side is functionally determined by the join key columns (all rows have unique join key values). #84010 (Nikita Taranov).
  • Improved performance of applying patch parts in Join mode. #85040 (Anton Popov).

Distributed query improvements

  • Introduced an option to offload (de)compression and (de)serialization of blocks into pipeline threads instead of a single thread associated with a network connection. Controlled by the setting enable_parallel_blocks_marshalling. It should speed up distributed queries that transfer significant amounts of data between the initiator and remote nodes. #78694 (Nikita Taranov).
  • Parallel distributed INSERT SELECT is enabled by default in the mode where INSERT SELECT is executed on each shard independently; see the parallel_distributed_insert_select setting. #80425 (Igor Nikonov).
  • Compress logs and profile events in the native protocol. On clusters with 100+ replicas, uncompressed profile events take 1..10 MB/sec, and the progress bar is sluggish on slow Internet connections. This closes #82533. #82535 (Alexey Milovidov).
  • Parallel distributed INSERT SELECT is enabled by default in the mode where INSERT SELECT is executed on each shard independently; see the parallel_distributed_insert_select setting. #83040 (Igor Nikonov).
  • Fixed the calculation of the minimal task size for parallel replicas. #84752 (Nikita Taranov).

Index improvements

  • Vector search queries using a vector similarity index complete with lower latency due to reduced storage reads and reduced CPU usage. #79103 (Shankar Iyer).
  • Respect merge_tree_min_{rows,bytes}_for_seek in filterPartsByQueryConditionCache to align it with other methods filtering by indexes. #80312 (李扬).
  • Process higher granularity min-max indexes first. Closes #75381. #83798 (Maruth Goyal).
  • The bloom filter index is now used for conditions like has([c1, c2, ...], column), where column is not of an Array type. This improves performance for such queries, making them as efficient as the IN operator. #83945 (Doron David).
  • Processes indexes in increasing order of file size. The net index ordering prioritizes minmax and vector indexes (due to simplicity and selectivity respectively), and small indexes thereafter. Within the minmax/vector indexes smaller indexes are also preferred. #84094 (Maruth Goyal).
  • Previously, the text index data would be separated into multiple segments (each segment size by default was 256 MiB). This might reduce the memory consumption while building the text index; however, it increases the space requirement on disk and the query response time. #84590 (Elmi Ahmadov).
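
The bloom filter entry above concerns queries of the following shape. This is a sketch with a hypothetical table; whether the index is actually used for a given query can be checked with EXPLAIN.

```sql
CREATE TABLE example_users
(
    id UInt64,
    user_name String,
    INDEX user_name_bf user_name TYPE bloom_filter GRANULARITY 1
)
ENGINE = MergeTree
ORDER BY id;

-- has() with a constant array and a non-Array column can now use the bloom filter index...
SELECT count() FROM example_users WHERE has(['alice', 'bob'], user_name);

-- ...much like the equivalent IN condition.
SELECT count() FROM example_users WHERE user_name IN ('alice', 'bob');
```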

Subquery optimizations

  • Optimize the generated plan for correlated subqueries by removing redundant JOIN operations using equivalence classes. If there are equivalent expressions for all correlated columns, CROSS JOIN is not produced if query_plan_correlated_subqueries_use_substitution setting is enabled. #82435 (Dmitry Novik).
  • Read only required columns in correlated subquery when it appears to be an argument of function EXISTS. #82443 (Dmitry Novik).
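
A correlated EXISTS subquery of the kind these optimizations target might look like this. The tables and columns are hypothetical, and the sketch only shows the query shape and the setting named above.

```sql
SELECT o.id
FROM example_orders AS o
WHERE EXISTS
(
    -- Correlated on o.customer_id; only the columns needed by the
    -- subquery have to be read.
    SELECT 1
    FROM example_customers AS c
    WHERE c.id = o.customer_id
)
SETTINGS query_plan_correlated_subqueries_use_substitution = 1;
```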

Azure Blob Storage improvements

  • azureBlobStorage table engine now caches and reuses managed identity authentication tokens when possible to avoid throttling. #79860 (Nick Blakely).
  • Replace curl HTTP client with poco HTTP client for azure blob storage. Introduced multiple settings for these clients which mirror settings from S3. Introduced aggressive connect timeouts for both Azure and S3. Improved introspection into Azure profile events and metrics. The new client is enabled by default and provides much better latencies for cold queries on top of Azure Blob Storage. The old curl client can be restored by setting azure_sdk_use_native_client=false. #83294 (alesapin).

Storage engine improvements

  • Fix filter by key for Redis and KeeperMap storages. #81833 (Pervakov Grigorii).
  • ATTACH PARTITION no longer leads to the dropping of all caches. #82377 (Alexey Milovidov).
  • Avoid holding the lock while creating storage snapshot data to reduce lock contention with high concurrent load. #83510 (Duc Canh Le).
  • Removing temporary parts may take a while (especially with S3), and currently it is done while holding a global lock in MergeTreeBackgroundExecutor. When all tables need to be restarted due to connection loss and we wait for background tasks to finish, tables may even be stuck in readonly mode for an hour. However, it looks like this lock is not needed for calling cancel. #84311 (Alexander Tokmakov).

Format improvements

  • New parquet reader implementation. It's generally faster and supports page-level filter pushdown and PREWHERE. Currently experimental. Use setting input_format_parquet_use_native_reader_v3 to enable. #82789 (Michael Kolupaev).
  • Improved performance of the ProtobufSingle input format by reusing the serializer when no parsing errors occur. #83613 (Eduard Karacharov).
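
The experimental Parquet reader is opted into per query or per session. A minimal sketch, assuming a hypothetical local file example.parquet with an id column:

```sql
SELECT count()
FROM file('example.parquet', Parquet)
WHERE id > 100  -- a filter the new reader may be able to push down
SETTINGS input_format_parquet_use_native_reader_v3 = 1;
```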

Data type and serialization optimizations

  • Significantly improve performance of JSON subcolumns reading from shared data in MergeTree by implementing new serializations for JSON shared data in MergeTree. #83777 (Pavel Kruglov).
  • Optimize string deserialization by simplifying the code. Closes #38564. #84561 (Alexey Milovidov).

Pipeline and execution improvements

Memory and resource optimizations

  • Tweak some jemalloc configs to improve performance. #81807 (Antonio Andelic).
  • Add alignment in the Counter of ProfileEvents to reduce false sharing. #82697 (Jiebin Sun).
  • Reduce unnecessary memcpy calls in CompressedReadBufferBase::readCompressedData. #83986 (Raúl Marín).

Query planning and analysis

Logging improvements

Function optimizations

  • Optimize largestTriangleThreeBuckets by removing temporary data. #84479 (Alexey Milovidov).
  • Optimized and simplified implementation of many string-handling functions. Corrected incorrect documentation for several functions. Note: The output of byteSize for String columns and complex types containing String columns has changed from 9 bytes per empty string to 8 bytes per empty string, which is expected behavior. #85063 (Alexey Milovidov).

Keeper improvements

Data lake improvements

Improvements

Access control and security

  • Introduced two new access types, READ and WRITE, for sources and deprecated all previous access types related to sources. Before: GRANT S3 ON *.* TO user. Now: GRANT READ, WRITE ON S3 TO user. This also makes it possible to separate READ and WRITE permissions for sources, e.g.: GRANT READ ON * TO user, GRANT WRITE ON S3 TO user. The feature is controlled by the setting access_control_improvements.enable_read_write_grants and is disabled by default. #73659 (pufit).
  • Allow parameters in CREATE USER queries for usernames. #81387 (Diskein).
  • Exclude sensitive data from core dumps. Add two allocators: AWS library compatible AwsNodumpMemoryManager and STL compatible JemallocNodumpSTLAllocator. Both are wrappers of the Jemalloc allocator. They use Jemalloc's extent hooks and madvise to mark memory pages as "don't dump". Used for S3 credentials, user credentials, and some query data. #82441 (Miсhael Stetsyuk).
  • Views, created by ephemeral users, will now store a copy of an actual user and will no longer be invalidated after the ephemeral user is deleted. #84763 (pufit).
  • Match external auth forward_headers in case-insensitive way. #84737 (ingodwerust).
  • Add a parameter column to system.grants to determine source type for GRANT READ/WRITE and the table engine for GRANT TABLE ENGINE. #85643 (MikhailBurdukov).
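
The source grants described above can be written as follows once access_control_improvements.enable_read_write_grants is enabled in the server configuration. example_user is a hypothetical user name.

```sql
-- New form: read and write access to the S3 source.
GRANT READ, WRITE ON S3 TO example_user;

-- Separate permissions: read from any source, write only to S3.
GRANT READ ON * TO example_user;
GRANT WRITE ON S3 TO example_user;

-- Inspect the result; the new parameter column is expected to show the source type.
SELECT * FROM system.grants WHERE user_name = 'example_user';
```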

Backup and restore

Data integrity and validation

  • Verify the part has consistent checksum.txt file right before committing it. #76625 (Sema Checherinda).
  • Forbid starting a RENAME COLUMN alter mutation if it would rename a column that is currently affected by an incomplete data mutation. #81823 (Mikhail Artemenko).
  • Now mutations snapshot will be built from the snapshot of visible parts. Mutation counters used in snapshot will also be recalculated from the included mutations. #82945 (Mikhail Artemenko).
  • Add ability to parse part's prefix and suffix and also check coverage for non constant columns. #83377 (Mikhail Artemenko).

Iceberg table engine

  • Support position deletes for Iceberg table engine. #80237 (YanghongZhong).
  • ClickHouse now supports compressed metadata.json files for Iceberg. Fixes #70874. #81451 (alesapin).
  • Fix iceberg reading by field ids for complex types. #84821 (scanhex12).
  • Support iceberg writes to read from pyiceberg. #84466 (scanhex12).
  • Adds snapshot version to data lake table engines. #84659 (Pete Hampton).
  • Support writing of a version-hint file with Iceberg. This closes #85097. #85130 (scanhex12).
  • Support compressed .metadata.json files via the iceberg_metadata_compression_method setting. It supports all ClickHouse compression methods. This closes #84895. #85196 (scanhex12).
  • Optimized memory usage for Iceberg positional delete files. Instead of loading all delete file data into memory, only the current row-group from Parquet delete files is kept in RAM. This significantly reduces memory consumption when working with large positional delete files. #85329 (scanhex12).
  • Allow asynchronously iterating objects from Iceberg table without storing objects for each data file explicitly. #85369 (Daniil Ivanik).
  • Support Iceberg equality deletes. #85843 (Han Fei).

DeltaLake table engine

  • Improve DeltaLake table engine: delta-kernel-rs has an ExpressionVisitor API, which this PR implements and applies to the transform of partition column expressions (it replaces the old approach, deprecated within delta-kernel-rs, that our code used before). In the future, this ExpressionVisitor will also make it possible to implement statistics-based pruning and some Delta Lake proprietary features. Additionally, this change supports partition pruning in the DeltaLakeCluster table engine: the result of a parsed expression (an ActionsDAG) is serialized and sent from the initiator along with the data path, because the information needed for pruning is only available as meta information during data file listing, which is done by the initiator only, but has to be applied to data on each reading server. #81136 (Kseniia Sumarokova).
  • Fix partition pruning with data lake cluster functions. #82131 (Kseniia Sumarokova).
  • Fix reading partitioned data in DeltaLakeCluster table function. In this PR the cluster functions protocol version is increased, which allows sending extra info from the initiator to replicas. This extra info contains the delta-kernel transform expression, which is needed to parse partition columns (and some other things in the future, such as generated columns). #82132 (Kseniia Sumarokova).
  • The Datalake database now throws a more convenient exception. Fixes #81211. #82304 (alesapin).
  • Implement internal delta-kernel-rs filtering (statistics and partition pruning) in storage DeltaLake. #84006 (Kseniia Sumarokova).
  • Add a setting delta_lake_enable_expression_visitor_logging to turn off expression visitor logs as they can be too verbose even for test log level when debugging something. #84315 (Kseniia Sumarokova).
  • Add setting delta_lake_snapshot_version to allow reading specific snapshot version in table engine DeltaLake. #85295 (Kseniia Sumarokova).

Data lake integration

  • Speedup tables listing in data catalogs by asynchronous requests. #81084 (alesapin).
  • Support TimestampTZ in Glue catalog. This closes #81654. #83132 (scanhex12).
  • Split FormatParserGroup into two independent structs: the first is responsible for shared compute and IO resources, the second for shared filter resources (filter ActionDag, KeyCondition). This allows more flexible shared usage of these structures by different threads. #83997 (Daniil Ivanik).
  • Add missing partition_columns_in_data_file to azure configuration. #85373 (Arthur Passos).
  • Add show_data_lake_catalogs_in_system_tables flag to manage adding data lake tables in system.tables. Resolves #85384. #85411 (Smita Kulkarni).

S3 and object storage

  • Implement methods moveFile and replaceFile in s3_plain_rewritable to support it as a database disk. #79424 (Tuan Pham Anh).
  • S3 read and write requests are throttled on the HTTP socket level (instead of whole S3 requests) to avoid issues with max_remote_read_network_bandwidth_for_server and max_remote_write_network_bandwidth_for_server throttling. #81837 (Sergei Trifonov).
  • This PR introduces jitter to the S3 retry mechanism when the s3_slow_all_threads_after_network_error configuration is enabled. #81849 (zoomxi).
  • Implement AWS S3 authentication with an explicitly provided IAM role. Implement OAuth for GCS. These features were recently only available in ClickHouse Cloud and are now open-sourced. Synchronize some interfaces such as serialization of the connection parameters for object storages. #84011 (Alexey Milovidov).
  • Allow to use any storage policy (i.e. object storage, such as S3) for external aggregation/sorting. #84734 (Azat Khuzhin).
  • Collect all removed objects to execute single object storage remove operation. #85316 (Mikhail Artemenko).

S3Queue table engine

  • Macros like {uuid} can now be used in the keeper_path setting of the S3Queue table engine. #82463 (Nikolay Degterinsky).
  • Add a new server setting s3queue_disable_streaming which disables streaming in tables with S3Queue table engine. This setting is changeable without server restart. #82515 (Kseniia Sumarokova).
  • Add columns commit_time, commit_id to system.s3queue_log. #83016 (Kseniia Sumarokova).
  • Add logs for s3queue shutdown process. #83163 (Kseniia Sumarokova).
  • Shutdown S3(Azure/etc)Queue streaming before shutting down any tables on server shutdown. #83530 (Kseniia Sumarokova).
  • Support changing mv insert settings on S3Queue table level. Added new S3Queue level settings: min_insert_block_size_rows_for_materialized_views and min_insert_block_size_bytes_for_materialized_views. By default profile level settings will be used and S3Queue level settings will override those. #83971 (Kseniia Sumarokova).
  • S3Queue ordered mode fix: quit earlier if shutdown was called. #84463 (Kseniia Sumarokova).

Kafka integration

ClickHouse Keeper improvements

  • Keeper improvement: move changelog files between disks in a background thread. Previously, moving a changelog to a different disk would block Keeper globally until the move finished. This led to performance degradation if the move was a long operation (e.g. to an S3 disk). #82485 (Antonio Andelic).
  • Keeper improvement: add new config keeper_server.cleanup_old_and_ignore_new_acl. If enabled, all nodes will have their ACLs cleared while ACL for new requests will be ignored. If the goal is to completely remove ACL from nodes, it's important to leave the config enabled until a new snapshot is created. #82496 (Antonio Andelic).
  • Keeper improvement: support specific permissions for world:anyone ACL. #82755 (Antonio Andelic).
  • Add support for specifying extra Keeper ACL for paths in config. If you want to add extra ACL for a specific path you define it in the config under zookeeper.path_acls. #82898 (Antonio Andelic).
  • Adds ProfileEvent when Keeper rejects a write due to soft memory limit. #82963 (Xander Garbett).
  • Enable create_if_not_exists, check_not_exists, remove_recursive feature flags in Keeper by default which enable new types of requests. #83488 (Antonio Andelic).
  • Add support for applying extra ACL on specific Keeper nodes using apply_to_children config. #84137 (Antonio Andelic).
  • Add get_acl command to KeeperClient. #84641 (Antonio Andelic).
  • Add 4LW in Keeper, lgrq, for toggling request logging of received requests. #84719 (Antonio Andelic).
  • Reduce contention on storage lock in Keeper. #84732 (Antonio Andelic).
  • The encrypt_decrypt tool now supports encrypted ZooKeeper connections. #84764 (Roman Vasin).
  • Limit Keeper log entry cache size by number of entries using keeper_server.coordination_settings.latest_logs_cache_entry_count_threshold and keeper_server.coordination_settings.commit_logs_cache_entry_count_threshold. #84877 (Antonio Andelic).

JSON and Dynamic types

  • Add columns_substreams.txt file to Wide part to track all substreams stored in the part. It helps to track dynamic streams in JSON and Dynamic types and so avoid reading sample of these columns to get the list of dynamic streams (for example for columns sizes calculation). Also now all dynamic streams are reflected in system.parts_columns. #81091 (Pavel Kruglov).
  • Allow ALTER UPDATE in JSON and Dynamic columns. #82419 (Pavel Kruglov).
  • Users can now use Time and Time64 types inside the JSON type. #83784 (Yarik Briukhovetskyi).
  • Add a setting json_type_escape_dots_in_keys to escape dots in JSON keys during JSON type parsing. The setting is disabled by default. #84207 (Pavel Kruglov).
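
Two of the JSON entries above can be illustrated briefly. This is a sketch under assumptions: example_json is a hypothetical table with a JSON column named data, and the explicit ::JSON casts are used here for clarity rather than because they are known to be required.

```sql
-- Parse a key containing a dot with escaping enabled (the setting is off by default).
SELECT '{"a.b": 1}'::JSON AS j
SETTINGS json_type_escape_dots_in_keys = 1;

-- ALTER UPDATE on a JSON column.
ALTER TABLE example_json
UPDATE data = '{"status": "done"}'::JSON
WHERE id = 1;
```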

Parquet and ORC formats

  • Introduce settings to set ORC compression block size, and update its default value from 64KB to 256KB to keep it consistent with Spark or Hive. #80602 (李扬).
  • Support writing parquet enum as byte array as the spec dictates. #81090 (Arthur Passos).
  • Support writing geoparquets as output format. #81784 (scanhex12).

Distributed queries and parallel replicas

  • A new setting, enable_add_distinct_to_in_subqueries, has been introduced. When enabled, ClickHouse will automatically add DISTINCT to subqueries in IN clauses for distributed queries. This can significantly reduce the size of temporary tables transferred between shards and improve network efficiency. Note: This is a trade-off—while network transfer is reduced, additional merging (deduplication) work is required on each node. Enable this setting when network transfer is a bottleneck and the merging cost is acceptable. #81908 (fhw12345).
  • Add support of remote-() table functions with parallel replicas if cluster is provided in address_expression argument. Also, fixes #73295. #82904 (Igor Nikonov).
  • Joins with parallel replicas now use the join logical step. In case of any issues with join queries using parallel replicas, try SET query_plan_use_new_logical_join_step=0 and report an issue. #83801 (Vladimir Cherkasov).
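
The two settings mentioned above are used like any other query-level setting. The table names below are hypothetical; the sketch only shows where the settings go.

```sql
-- Add DISTINCT to the IN subquery automatically to shrink the temporary
-- table sent between shards, at the cost of extra deduplication work.
SELECT count()
FROM example_distributed
WHERE user_id IN (SELECT user_id FROM example_distributed_events)
SETTINGS enable_add_distinct_to_in_subqueries = 1;

-- Fallback if a join query with parallel replicas misbehaves.
SET query_plan_use_new_logical_join_step = 0;
```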

Settings and configuration

  • Mark setting allow_experimental_join_condition as obsolete. #80566 (Vladimir Cherkasov).
  • The total and per-user network throttlers are never reset, which ensures that max_network_bandwidth_for_all_users and max_network_bandwidth_for_user limits are never exceeded. #81729 (Sergei Trifonov).
  • Introduce the optimize_rewrite_regexp_functions setting (enabled by default), which allows the optimizer to rewrite certain replaceRegexpAll, replaceRegexpOne, and extract calls into simpler and more efficient forms when specific regular expression patterns are detected. (issue #81981). #81992 (Amos Bird).
  • Tune TCP servers queue (64 by default) based on listen_backlog (4096 by default). #82045 (Azat Khuzhin).
  • Add the ability to reload max_local_read_bandwidth_for_server and max_local_write_bandwidth_for_server on the fly without restarting the server. #82083 (Kai Zhu).
  • Introduce setting enable_vector_similarity_index which must be enabled to use the vector similarity index. The existing setting allow_experimental_vector_similarity_index is now obsolete. It still works in case someone needs it. #83459 (Robert Schulze).
  • Add max_joined_block_size_bytes in addition to max_joined_block_size_rows to limit the memory usage of JOINs with heavy columns. #83869 (Nikolai Kochetov).
  • Fix compatibility for cluster_function_process_archive_on_multiple_nodes. #83968 (Kseniia Sumarokova).
  • Enable correlated subqueries support by default. #85107 (Dmitry Novik).
  • Add database_replicated settings defining the default values of DatabaseReplicatedSettings. If the setting is not present in the Replicated DB create query, the value from this setting is used. #85127 (Tuan Pham Anh).
  • Allow key-value arguments in the s3 or s3Cluster table engine/function, for example s3('url', CSV, structure = 'a Int32', compression_method = 'gzip'). #85134 (Kseniia Sumarokova).
  • Execute non-correlated EXISTS as a scalar subquery. This allows using a scalar subquery cache and constant-folding the result, which is helpful for indexes. For compatibility, the new setting execute_exists_as_scalar_subquery=1 is added. #85481 (Nikolai Kochetov).
  • Support resolution of more cases for compound identifiers. Particularly, it improves the compatibility of ARRAY JOIN with the old analyzer. Introduce a new setting analyzer_compatibility_allow_compound_identifiers_in_unflatten_nested to keep the old behaviour. #85492 (Nikolai Kochetov).

System tables and observability

  • Add pressure metrics to ClickHouse async metrics. #80779 (Xander Garbett).
  • Add metrics MarkCacheEvictedBytes, MarkCacheEvictedMarks, MarkCacheEvictedFiles for tracking evictions from the mark cache. (issue #60989). #80799 (Shivji Kumar Jha).
  • The system.formats table now contains extended information about formats, such as HTTP content type, the capabilities of schema inference, etc. #81505 (Alexey Milovidov).
  • Add support for clearing all warnings from the system.warnings table using TRUNCATE TABLE system.warnings. #82087 (Vladimir Cherkasov).
  • List the licenses of rust crates in system.licenses. #82440 (Raúl Marín).
  • Estimate complex cnf/dnf, for example, (a < 1 and a > 0) or b = 3, by statistics. #82663 (Han Fei).
  • In some cases, we need to have multiple dimensions to our metrics. For example, counting failed merges or mutations by error codes rather than having a single counter. #83030 (Miсhael Stetsyuk).
  • Add process resource metrics (such as UserTimeMicroseconds, SystemTimeMicroseconds, RealTimeMicroseconds) to part_log profile events for MergeParts entries. #83460 (Vladimir Cherkasov).
  • Cgroup-level and system-wide metrics are now reported together. Cgroup-level metrics have names CGroup<Metric> and OS-level metrics (collected from procfs) have names OS<Metric>. #84317 (Nikita Taranov).
  • Add dimensional metrics to monitor the size of concurrent bounded queues, labeled by queue type and instance ID for better observability. #84675 (Miсhael Stetsyuk).
  • The system.columns table now provides column as an alias for the existing name column. #84695 (Yunchi Pang).
  • Add format string column to system.errors. This column is needed to group by the same error type in alerting rules. #84776 (Miсhael Stetsyuk).
  • Make limits tunable for Async Log and add introspection. #85105 (Raúl Marín).
  • Ignore UNKNOWN_DATABASE while obtaining table columns sizes for system.columns. #85632 (Azat Khuzhin).

Database engines

System and internal improvements

  • Fix attaching databases with read-only remote disks by manually adding table UUIDs to the DatabaseCatalog. #82670 (Tuan Pham Anh).
  • Improve DDL task handling when distributed_ddl_output_mode='*_only_active' by not waiting for new or recovered replicas that have replication lag exceeding max_replication_lag_to_enqueue. This helps avoid DDL task is not finished on some hosts errors when a new replica becomes active after initialization or recovery but has accumulated a large replication log. Also implemented SYSTEM SYNC DATABASE REPLICA STRICT query that waits for the replication log to fall below max_replication_lag_to_enqueue. #83302 (Alexander Tokmakov).
  • Change SystemLogs shutdown order to occur after ordinary tables (and before system tables, instead of before ordinary tables). #83134 (Kseniia Sumarokova).
  • Add server setting logs_to_keep for replicated database settings, allowing configuration of the default logs_to_keep parameter for replicated databases. Lower values reduce the number of ZooKeeper nodes (especially beneficial when there are many databases), while higher values allow missing replicas to catch up after longer periods of downtime. #84183 (Alexey Khatskevich).
  • Change the default value of the Replicated database setting max_retries_before_automatic_recovery to 10, enabling faster recovery in some cases. #84369 (Alexander Tokmakov).
  • Optimize non-append Refreshable Materialized View DDL operations in Replicated databases by skipping creation and renaming of old temporary tables. #84858 (Tuan Pham Anh).

Replication and synchronization

  • Improve SYSTEM RESTART REPLICA to retry table creation when ZooKeeper connection issues occur, preventing tables from being forgotten. #82616 (Nikolay Degterinsky).
  • Add UUID validation in ReplicatedMergeTree::executeMetadataAlter to prevent incorrect table definitions when tables are exchanged between getting the StorageID and calling IDatabase::alterTable. #82666 (Nikolay Degterinsky).
  • Remove experimental send_metadata logic related to experimental zero-copy replication. This code was never used, unsupported, and likely broken, with no tests to verify its functionality. #82508 (alesapin).
  • Add support for macro expansion in remote_fs_zero_copy_zookeeper_path. #85437 (Mikhail Koviazin).

Functions and expressions

  • Function addressToSymbol and system.symbols table will use file offsets instead of virtual memory addresses. #81896 (Alexey Milovidov).
  • Try to preserve element names when deriving supertypes for named tuples. #81345 (lgbo).
  • Allow to mix different collations for the same column in different windows. #82877 (Yakov Olkhovskiy).
  • Add function to write types into wkb format. #82935 (scanhex12).
  • Add ability to parse Time and Time64 as MM:SS, M:SS, SS, or S. #83299 (Yarik Briukhovetskyi).
  • Function reinterpret() now supports conversion to Array(T) where T is a fixed-size data type (issue #82621). #83399 (Shankar Iyer).
  • Fix structureToProtobufSchema and structureToCapnProtoSchema functions to correctly add a zero-terminating byte instead of using a newline, preventing missing newlines in output and potential buffer overflows in functions that depend on the zero byte (such as logTrace, demangle, extractURLParameter, toStringCutToZero, and encrypt/decrypt). Closes #85062. #85063 (Alexey Milovidov).
  • Fix the regexp_tree dictionary layout to support processing strings with zero bytes. #85063 (Alexey Milovidov).
  • Fix the formatRowNoNewline function which was erroneously cutting the last character of output when called with Values format or any format without a newline at the end of rows. #85063 (Alexey Milovidov).
  • Fix an exception-safety error in the stem function that could lead to memory leaks in rare scenarios. #85063 (Alexey Milovidov).
  • Fix the initcap function for FixedString arguments to correctly recognize the start of words at the beginning of strings when the previous string in a block ended with a word character. #85063 (Alexey Milovidov).
  • Fix a security vulnerability in the Apache ORC format that could lead to exposure of uninitialized memory. #85063 (Alexey Milovidov).
  • Changed behavior of replaceRegexpAll and its alias REGEXP_REPLACE to allow empty matches at the end of strings even when the previous match processed the entire string (e.g., ^a*|a*$ or ^|.*), aligning with JavaScript, Perl, Python, PHP, and Ruby semantics but differing from PostgreSQL. #85063 (Alexey Milovidov).
  • Optimize and simplify the implementation of many string-handling functions. Corrected incorrect documentation for several functions. Note: The output of byteSize for String columns and complex types containing String columns has changed from 9 bytes per empty string to 8 bytes per empty string, which is the expected behavior. #85063 (Alexey Milovidov).
  • Allow zero step in functions timeSeries*ToGrid(). This is part #3 of https://github.com/ClickHouse/ClickHouse/pull/75036. #85390 (Vitaly Baranov).
  • Support inner arrays for the function nested. #85719 (Nikolai Kochetov).
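
The replaceRegexpAll entry above names Python among the languages whose empty-match semantics it now follows. As a minimal sketch of that reference behaviour only (this is Python's re.sub, version 3.7 or later, not ClickHouse; the helper name replace_all is hypothetical), using the first pattern from the entry:

```python
import re

def replace_all(haystack: str, pattern: str, replacement: str) -> str:
    """Hypothetical helper: replace every match, including empty ones."""
    return re.sub(pattern, replacement, haystack)

# "^a*" consumes the whole string, then "a*$" still matches the empty
# string at the very end, so two replacements are made.
print(replace_all("aaa", r"^a*|a*$", "x"))  # → xx
```

Going by the entry's description, replaceRegexpAll('aaa', '^a*|a*$', 'x') would be expected to behave the same way; that expectation is an inference from the changelog text, not a verified result.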

MergeTree improvements

  • Disable skipping indexes that depend on columns updated on the fly or by patch parts more granularly. Now, skipping indexes are not used only in parts affected by on-the-fly mutations or patch parts; previously, those indexes were disabled for all parts. #84241 (Anton Popov).
  • Add MergeTree setting search_orphaned_parts_drives to limit the scope of the search for orphaned parts, e.g. to disks with local metadata. #84710 (Ilya Golshtein).
  • Add missing support of read_in_order_use_virtual_row for WHERE. It allows to skip reading more parts for queries with filters that were not fully pushed to PREWHERE. #84835 (Nikolai Kochetov).
  • Fix usage of "compact" Variant discriminators serialization in MergeTree. Previously, it wasn't used in some cases when it could be used. #84141 (Pavel Kruglov).
  • Add limit (table setting max_uncompressed_bytes_in_patches) for total uncompressed bytes in patch parts. It prevents significant slowdowns of SELECT queries after lightweight updates and prevents possible misuse of lightweight updates. #85641 (Anton Popov).

Cache and memory management

  • Fix logical error in filesystem cache: "Having zero bytes but range is not finished". #81868 (Kseniia Sumarokova).
  • Add rendezvous hashing to improve cache locality. #82511 (Anton Ivashkin).
  • Refactor dynamic resize feature of filesystem cache. Added more logs for introspection. #82556 (Kseniia Sumarokova).
  • Reduce query memory tracking overhead for executable user-defined functions. #83929 (Eduard Karacharov).
  • All the allocations done by external libraries are now visible to ClickHouse's memory tracker and accounted properly. This may result in "increased" reported memory usage for certain queries or failures with MEMORY_LIMIT_EXCEEDED. #84082 (Nikita Mikhaylov).
  • Allocate the minimum amount of memory needed for encrypted_buffer for encrypted named collections. #84432 (Pablo Marcos).
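
For readers unfamiliar with the rendezvous hashing mentioned above, the general technique can be sketched as follows. This is a generic illustration, not the ClickHouse implementation: the hash function, the node names, and the key format are all assumptions made for the example.

```python
import hashlib
from typing import Sequence

def rendezvous_pick(key: str, nodes: Sequence[str]) -> str:
    """Pick the node with the highest hash score for this key."""
    def score(node: str) -> int:
        # Hash the (node, key) pair; sha256 is an arbitrary choice here.
        digest = hashlib.sha256(f"{node}|{key}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(nodes, key=score)

# Hypothetical node names and object keys, for illustration only.
nodes = ["replica-1", "replica-2", "replica-3"]
keys = [f"data/part_{i}.bin" for i in range(1000)]

before = {k: rendezvous_pick(k, nodes) for k in keys}
after = {k: rendezvous_pick(k, nodes + ["replica-4"]) for k in keys}

# Keys that changed owner can only have moved to the newly added node.
moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Because each key's owner depends only on per-node scores, adding a node reassigns only the keys that the new node wins; every other key keeps its owner, which is what makes the scheme friendly to caches.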

Vector similarity index

  • Prevent user from using nan and inf with NumericIndexedVector. Fixes #82239 and a little more. #82681 (Raufs Dunamalijevs).
  • The vector similarity index now supports binary quantization. Binary quantization significantly reduces memory consumption and speeds up the process of building a vector index (due to faster distance calculation). Also, the existing setting vector_search_postfilter_multiplier was made obsolete and replaced by a more general setting: vector_search_index_fetch_multiplier. #85024 (Shankar Iyer).
  • Approximate vector search with vector similarity indexes is now GA. #85888 (Robert Schulze).

Error handling and messages

  • The Connection header is now sent at the end of the headers, when it is known whether the connection should be preserved. #81951 (Sema Checherinda).
  • In previous versions, multiplication of the aggregate function state with IPv4 produced a logical error instead of a proper error code. Closes #82817. #82818 (Alexey Milovidov).
  • Better error handling in AsynchronousMetrics. If the /sys/block directory exists but is not accessible, the server will start without monitoring the block devices. Closes #79229. #83115 (Alexey Milovidov).
  • There was an incorrect dependency check for the INSERT with materialized views that have malformed selects and the user might have received an obscure std::exception instead of a meaningful error with a clear explanation. This is now fixed. This fixes: #82889. #83190 (Nikita Mikhaylov).
  • Do not output very long descriptions of expression actions in exception messages. Closes #83164. #83350 (Alexey Milovidov).
  • When the storage is shutting down, getStatus throws an ErrorCodes::ABORTED exception. Previously, this would fail the select query. Now we catch the ErrorCodes::ABORTED exceptions and intentionally ignore them instead. #83435 (Miсhael Stetsyuk).
  • Make exception messages for certain situations for loading and adding projections easier to read. #83728 (Robert Schulze).
  • Check if connection is cancelled before checking for EOF to prevent reading from closed connection. Fixes #83893. #84227 (Raufs Dunamalijevs).
  • Improved server shutdown handling for client connections by simplifying internal checks. #84312 (Raufs Dunamalijevs).
  • Low-level errors during UDF execution now fail with error code UDF_EXECUTION_FAILED, whereas previously different error codes could be returned. #84547 (Xu Jia).

SQL formatting improvements

  • Fix inconsistent formatting of CREATE DICTIONARY. Closes #82105. #82829 (Alexey Milovidov).
  • Fix inconsistent formatting of TTL when it contains a materialize function. Closes #82828. #82831 (Alexey Milovidov).
  • Fix inconsistent formatting of EXPLAIN AST in a subquery when it contains output options such as INTO OUTFILE. Closes #82826. #82840 (Alexey Milovidov).
  • Fix inconsistent formatting of parenthesized expressions with aliases in the context when no aliases are allowed. Closes #82836. Closes #82837. #82867 (Alexey Milovidov).
  • Fix formatting of CREATE USER with query parameters (i.e. CREATE USER {username:Identifier} IDENTIFIED WITH no_password). #84376 (Azat Khuzhin).
  • Fix parsing of a trailing comma in columns of the CREATE DICTIONARY query after a column with parameters, for example, Decimal(8). Closes #85586. #85653 (Nikolay Degterinsky).

External integrations

  • Unify parameter names in ODBC and JDBC when using named collections. #83410 (Andrey Zvonov).
  • MongoDB: Implicit parsing of strings to numeric types. Previously, if a string value was received from a MongoDB source for a numeric column in a ClickHouse table, an exception was thrown. Now, the engine attempts to parse the numeric value from the string automatically. Closes #81167. #84069 (Kirill Nikiforov).
  • Allow simdjson on unsupported architectures (previously this led to CANNOT_ALLOCATE_MEMORY errors). #84966 (Azat Khuzhin).

Miscellaneous improvements

  • Add Ytsaurus table engine and table function. #77606 (MikhailBurdukov).
  • Improve HashJoin::needUsedFlagsForPerRightTableRow, returns false for cross join. #82379 (lgbo).
  • Allow write/read map columns as array of tuples. #82408 (MikhailBurdukov).
  • This PR was reverted. #82884 (Mithun p).
  • Async logs: Limit the maximum number of entries that are held in the queue. #83214 (Raúl Marín).
  • Enable Date/Date32 as integers in JSON input formats. #83597 (MikhailBurdukov).
  • Improved support for bloom filter indexes (regular, ngram, and token) to be utilized when the first argument is a constant array (the set) and the second is the indexed column (the subset), enabling more efficient query execution. #84700 (Doron David).
  • Allow set values type casting when pushing down IN / GLOBAL IN filters over KeyValue storage primary keys (e.g., EmbeddedRocksDB, KeeperMap). #84515 (Eduard Karacharov).
  • Eliminated full scans for the cases when index analysis results in empty ranges for parallel replicas reading. #84971 (Eduard Karacharov).
  • Fix a list of problems that can occur when trying to run integration tests on a local host. #82135 (Oleg Doronin).
  • Enable trace_log.symbolize for old deployments by default. #85456 (Azat Khuzhin).

Bug fixes (user-visible misbehavior in an official stable release)

Performance optimizations

Query execution fixes

  • For queries with combination of ORDER BY ... LIMIT BY ... LIMIT N, when ORDER BY is executed as a PartialSorting, the counter rows_before_limit_at_least now reflects the number of rows consumed by LIMIT clause instead of number of rows consumed by sorting transform. #78999 (Eduard Karacharov).
  • Fix logical error with <=> operator and Join storage, now query returns proper error code. #80165 (Vladimir Cherkasov).
  • Fix a crash in the loop function when used with the remote function family. Ensure the LIMIT clause is respected in loop(remote(...)). #80299 (Julia Kartseva).
  • Fix incorrect behavior of to_utc_timestamp and from_utc_timestamp functions when handling dates before Unix epoch (1970-01-01) and after maximum date (2106-02-07 06:28:15). Now these functions properly clamp values to epoch start and maximum date respectively. #80498 (Surya Kant Ranjan).
  • Fix IN execution with transform_null_in=1 with null in the left argument and non-nullable subquery result. #81584 (Pavel Kruglov).
  • Fix the issue where required columns are not read during scalar correlated subquery processing. Fixes #81716. #81805 (Dmitry Novik).
  • Fix filter analysis when only a constant alias column is used in the query. Fixes #79448. #82037 (Dmitry Novik).
  • Fix the Not found column error for queries with arrayJoin under WHERE condition and IndexSet. #82113 (Nikolai Kochetov).
  • Fix TOO_DEEP_SUBQUERIES exception when CTE definition references another table expression with the same name. #83413 (Dmitry Novik).
  • Fix incorrect result of queries with WHERE ... IN (<subquery>) clause and enabled query condition cache (setting use_query_condition_cache). #83445 (LB7666).
  • INSERT SELECT with UNION ALL could lead to a null pointer dereference in a corner case. This closes #83618. #83643 (Alexey Milovidov).
  • Fix LOGICAL_ERROR during row policy expression analysis for correlated columns. #82618 (Dmitry Novik).
  • Fixed wrong results when the query condition cache is used in conjunction with recursive CTEs (issue #81506). #84026 (zhongyuankai).
  • Fix infinite recursive analysis of invalid WINDOW definitions. Fixes #83131. #84242 (Dmitry Novik).
  • Fix Not-ready Set for IN (subquery) inside additional_table_filters expression setting. #85210 (Nikolai Kochetov).
  • Fix logical error with duplicate subqueries when optimize_syntax_fuse_functions is enabled, close #75511. #83300 (Vladimir Cherkasov).
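
The clamping described in the to_utc_timestamp / from_utc_timestamp entry above can be illustrated with a short sketch. The two bounds are taken from the entry; the function name is hypothetical, and the code models only the clamping rule, not the ClickHouse functions themselves.

```python
from datetime import datetime, timezone

# Bounds quoted in the changelog entry.
EPOCH_START = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAX_DATETIME = datetime(2106, 2, 7, 6, 28, 15, tzinfo=timezone.utc)

def clamp_to_datetime_range(value: datetime) -> datetime:
    """Hypothetical helper: clamp a timestamp into [EPOCH_START, MAX_DATETIME]."""
    return min(max(value, EPOCH_START), MAX_DATETIME)

# The upper bound is the largest unsigned 32-bit Unix timestamp.
assert int(MAX_DATETIME.timestamp()) == 2**32 - 1

print(clamp_to_datetime_range(datetime(1960, 5, 1, tzinfo=timezone.utc)))
print(clamp_to_datetime_range(datetime(2200, 1, 1, tzinfo=timezone.utc)))
```

Values inside the range pass through unchanged; only out-of-range inputs are moved to the nearest bound.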

Iceberg and DataLake fixes

  • Fix the metadata resolution when querying iceberg tables through rest catalog. ... #80562 (Saurabh Kumar Ojha).
  • Fix data races in Iceberg. #82088 (Azat Khuzhin).
  • Fix "Context has expired" for Iceberg. #82146 (Azat Khuzhin).
  • Now ClickHouse can read iceberg tables from Glue catalog after schema evolution. Fixes #81272. #82301 (alesapin).
  • Fix data races in Iceberg. #82841 (Azat Khuzhin).
  • Disable bounds-based file pruning for iceberg array element and iceberg map values, including all their nested subfields. #83520 (Daniil Ivanik).
  • Fix iceberg writes for complex types. #85330 (scanhex12).
  • Writing lower and upper bounds is not supported for complex types. #85332 (scanhex12).
  • Fix nullability of fields in iceberg. #85977 (scanhex12).
  • No longer create empty iceberg delete file. #86061 (scanhex12).
  • Update metadata timestamp in iceberg writes. #85711 (scanhex12).
  • Spark can't read position delete files. #85762 (scanhex12).
  • Stop taking the schema from manifest files; instead, store the relevant schemas for each snapshot independently and infer the relevant schema for each data file from its corresponding snapshot. The previous behaviour violated the Iceberg specification for manifest file entries with existing status. #84588 (Daniil Ivanik).
  • Iceberg no longer tries to cache the relevant snapshot version between SELECT queries and always resolves the snapshot anew. The earlier attempt to cache the Iceberg snapshot led to problems when using Iceberg tables with time travel. #85038 (Daniil Ivanik).
  • Fix the metadata resolution when querying iceberg tables through rest catalog. ... #85531 (Saurabh Kumar Ojha).
  • Fix secrets masking in icebergS3Cluster and icebergAzureCluster table functions. #85658 (MikhailBurdukov).

DeltaLake fixes

TTL and MergeTree fixes

  • Recalculate the min-max index when TTL reduces rows to ensure the correctness of algorithms relying on it, such as minmax_count_projection. This resolves #77091. #77166 (Amos Bird).
  • Fix incorrect TTL recalculation in TTL GROUP BY when updating TTL. #81222 (Evgeniy Ulasik).
  • Fix "Context has expired" during merges when dict used in TTL expression. #81690 (Azat Khuzhin).
  • Fix LOGICAL_ERROR and following crash when using the same column in the TTL for GROUP BY and SET. #82054 (Pablo Marcos).
  • MergeTree now does nothing related to TTL if all TTL are removed from the table. #84441 (alesapin).
  • Fix ALTER MODIFY ORDER BY not validating TTL columns in sorting keys. TTL columns are now properly rejected when used in ORDER BY clauses during ALTER operations, preventing potential table corruption. #84536 (xiaohuanlin).

Projection fixes

  • Fix logical error during materialize projection when column type was changed to Nullable. #80741 (Pavel Kruglov).
  • Fix incorrect usage of parent metadata in mergeTreeProjection table function when enable_shared_storage_snapshot_in_query = 1. This is for #82634. #82638 (Amos Bird).
  • Fix a rare ClickHouse crash when a table has a projection, lightweight_mutation_projection_mode = 'rebuild', and the user executes a lightweight delete that deletes ALL rows from any block in the table. #84158 (alesapin).
  • Fix backup of parts with broken projections. #85362 (Antonio Andelic).
  • Forbid using _part_offset column in projection in releases until it is stabilized. #85372 (Sema Checherinda).

Parallel replicas fixes

  • For some queries executed with parallel replicas, reading-in-order optimization(s) could be applied on the initiator but not on remote nodes. This led to different reading modes being used by the parallel replicas coordinator (on the initiator) and on remote nodes, which is a logical error. #80652 (Igor Nikonov).
  • Disable parallel replicas when a subquery contains FINAL (issue #81401). #83455 (zoomxi).
  • Fix LOGICAL_ERROR for queries with parallel replicas and multiple INNER joins followed by RIGHT join. Do not use parallel replicas for such queries. #84299 (Vladimir Cherkasov).
  • Queries with parallel replicas that use the reverse read-in-order optimization could produce incorrect results. #85406 (Igor Nikonov).

Authentication and security

  • Fix hiding named collection values in logs/query_log. Closes #82405. #82510 (Kseniia Sumarokova).
  • Set salt for auth data when parsing from AST with type SCRAM_SHA256_PASSWORD. #82888 (Tuan Pham Anh).
  • Mask Avro schema registry authentication details to be not visible to user or in logs. #83713 (János Benjamin Antal).
  • Fix incorrect behavior when executing REVOKE S3 ON system.* revokes S3 permissions for *.*. This fixes #83417. #83420 (pufit).
  • Fix server crash when a user created with no_password attempts to login after the server setting allow_no_password was changed to 0. #84426 (Shankar Iyer).
  • Improve error message on attempt to create user identified with JWT. #85072 (Konstantin Bogdanov).
  • Mask credentials for deltaLakeAzure, deltaLakeCluster, icebergS3Cluster and icebergAzureCluster. #85889 (Julian Maicher).
  • Fix a bug introduced in #79963. When inserting into a materialized view with a definer, the permission check should use the definer's grants. Fixes #79951. #83502 (pufit).

Backup and restore fixes

  • Fix backup of an empty Memory table, which caused the backup restore to fail with a BACKUP_ENTRY_NOT_FOUND error. #82791 (Julia Kartseva).
  • Fix misleading error message when restoring a backup on a read-only disk. #83051 (Julia Kartseva).
  • Fix starting superfluous internal backups after connection problems. #84755 (Vitaly Baranov).

Window and aggregate functions

  • Fix possible crash in Aggregator in case of exception during merge. #81450 (Nikita Taranov).
  • Fix possible crash in Aggregator in case of exception during merge. #82022 (Nikita Taranov).
  • Fixing copy-paste error in arraySimilarity, disallowing the use of UInt32 and Int32 weights. Update tests and docs. #82103 (Mikhail f. Shiryaev).
  • Fix overflow in numericIndexedVectorPointwiseAdd, numericIndexedVectorPointwiseSubtract, numericIndexedVectorPointwiseMultiply, numericIndexedVectorPointwiseDivide functions that happened when we applied them to large numbers. #82165 (Raufs Dunamalijevs).

Parquet and file format fixes

  • Fix Parquet bloom filter incorrectly applying condition like WHERE function(key) IN (...) as if it were WHERE key IN (...). #81255 (Michael Kolupaev).
  • Fix parquet writer outputting incorrect statistics (min/max) for Decimal types. #83754 (Michael Kolupaev).
  • Fix deserialization of groupArraySample/groupArrayLast in case of empty elements (deserialization could skip part of the binary if the input was empty, this can lead to corruption during data read and UNKNOWN_PACKET_FROM_SERVER in TCP protocol). This does not affect numbers and date time types. #82763 (Pedro Ferreira).
  • Fix writing JSON paths with NULL values in RowBinary format. #83923 (Pavel Kruglov).

Join fixes

  • Fix filter modification for queries with a JOIN expression with a table with storage Merge. Fixes #82092. #82950 (Dmitry Novik).
  • Fix the crash if key-value storage is joined with a type-casted key. #82497 (Pervakov Grigorii).
  • Fix logical error when resolving matcher in query with multiple JOINs, close #81969. #82421 (Vladimir Cherkasov).
  • Fix filter merging into JOIN condition in cases when equality operands have different types or they reference constants. Fixes #83432. #84145 (Dmitry Novik).
  • Fix the logical error Expected single dictionary argument for function while doing JOIN on an inequality condition when one of the columns is LowCardinality and the other is a constant. Closes #81779. #84019 (Alexey Milovidov).

Replicated database fixes

  • Fix markReplicasActive in DDLWorker and DatabaseReplicatedDDLWorker. #81395 (Tuan Pham Anh).
  • Fix DatabaseReplicated::getClusterImpl. If the first element (or elements) of hosts has id == DROPPED_MARK and there are no other elements for the same shard, the first element of shards will be an empty vector, leading to std::out_of_range. #82093 (Miсhael Stetsyuk).
  • Keep track of the number of async tables loading jobs. If there are some running jobs, do not update tail_ptr in TransactionLog::removeOldEntries. #82824 (Tuan Pham Anh).
  • Fix the issue where, if a MergeTree table is created with add_minmax_index_for_numeric_columns=1 or add_minmax_index_for_string_columns=1, the index is later materialized during an ALTER operation, and it prevents the Replicated database from initializing correctly on a new replica. #83751 (Nikolay Degterinsky).
  • Fix the creation of RMV on a new replica of the Replicated database if DEFINER is dropped. #85327 (Nikolay Degterinsky).
  • Fix recovering replicated databases when moving the metadata file takes a long time. #85177 (Tuan Pham Anh).
  • Recover the Replicated Database forcefully after restoring the database metadata in Keeper. #85960 (Tuan Pham Anh).
  • Fix a bug in Replicated database recovery: if a table name contains the % symbol, it could re-create the table with a different name during recovery. #85987 (Alexander Tokmakov).
  • The DDL worker now cleans up outdated hosts from the replicas set. This reduces the amount of metadata stored in ZooKeeper. #88154 (alesapin).

Lightweight updates fixes

  • Fix lightweight updates for tables with ReplacingMergeTree and CollapsingMergeTree engines. #84851 (Anton Popov).
  • Fix a logical error in lightweight updates that update all columns in the table. #84380 (Anton Popov).
  • Fix lightweight updates for tables with ReplicatedMergeTree engine created on servers with a version lower than 25.7. #84933 (Anton Popov).
  • Fix lightweight updates for tables with non-replicated MergeTree engine after running an ALTER TABLE ... REPLACE PARTITION query. #84941 (Anton Popov).
  • Fix cleanup of patch parts in ReplicatedMergeTree. Previously, the result of a lightweight update may temporarily not be visible on the replica until the merged or mutated part that materializes the patch parts is downloaded from another replica. #85121 (Anton Popov).

S3 and object storage fixes

Dynamic and Variant type fixes

Keeper fixes

  • Keeper fix: update total watch count correctly when ephemeral nodes are deleted on session close. #83583 (Antonio Andelic).
  • Fix out-of-order writes to Keeper changelog. Previously, we could have in-flight writes to changelog, but rollback could cause concurrent change of the destination file. This would lead to inconsistent logs, and possible data loss. #84434 (Antonio Andelic).
  • Fix leaks for Keeper with rocksdb storage (iterators were not destroyed). #84523 (Azat Khuzhin).
  • Fix issue where Keeper setting rotate_log_storage_interval = 0 would cause ClickHouse to crash. (issue #83975). #84637 (George Larionov).
  • Fix total watches count returned by Keeper. #84890 (Antonio Andelic).
  • Lock 'mutex' when getting zookeeper from 'view' in RefreshTask. #84699 (Tuan Pham Anh).

Indexing fixes

  • Fix excessive granule skipping for filtering over token/ngram indexes with regexp which contains alternation and non-literal first alternative. #79373 (Eduard Karacharov).
  • Setting use_skip_indexes_if_final_exact_mode implementation (introduced in 25.6) could fail to select a relevant candidate range depending upon MergeTree engine settings / data distribution. That has been resolved now. #82667 (Shankar Iyer).
  • Setting use_skip_indexes_if_final_exact_mode optimization (introduced in 25.6) could fail to select a relevant candidate range depending upon MergeTree engine settings / data distribution. That has been resolved now. #82879 (Shankar Iyer).
  • Previously, set indexes didn't consider Nullable columns while checking if granules passed the filter (issue #75485). #84305 (Elmi Ahmadov).
  • The comparison against nan value was not using the correct ranges during MinMax index evaluation. #84386 (Elmi Ahmadov).
  • The ngram and no_op tokenizers no longer crash the (experimental) text index for empty input tokens. #84849 (Robert Schulze).

Materialized view fixes

Azure and cloud storage fixes

  • In AzureBlobStorage, native copy compares authentication methods; if an exception occurs during that comparison, the code now falls back to read and copy (i.e. non-native copy). #82693 (Smita Kulkarni).
  • Fix double-free in AzureIteratorAsync. #85064 (Nikita Taranov).

Crash and stability fixes

  • Fix a possible crash in logging while terminating a session as the user_id might sometimes be empty. #82513 (Bharat Nallan).
  • Fix crash in client due to connection left in disconnected state after bad INSERT. #83253 (Azat Khuzhin).
  • Fix crash when calculating the size of a block with empty columns. #83271 (Raúl Marín).
  • Fix a crash that may happen for a query with a setting 'max_threads=1' when executed under load with CPU scheduling enabled. #83387 (Fan Ziqi).
  • Treat zoutofmemory as a hardware error; otherwise, it throws a logical error. See https://github.com/clickhouse/clickhouse-core-incidents/issues/877. #84420 (Han Fei).
  • Fix possible abort (due to joining threads from the task) and hopefully hangs (in unit tests) during BackgroundSchedulePool shutdown. #83769 (Azat Khuzhin).
  • Fix deadlock caused by background cancellation checker thread. #84203 (Antonio Andelic).
  • Fix deadlock on shutdown due to recursive context locking during library bridge cleanup. #83824 (Azat Khuzhin).
  • Fix crash in client due to connection left in disconnected state after bad INSERT. #83842 (Azat Khuzhin).
  • Fix possible UB (crashes) in case of MEMORY_LIMIT_EXCEEDED during String deserialization. #85440 (Azat Khuzhin).
  • Fix rare crash in asynchronous inserts that change settings log_comment or insert_deduplication_token. #85540 (Anton Popov).

Glue and catalog fixes

  • Fix a bug in the Glue catalog integration. ClickHouse can now read tables with nested data types where some of the subcolumns contain decimals, for example: map<string, decimal(9, 2)>. Fixes #81301. #82114 (alesapin).
  • ClickHouse now reads tables from Glue Catalog where the table type is specified in lower case. #84316 (alesapin).
  • Unity catalog now ignores schemas with weird data types in case of non-delta tables. Fixes #85699. #85950 (alesapin).

Function fixes

  • Functions trim{Left,Right,Both} now support input strings of type "FixedString(N)". For example, SELECT trimBoth(toFixedString('abc', 3), 'ac') now works. #82691 (Robert Schulze).
  • Function trim called with all-constant inputs now produces a constant output string. (Bug #78796). #82900 (Robert Schulze).
  • Fix incorrect output of function formatDateTime when formatter %f is used together with variable-size formatters (e.g. %M). #83020 (Robert Schulze).
  • Fix a bug for NULL arguments in CASE function. #82436 (Yarik Briukhovetskyi).
  • Do not use unrelated parts of a shared dictionary in the lowCardinalityKeys function. #83118 (Alexey Milovidov).
  • Fix colorSRGBToOKLCH/colorOKLCHToSRGB for mix of const and non-const args. #83906 (Azat Khuzhin).
  • Fix incorrect construction of empty tuples in the array() function. This fixes #84202. #84297 (Amos Bird).
  • Fix a bug that was causing incorrect Bech32 encoding and decoding. The bug wasn't caught originally because an online implementation of the algorithm used for testing had the same issue. #84257 (George Larionov).

Distributed query fixes

  • Parallel distributed INSERT SELECT with LIMIT was allowed, which is incorrect because it leads to data duplication in the target table. #84477 (Igor Nikonov).
  • Do not try to substitute a table function with its cluster alternative in the presence of a JOIN or subquery. #84335 (Konstantin Bogdanov).
  • Add a check for whether a correlated subquery is used in a distributed context, to avoid a crash. Fixes #82205. #85030 (Dmitry Novik).
  • Using distributed_depth as an indicator of a *Cluster function was incorrect and could lead to data duplication; use client_info.collaborate_with_initiator instead. #85734 (Konstantin Bogdanov).
  • Support global constants from the WITH statement for parallel distributed INSERT SELECT with a Distributed destination table. Previously, the query could throw an Unknown expression identifier error. #85811 (Nikolai Kochetov).
  • Fix logical error on attempt to CREATE ... AS (SELECT * FROM s3Cluster(...)) with DatabaseReplicated. #85904 (Konstantin Bogdanov).
  • Add checks for sharding_key during ALTER of a Distributed table. Previously, an incorrect ALTER would break the table definition and the server restart. #86015 (Nikolay Degterinsky).

Metrics and monitoring fixes

  • Fix the validation of async metrics settings asynchronous_metrics_update_period_s and asynchronous_heavy_metrics_update_period_s. #82310 (Bharat Nallan).
  • Fix IndexUncompressedCacheBytes/IndexUncompressedCacheCells/IndexMarkCacheBytes/IndexMarkCacheFiles metrics (previously they were included in the metrics without the Cache prefix). #83730 (Azat Khuzhin).
  • Fix LOGICAL_ERROR in QueryMetricLog: Mutex cannot be NULL. #82979 (Pablo Marcos).
  • Fix incorrect metrics KafkaAssignedPartitions and KafkaConsumersWithAssignment. #85494 (Ilya Golshtein).
  • Fix processed bytes stat being underestimated when PREWHERE (explicit or automatic) is used. #85495 (Michael Kolupaev).
  • Fix memory tracking drift from background schedule pool and executor. #84946 (Azat Khuzhin).

Data type and conversion fixes

  • Fix cases where parsing of Time could cause msan issues. This fixes: #82477. #82514 (Yarik Briukhovetskyi).
  • Fix sort of NaN values in LowCardinality(Float32|Float64|BFloat16) type. #83786 (Pervakov Grigorii).
  • Fix overflow of large values (>2106-02-07) when casting from Date to DateTime64. #83982 (Yarik Briukhovetskyi).
  • Fix an issue with implicit reading of negative Time values into the table and make the docs less confusing. #83091 (Yarik Briukhovetskyi).
  • Codec DoubleDelta can now only be applied to columns of numeric type. In particular, FixedString columns can no longer be compressed using DoubleDelta. (Fixes #80220). #84383 (Jimmy Aguilar Mena).
  • Fix precision loss in JSONExtract when converting JSON numbers to Decimal types. Now numeric JSON values preserve their exact decimal representation, avoiding floating-point rounding errors. #85665 (ssive7b).
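
The JSONExtract precision fix above concerns a general class of error: a decimal literal that passes through binary floating point picks up rounding noise. A minimal Python sketch of that effect follows. It is an illustration only, not ClickHouse's implementation, and the sample document and its price field are made up:

```python
import json
from decimal import Decimal

doc = '{"price": 0.1}'  # hypothetical sample document

# Route 1: parse the number as a binary float, then convert to Decimal.
via_float = Decimal(json.loads(doc)["price"])

# Route 2: parse the numeric text directly as a Decimal.
exact = json.loads(doc, parse_float=Decimal)["price"]

print(via_float == Decimal("0.1"))  # False: the float carries rounding error
print(exact == Decimal("0.1"))      # True: the decimal text is preserved
```

The second route mirrors the behavior the changelog entry describes, where the numeric JSON value keeps its exact decimal representation.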

Memory and resource management

  • Fix incorrect memory around max_untracked_memory. #83607 (Azat Khuzhin).
  • Do not share async_read_counters between queries. #83423 (Azat Khuzhin).
  • Fix possible "file cache not initialized" errors when it is used as temporary data storage. #83539 (Bharat Nallan).
  • Always apply filesystem_prefetches_limit (not only from MergeTreePrefetchedReadPool). #83999 (Azat Khuzhin).

Configuration and settings fixes

  • When the same setting is passed more than once over URI, the last value is used. #82137 (Sema Checherinda).
  • Fix data races in the client (by not using the global context) and fix session_timezone overrides. Previously, if session_timezone was set to a non-empty value in, e.g., users.xml or client options and to an empty value in the query context, the value from users.xml was used, which is wrong. Now the query context always takes priority over the global context. #82444 (Azat Khuzhin).
  • Disallow setting threadpool_writer_pool_size to zero to ensure that server operations don't get stuck. #82532 (Bharat Nallan).
  • Resolve minor integer overflow in configuration of setting role_cache_expiration_time_seconds (issue #83374). #83461 (wushap).
  • Disallow zero value for max_insert_block_size as it could cause logical error. #83688 (Bharat Nallan).
  • Fix endless loop in estimateCompressionRatio() with block_size_bytes=0. #83704 (Azat Khuzhin).
  • Fix parameters like date_time_input_format being ignored in HTTP requests with multipart. #85570 (Sema Checherinda).
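
The "last value" rule for settings passed over URI can be sketched in a few lines of Python. This is a hedged illustration of the rule only, not the server's parsing code; the helper name settings_from_uri and the example URI are hypothetical:

```python
from urllib.parse import urlsplit, parse_qsl

def settings_from_uri(uri: str) -> dict[str, str]:
    """Collect query parameters; a later duplicate overwrites an earlier one."""
    settings: dict[str, str] = {}
    for key, value in parse_qsl(urlsplit(uri).query):
        settings[key] = value
    return settings

uri = "http://localhost:8123/?max_threads=1&max_threads=4"  # hypothetical URI
print(settings_from_uri(uri))  # {'max_threads': '4'}
```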

MongoDB fixes

  • Previously, MongoDB table engine definitions could include a path component in the host:port argument, which was silently ignored. The MongoDB integration refuses to load such tables. With this fix, we allow loading such tables and ignore the path component if the MongoDB engine has five arguments, using the database name from the arguments. Note: The fix is not applied to newly created tables or queries with the mongo table function, nor to dictionary sources and named collections. #81942 (Vladimir Cherkasov).

Miscellaneous fixes

  • In previous versions, the server returned excessive content for requests to /js. This closes #61890. #81895 (Alexey Milovidov).
  • Fix InterpreterInsertQuery::extendQueryLogElemImpl to add backquotes to database and table names when needed (e.g., when names contain special characters like -). #81528 (Ilia Shvyrialkin).
  • Fix possible data-race between suggestion thread and main client thread. #82233 (Azat Khuzhin).
  • Fix exception safety in union/intersect/except_default_mode rewrite. Closes #82664. #82820 (Alexey Milovidov).
  • When using a non-caching Database implementation, the metadata of the corresponding table is deleted after the columns are returned and the reference is invalidated. #82939 (buyval01).
  • Onprogress call in JSONEachRowWithProgress is synchronized with finalization. #83879 (Sema Checherinda).
  • Fix a rare bug where a MATERIALIZE COLUMN query could lead to unexpected files in checksums.txt and eventually to detached data parts. #84007 (alesapin).
  • Handle exceptions properly in periodic parts refresh. #84083 (Azat Khuzhin).
  • Fix column name generation for boolean literals to use "true"/"false" instead of "1"/"0", preventing column name conflicts between boolean and integer literals in queries. #84945 (xiaohuanlin).
  • Fix potential inaccurate sorting issues in the Merge table engine. #85025 (Xiaozhe Yu).
  • Implement missing APIs for DiskEncrypted. #85028 (Azat Khuzhin).
  • Introduce backward compatibility setting to allow new analyzer to reference outer alias in the WITH clause in the case of name clashes. Fixes #82700. #83797 (Dmitry Novik).
  • Allow referencing any table in view(...) argument of remote table function with enabled analyzer. Fixes #78717. Fixes #79377. #83844 (Dmitry Novik).
  • Fix write with append (in MergeTree, used for experimental transactions) with plain_rewritable/plain metadata types; previously they were simply ignored. #83695 (Tuan Pham Anh).
  • Fix logger usage in IAccessStorage. #84365 (Konstantin Bogdanov).
  • Fix pruning files by virtual column in data lakes. #84520 (Kseniia Sumarokova).
  • Fix issue where querying a delayed remote source could result in vector out of bounds. #84820 (George Larionov).
  • Correctly store all settings in table metadata for object queue engine. #84860 (Antonio Andelic).
  • Fix the CORRUPTED_DATA error when lazy columns are used with external sort. #84738 (János Benjamin Antal).
  • Get rid of unnecessary getStatus() calls during SYSTEM DROP REPLICA queries. Fixes the case when a table is dropped in the background and the "Shutdown for storage is called" exception is thrown. #85220 (Nikolay Degterinsky).
  • Add missing table name length checks in CREATE OR REPLACE and RENAME queries. #85326 (Michael Kolupaev).
  • Fix crash and data corruption during ALTER UPDATE for JSON. #85383 (Pavel Kruglov).
  • Fix coalescing merge tree segfault for large strings. This closes #84582. #85709 (scanhex12).
  • Fix send_logs_source_regexp (after async logging refactoring in #85105). #85797 (Azat Khuzhin).
  • Fix possible inconsistency for dictionaries with update_field on MEMORY_LIMIT_EXCEEDED errors. #85807 (Azat Khuzhin).
  • Fix HTTP requests made by the url() table function to properly include port numbers in the Host header when accessing non-standard ports. This resolves authentication failures when using presigned URLs with S3-compatible services like MinIO running on custom ports, which is common in development environments. (Fixes #85898). #85921 (Tom Quist).
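
The Host header fix for url() matters because presigned URLs are signed over the host value, so a header without the port does not match what was signed. The Python sketch below shows one common way to derive a Host value from a URL. It is not the ClickHouse implementation; the helper host_header, the DEFAULT_PORTS table, and the example URLs are hypothetical:

```python
from urllib.parse import urlsplit

# Ports that are conventionally omitted from the Host header.
DEFAULT_PORTS = {"http": 80, "https": 443}

def host_header(url: str) -> str:
    """Return a Host header value, adding the port only when it is non-default.

    IPv6 literals are not handled in this sketch.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    port = parts.port
    if port is None or port == DEFAULT_PORTS.get(parts.scheme):
        return host
    return f"{host}:{port}"

print(host_header("http://localhost:9000/bucket/key?X-Amz-Signature=abc"))  # localhost:9000
print(host_header("https://example.com/data.csv"))  # example.com
```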