Validate suspicious/experimental types in nested types. Previously we didn't validate such types (except JSON) in nested types like Array/Tuple/Map. #59385 (Kruglov Pavel).
The sort clause ORDER BY ALL (introduced with v23.12) is replaced by ORDER BY *. The previous syntax was too error-prone for tables with a column all. #59450 (Robert Schulze).
Add sanity check for number of threads and block sizes. #60138 (Raúl Marín).
To increase compatibility with MySQL, function locate now accepts arguments (needle, haystack[, start_pos]) by default. The previous behavior (haystack, needle, [, start_pos]) can be restored by setting function_locate_has_mysql_compatible_argument_order = 0. #61092 (Robert Schulze).
The obsolete in-memory data parts have been deprecated since version 23.5 and have not been supported since version 23.10. Now the remaining code is removed. Continuation of #55186 and #45409. It is unlikely that you have used in-memory data parts because they were available only before version 23.5 and only when you enabled them manually by specifying the corresponding SETTINGS for a MergeTree table. To check if you have in-memory data parts, run the following query: SELECT part_type, count() FROM system.parts GROUP BY part_type ORDER BY part_type. To disable the usage of in-memory data parts, do ALTER TABLE ... MODIFY SETTING min_bytes_for_compact_part = DEFAULT, min_rows_for_compact_part = DEFAULT. Before upgrading from old ClickHouse releases, first check that you don't have in-memory data parts. If there are in-memory data parts, disable them first, then wait while there are no in-memory data parts and continue the upgrade. #61127 (Alexey Milovidov).
Forbid SimpleAggregateFunction in ORDER BY of MergeTree tables (like AggregateFunction is forbidden, but they are forbidden because they are not comparable) by default (use allow_suspicious_primary_key to allow them). #61399 (Azat Khuzhin).
ClickHouse allows arbitrary binary data in the String data type, which is typically UTF-8. Parquet/ORC/Arrow Strings only support UTF-8. That's why you can choose which Arrow's data type to use for the ClickHouse String data type - String or Binary. This is controlled by the settings, output_format_parquet_string_as_string, output_format_orc_string_as_string, output_format_arrow_string_as_string. While Binary would be more correct and compatible, using String by default will correspond to user expectations in most cases. Parquet/ORC/Arrow supports many compression methods, including lz4 and zstd. ClickHouse supports each and every compression method. Some inferior tools lack support for the faster lz4 compression method, that's why we set zstd by default. This is controlled by the settings output_format_parquet_compression_method, output_format_orc_compression_method, and output_format_arrow_compression_method. We changed the default to zstd for Parquet and ORC, but not Arrow (it is emphasized for low-level usages). #61817 (Alexey Milovidov).
Fix for the materialized view security issue, which allowed a user to insert into a table without required grants for that. Fix validates that the user has permission to insert not only into a materialized view but also into all underlying tables. This means that some queries, which worked before, now can fail with Not enough privileges. To address this problem, the release introduces a new feature of SQL security for views https://clickhouse.com/docs/en/sql-reference/statements/create/view#sql_security. #54901 (pufit)
Topk/topkweighed support mode, which return count of values and it's error. #54508 (UnamedRus).
Added new syntax which allows to specify definer user in View/Materialized View. This allows to execute selects/inserts from views without explicit grants for underlying tables. #54901 (pufit).
Implemented automatic conversion of merge tree tables of different kinds to replicated engine. Create empty convert_to_replicated file in table's data directory (/clickhouse/store/xxx/xxxyyyyy-yyyy-yyyy-yyyy-yyyyyyyyyyyy/) and that table will be converted automatically on next server start. #57798 (Kirill).
Added table function mergeTreeIndex. It represents the contents of index and marks files of MergeTree tables. It can be used for introspection. Syntax: mergeTreeIndex(database, table, [with_marks = true]) where database.table is an existing table with MergeTree engine. #58140 (Anton Popov).
Try to detect file format automatically during schema inference if it's unknown in file/s3/hdfs/url/azureBlobStorage engines. Closes #50576. #59092 (Kruglov Pavel).
Add generate_series as a table function. This function generates table with an arithmetic progression with natural numbers. #59390 (divanik).
Added query ALTER TABLE table FORGET PARTITION partition that removes ZooKeeper nodes, related to an empty partition. #59507 (Sergei Trifonov).
Support reading and writing backups as tar archives. #59535 (josh-hildred).
The codec LZ4HC will accept a new level 2, which is faster than the previous minimum level 3, at the expense of less compression. In previous versions, LZ4HC(2) and less was the same as LZ4HC(3). Author: Cyan4973. #60090 (Alexey Milovidov).
Implemented system.dns_cache table, which can be useful for debugging DNS issues. New server setting dns_cache_max_size. #60257 (Kirill Nikiforov).
Added function toMillisecond which returns the millisecond component for values of typeDateTime or DateTime64. #60281 (Shaun Struwig).
Support single-argument version for the merge table function, as merge(['db_name', ] 'tables_regexp'). #60372 (豪肥肥).
Make all format names case insensitive, like Tsv, or TSV, or tsv, or even rowbinary. #60420 (豪肥肥).
Added new syntax which allows to specify definer user in View/Materialized View. This allows to execute selects/inserts from views without explicit grants for underlying tables. #60439 (pufit).
Add four properties to the StorageMemory (memory-engine) min_bytes_to_keep, max_bytes_to_keep, min_rows_to_keep and max_rows_to_keep - Add tests to reflect new changes - Update memory.md documentation - Add table context property to MemorySink to enable access to table parameter bounds. #60612 (Jake Bamrah).
Added function toMillisecond which returns the millisecond component for values of typeDateTime or DateTime64. #60649 (Robert Schulze).
Separate limits on number of waiting and executing queries. Added new server setting max_waiting_queries that limits the number of queries waiting due to async_load_databases. Existing limits on number of executing queries no longer count waiting queries. #61053 (Sergei Trifonov).
Eliminates min/max/any/anyLast aggregators of GROUP BY keys in SELECT section. #52230 (JackyWoo).
Improve the performance of serialized aggregation method when involving multiple [nullable] columns. This is a general version of #51399 that doesn't compromise on abstraction integrity. #55809 (Amos Bird).
Lazy build join output to improve performance of ALL join. #58278 (LiuNeng).
Improvements to aggregate functions ArgMin / ArgMax / any / anyLast / anyHeavy, as well as ORDER BY {u8/u16/u32/u64/i8/i16/u32/i64) LIMIT 1 queries. #58640 (Raúl Marín).
Optimize performance of sum/avg conditionally for bigint and big decimal types by reducing branch miss. #59504 (李扬).
Improve performance of SELECTs with active mutations. #59531 (Azat Khuzhin).
Trivial optimize on column filter. Avoid those filter columns whoes underlying data type is not number being filtered with result_size_hint = -1. Peak memory can be reduced to 44% of the original in some cases. #59698 (李扬).
Improve memory usage for primary key and some other operations. #60050 (Alexey Milovidov).
The tables' primary keys will be loaded in memory lazily on first access. This is controlled by the new MergeTree setting primary_key_lazy_load, which is on by default. This provides several advantages: - it will not be loaded for tables that are not used; - if there is not enough memory, an exception will be thrown on first use instead of at server startup. This provides several disadvantages: - the latency of loading the primary key will be paid on the first query rather than before accepting connections; this theoretically may introduce a thundering-herd problem. This closes #11188. #60093 (Alexey Milovidov).
Vectorized function dotProduct which is useful for vector search. #60202 (Robert Schulze).
If the table's primary key contains mostly useless columns, don't keep them in memory. This is controlled by a new setting primary_key_ratio_of_unique_prefix_values_to_skip_suffix_columns with the value 0.9 by default, which means: for a composite primary key, if a column changes its value for at least 0.9 of all the times, the next columns after it will be not loaded. #60255 (Alexey Milovidov).
Execute multiIf function columnarly when result_type's underlying type is number. #60384 (李扬).
As is shown in Fig 1, the replacement of "&&" with "&" could generate the SIMD code. Fig 1. Code compiled from '&&' (left) and '&' (right). #60498 (Zhiguo Zhou).
Faster (almost 2x) mutexes (was slower due to ThreadFuzzer). #60823 (Azat Khuzhin).
Move connection drain from prepare to work, and drain multiple connections in parallel. #60845 (lizhuoyu5).
Optimize insertManyFrom of nullable number or nullable string. #60846 (李扬).
Optimized function dotProduct to omit unnecessary and expensive memory copies. #60928 (Robert Schulze).
Operations with the filesystem cache will suffer less from the lock contention. #61066 (Alexey Milovidov).
Optimize ColumnString::replicate and prevent memcpySmallAllowReadWriteOverflow15Impl from being optimized to built-in memcpy. Close #61074. ColumnString::replicate speeds up by 2.46x on x86-64. #61075 (李扬).
If a query with a syntax error contained COLUMNS matcher with a regular expression, the regular expression was compiled each time during the parser's backtracking, instead of being compiled once. This was a fundamental error. The compiled regexp was put to AST. But the letter A in AST means "abstract" which means it should not contain heavyweight objects. Parts of AST can be created and discarded during parsing, including a large number of backtracking. This leads to slowness on the parsing side and consequently allows DoS by a readonly user. But the main problem is that it prevents progress in fuzzers. #61543 (Alexey Milovidov).
While running the MODIFY COLUMN query for materialized views, check the inner table's structure to ensure every column exists. #47427 (sunny).
Added table system.keywords which contains all the keywords from parser. Mostly needed and will be used for better fuzzing and syntax highlighting. #51808 (Nikita Mikhaylov).
Added support for parameterized view with analyzer to not analyze create parameterized view. Refactor existing parameterized view logic to not analyze create parameterized view. #54211 (SmitaRKulkarni).
Ordinary database engine is deprecated. You will receive a warning in clickhouse-client if your server is using it. This closes #52229. #56942 (shabroo).
All zero copy locks related to a table have to be dropped when the table is dropped. The directory which contains these locks has to be removed also. #57575 (Sema Checherinda).
Add short-circuit ability for dictGetOrDefault function. Closes #52098. #57767 (jsc0218).
Running ALTER COLUMN MATERIALIZE on a column with DEFAULT or MATERIALIZED expression now writes the correct values: The default value for existing parts with default value or the non-default value for existing parts with non-default value. Previously, the default value was written for all existing parts. #58023 (Duc Canh Le).
Enabled a backoff logic (e.g. exponential). Will provide an ability for reduced CPU usage, memory usage and log file sizes. #58036 (MikhailBurdukov).
Consider lightweight deleted rows when selecting parts to merge. #58223 (Zhuo Qiu).
Allow to define volume_priority in storage_configuration. #58533 (Andrey Zvonov).
Now we can use virtual columns in PREWHERE. It's worthwhile for non-const virtual columns like _part_offset. #59033 (Amos Bird).
Settings for the Distributed table engine can now be specified in the server configuration file (similar to MergeTree settings), e.g. <distributed> <flush_on_detach>false</flush_on_detach> </distributed>. #59291 (Azat Khuzhin).
Keeper improvement: cache only a certain amount of logs in-memory controlled by latest_logs_cache_size_threshold and commit_logs_cache_size_threshold. #59460 (Antonio Andelic).
Instead using a constant key, now object storage generates key for determining remove objects capability. #59495 (Sema Checherinda).
Don't infer floats in exponential notation by default. Add a setting input_format_try_infer_exponent_floats that will restore previous behaviour (disabled by default). Closes #59476. #59500 (Kruglov Pavel).
Allow alter operations to be surrounded by parenthesis. The emission of parentheses can be controlled by the format_alter_operations_with_parentheses config. By default in formatted queries the parentheses are emitted as we store the formatted alter operations in some places as metadata (e.g.: mutations). The new syntax clarifies some of the queries where alter operations end in a list. E.g.: ALTER TABLE x MODIFY TTL date GROUP BY a, b, DROP COLUMN c cannot be parsed properly with the old syntax. In the new syntax the query ALTER TABLE x (MODIFY TTL date GROUP BY a, b), (DROP COLUMN c) is obvious. Older versions are not able to read the new syntax, therefore using the new syntax might cause issues if newer and older version of ClickHouse are mixed in a single cluster. #59532 (János Benjamin Antal).
Bumped Intel QPL (used by codec DEFLATE_QPL) from v1.3.1 to v1.4.0 . Also fixed a bug for polling timeout mechanism, as we observed in same cases timeout won't work properly, if timeout happen, IAA and CPU may process buffer concurrently. So far, we'd better make sure IAA codec status is not QPL_STS_BEING_PROCESSED, then fallback to SW codec. #59551 (jasperzhu).
Add positional pread in libhdfs3. If you want to call positional read in libhdfs3, use the hdfsPread function in hdfs.h as follows. tSize hdfsPread(hdfsFS fs, hdfsFile file, void * buffer, tSize length, tOffset position);. #59624 (M1eyu).
Check for stack overflow in parsers even if the user misconfigured the max_parser_depth setting to a very high value. This closes #59622. #59697 (Alexey Milovidov).
Unify xml and sql created named collection behaviour in kafka storage. #59710 (Pervakov Grigorii).
Allow uuid in replica_path if CREATE TABLE explicitly has it. #59908 (Azat Khuzhin).
Add column metadata_version of ReplicatedMergeTree table in system.tables system table. #59942 (Maksim Kita).
Keeper improvement: add retries on failures for Disk related operations. #59980 (Antonio Andelic).
Use multiple threads while reading the metadata of tables from a backup while executing the RESTORE command. #60040 (Vitaly Baranov).
Now if StorageBuffer has more than 1 shard (num_layers > 1) background flush will happen simultaneously for all shards in multiple threads. #60111 (alesapin).
Support specifying users for specific S3 settings in config using user key. #60144 (Antonio Andelic).
Copy S3 file GCP fallback to buffer copy in case GCP returned Internal Error with GATEWAY_TIMEOUT HTTP error code. #60164 (Maksim Kita).
Allow "local" as object storage type instead of "local_blob_storage". #60165 (Kseniia Sumarokova).
Implement comparison operator for Variant values and proper Field inserting into Variant column. Don't allow creating Variant type with similar variant types by default (allow uder a setting allow_suspicious_variant_types) Closes #59996. Closes #59850. #60198 (Kruglov Pavel).
Improved overall usability of virtual columns. Now it is allowed to use virtual columns in PREWHERE (it's worthwhile for non-const virtual columns like _part_offset). Now a builtin documentation is available for virtual columns as a comment of column in DESCRIBE query with enabled setting describe_include_virtual_columns. #60205 (Anton Popov).
Short circuit execution for ULIDStringToDateTime. #60211 (Juan Madurga).
Added query_id column for tables system.backups and system.backup_log. Added error stacktrace to error column. #60220 (Maksim Kita).
Parallel flush of pending INSERT blocks of Distributed engine on DETACH/server shutdown and SYSTEM FLUSH DISTRIBUTED (Parallelism will work only if you have multi disk policy for table (like everything in Distributed engine right now)). #60225 (Azat Khuzhin).
Filter setting is improper in joinRightColumnsSwitchNullability, resolve #59625. #60259 (lgbo).
Issue #57598 mentions a variant behaviour regarding transaction handling. An issued COMMIT/ROLLBACK when no transaction is active is reported as an error contrary to MySQL behaviour. #60338 (PapaToemmsn).
Added none_only_active mode for distributed_ddl_output_mode setting. #60340 (Alexander Tokmakov).
Connections through the MySQL port now automatically run with setting prefer_column_name_to_alias = 1 to support QuickSight out-of-the-box. Also, settings mysql_map_string_to_text_in_show_columns and mysql_map_fixed_string_to_text_in_show_columns are now enabled by default, affecting also only MySQL connections. This increases compatibility with more BI tools. #60365 (Robert Schulze).
When output format is Pretty format and a block consists of a single numeric value which exceeds one million, A readable number will be printed on table right. e.g. ┌──────count()─┐ │ 233765663884 │ -- 233.77 billion └──────────────┘. #60379 (rogeryk).
Allow configuring HTTP redirect handlers for clickhouse-server. For example, you can make / redirect to the Play UI. #60390 (Alexey Milovidov).
The advanced dashboard has slightly better colors for multi-line graphs. #60391 (Alexey Milovidov).
Fix a race condition in JavaScript code leading to duplicate charts on top of each other. #60392 (Alexey Milovidov).
Check for stack overflow in parsers even if the user misconfigured the max_parser_depth setting to a very high value. This closes #59622. #60434 (Alexey Milovidov).
Renamed server setting dns_cache_max_size to dns_cache_max_entries to reduce ambiguity. #60500 (Kirill Nikiforov).
SHOW INDEX | INDEXES | INDICES | KEYS no longer sorts by the primary key columns (which was unintuitive). #60514 (Robert Schulze).
Keeper improvement: abort during startup if an invalid snapshot is detected to avoid data loss. #60537 (Antonio Andelic).
Added MergeTree read split ranges into intersecting and non intersecting fault injection using merge_tree_read_split_ranges_into_intersecting_and_non_intersecting_fault_probability setting. #60548 (Maksim Kita).
The Advanced dashboard now has controls always visible on scrolling. This allows you to add a new chart without scrolling up. #60692 (Alexey Milovidov).
String types and Enums can be used in the same context, such as: arrays, UNION queries, conditional expressions. This closes #60726. #60727 (Alexey Milovidov).
Support files without format extension in Filesystem database. #60795 (Kruglov Pavel).
Keeper improvement: support leadership_expiry_ms in Keeper's settings. #60806 (Brokenice0415).
Always infer exponential numbers in JSON formats regardless of the setting input_format_try_infer_exponent_floats. Add setting input_format_json_use_string_type_for_ambiguous_paths_in_named_tuples_inference_from_objects that allows to use String type for ambiguous paths instead of an exception during named Tuples inference from JSON objects. #60808 (Kruglov Pavel).
Add a flag for SMJ to treat null as biggest/smallest. So the behavior can be compitable with other SQL systems, like Apache Spark. #60896 (loudongfeng).
Enable processors profiling (time spent/in and out bytes for sorting, aggregation, ...) by default. #61096 (Azat Khuzhin).
Add the function toUInt128OrZero, which was missed by mistake (the mistake is related to https://github.com/ClickHouse/ClickHouse/pull/945). The compatibility aliases FROM_UNIXTIME and DATE_FORMAT (they are not ClickHouse-native and only exist for MySQL compatibility) have been made case insensitive, as expected for SQL-compatibility aliases. #61114 (Alexey Milovidov).
Improvements for the access checks, allowing to revoke of unpossessed rights in case the target user doesn't have the revoking grants either. Example: ```sql GRANT SELECT ON . TO user1; REVOKE SELECT ON system.* FROM user1;. #61115 (pufit).
Now it's possible to specify attribute merge="true" in config substitutions for subtrees <include from_zk="/path" merge="true">. In case this attribute specified, clickhouse will merge subtree with existing configuration, otherwise default behavior is append new content to configuration. #61299 (alesapin).
Use temporary_files_codec setting in all places where we create temporary data, for example external memory sorting and external memory GROUP BY. Before it worked only in partial_merge JOIN algorithm. #61456 (Maksim Kita).
Use managed identity for backups IO when using Azure Blob Storage. Add a setting to prevent ClickHouse from attempting to create a non-existent container, which requires permissions at the storage account level. #61785 (Daniel Pozo Escalona).
In the previous version, some numbers in Pretty formats were not pretty enough. #61794 (Alexey Milovidov).
A long value in Pretty formats won't be cut if it is the single value in the resultset, such as in the result of the SHOW CREATE TABLE query. #61795 (Alexey Milovidov).
Similarly to clickhouse-local, clickhouse-client will accept the --output-format option as a synonym to the --format option. This closes #59848. #61797 (Alexey Milovidov).
If stdout is a terminal and the output format is not specified, clickhouse-client and similar tools will use PrettyCompact by default, similarly to the interactive mode. clickhouse-client and clickhouse-local will handle command line arguments for input and output formats in a unified fashion. This closes #61272. #61800 (Alexey Milovidov).
Underscore digit groups in Pretty formats for better readability. This is controlled by a new setting, output_format_pretty_highlight_digit_groups. #61802 (Alexey Milovidov).
Bug Fix (user-visible misbehavior in an official stable release)