ClickHouse Newsletter February 2022: Do you know how to search a table?
Happy (Lunar) New Year! 2021 was a big year for us; we founded a company and brought you 12 new releases. Looking to catch up now that we’re all back from the holiday season?
Our favorite features are below, along with an invite to this month’s webinar, this month’s query tips, and more. Welcome to your monthly dose of ClickHouse goodness.
Top 5 New Features in ClickHouse from 2021
Across 12 monthly releases, the ClickHouse team and our community of amazing contributors have released hundreds of features and improvements. These are our favorites:
- ClickHouse Keeper, our alternative to Apache ZooKeeper, is feature complete! Now ClickHouse is a “true” single binary and you don’t need anything else to get going. Keeper is fully protocol-compatible with the original ZooKeeper — so you can also use it for your Hadoop or Kafka deployments.
- Asynchronous insert mode now makes it possible to make many small INSERTs to your ClickHouse cluster without running into the dreaded “Too many parts” error. Turn it on using the
- User-defined functions (UDF) allows you to extend ClickHouse with any extra functionality you need! You can define UDFs as lambda expressions or call external scripts in any programming language. The latter makes it especially useful for any ML/AI/NLP use cases you have. Check out the tutorial here.
- New data types: Map, Bool, UInt128 make it even easier to ingest all your data into ClickHouse. And the
Nesteddata type now supports arbitrary levels of nesting.
- Positional arguments are now supported! A small but useful feature for users familiar with other databases, you can turn it on with the
For more information about what we’ve added to your favorite database in the last year, catch up on our blog, watch the recordings of our release webinars on our YouTube channel, or get into the (deep) details by checking out the changelog.
Upcoming Release v22.2
Our next monthly release is just around the corner! We’re expecting to add text classification functions and flexible memory limits. As always, we’ll be hosting a release webinar where you’ll have the opportunity to ask and get answers to your questions live.
When: 9 a.m. PST / 5:00 p.m. GMT, February 17th
Query of the Month: Full-text search a table
Ever wondered how you can search all the columns of a table at once, like with a search engine? You can do it with ClickHouse! There are a few options and they use some neat features in ClickHouse that you might not (yet) know.
SELECT * FROM hackernews WHERE formatRow('TSV', *) ILIKE '%i love clickhouse%'
This query uses the
formatRow function to concatenate all columns into a long tab-separated string and searches through it using
SELECT * FROM hackernews WHERE concat(* APPLY x -> concat(toString(x), '<<>>')) ILIKE '%i love clickhouse%'
In this query, we also concatenate all columns, but instead of using tabs as separators we can specify the separator ourselves, in this case
APPLY is a wonderful modifier that allows us to invoke a function (in this case, a higher-order lambda expression) on any expression (all columns in this case, but we could have specified a subset of columns using the handily named COLUMNS expression).
SELECT * FROM hackernews WHERE (arrayExists(x -> positionCaseInsensitiveUTF8(x, 'i love clickhouse') > 0, array(* APPLY x -> toString(x))))
Have you noticed what can happen when using the first two options? Because all columns are simply concatenated into one big string it is possible to search across columns if one (knowingly or accidentally) searches for the column separator. The last query here eliminates the problem by gathering all columns into an array instead and using the
arrayExists function to check if there is any one element that contains the phrase we are searching for.
When running any of these queries on the Hacker News dataset they all find the comment “I love clickhouse: it’s simple yet flexible enough and free software.” Aw shucks, we love ClickHouse too!
Have you written an interesting query recently that you think others could learn from? Let us know on twitter, we’d love to hear from you!
What we’ve been reading at the start of the year:
- What’s New in ClickHouse 22.1 – our v22.1 release blog post, announcing automatic schema inference, parallel query processing on multiple replicas, a new diagnostic tool for your ClickHouse and more!
- Admixer Aggregates Over 1 Billion Unique Users a Day using ClickHouse – Adtech platform Admixer moved from MSSQL and Azure Table Storage to ClickHouse and is ingesting over 1M rows per second.
- Migrating Your Reporting Queries From MongoDB to ClickHouse – Benchmarking MongoDB versus ClickHouse for analytics queries. Spoiler alert: ClickHouse is faster! 😉
- Historical Traffic Analysis at Scale: Using ClickHouse with ntopng – high-speed flow collection and storage with ClickHouse.
- How ClickHouse, Inc. Is Building a Best-in-Class Engineering Culture – learn a bit about how we are building our engineering team at ClickHouse from our very own Yury Izrailevsky. (Spoiler: It is not about telling people what to do every day.)
New ClickHouse Adopters: ntop, Superwall, Muse, and NLMK. Get yourself added as well!
Thanks for reading. We’ll see you next month!
The ClickHouse Team
Would you like to keep in touch?
Sign up for our newsletter to stay informed on feature releases, our product roadmap, and our cloud and support offerings.