The SparseGrams function
Mark Needham
Learn about ClickHouse's sparseGrams function and how it improves on traditional n-grams for building search indexes. This tutorial walks through the concept step by step, explaining how sparse grams use weighted substrings to filter out common patterns that would otherwise return too many search results. We'll explore the algorithm with practical examples and show you how to use it in ClickHouse.
What You'll Learn:
- The limitations of traditional n-grams for search indexing at scale
- How GitHub's sparse grams algorithm solves the "too many results" problem
- Step-by-step walkthrough of the sparse grams weighting system
- How to use ClickHouse's sparseGrams() function with practical examples
- How the CRC32 hash function is used for weight calculations
- How n-grams and sparse grams output compare side by side
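
To make the weighting idea concrete, here is a minimal Python sketch of the general sparse grams rule, not ClickHouse's implementation. It assumes each bigram is weighted with a CRC32 hash (here `zlib.crc32`) and that a substring qualifies when the weights of its first and last bigrams are strictly greater than the weights of every bigram between them. The helper names `ngrams`, `bigram_weights`, and `sparse_grams` are hypothetical, and the output is not guaranteed to match `sparseGrams()` in ClickHouse byte for byte.

```python
import zlib


def ngrams(text, n=3):
    """Traditional n-grams: every substring of exactly n characters."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def bigram_weights(text):
    """Assumed weighting: CRC32 of each 2-character window."""
    return [zlib.crc32(text[i:i + 2].encode("utf-8"))
            for i in range(len(text) - 1)]


def sparse_grams(text, min_length=3, weights=None):
    """Substrings whose border bigrams outweigh every bigram inside.

    `weights` can be passed explicitly to experiment with the rule;
    by default the CRC32-based weights above are used.
    """
    w = weights if weights is not None else bigram_weights(text)
    grams = []
    for start in range(len(w)):              # index of the first bigram
        for end in range(start + 1, len(w)):  # index of the last bigram
            border = min(w[start], w[end])
            inner = w[start + 1:end]
            # Keep the substring only if both borders are strictly
            # heavier than all bigrams strictly between them.
            if all(x < border for x in inner):
                gram = text[start:end + 2]
                if len(gram) >= min_length:
                    grams.append(gram)
    return grams


if __name__ == "__main__":
    sample = "hello world"
    print("n-grams:     ", ngrams(sample, 3))
    print("sparse grams:", sparse_grams(sample))
```

Because a 3-character substring has no bigram between its two border bigrams, every trigram is always a sparse gram under this rule. The longer grams are the ones selected by the weights, which is what lets an index skip very common short patterns in favor of rarer, longer ones.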