ClickHouse engineering resources
How to scale vector search in Postgres (pgvector) for RAG and AI agents: memory limits, filtering, and when to go hybrid
Scale vector search in Postgres with pgvector: avoid HNSW RAM limits, fix filtering recall, and know when to go hybrid. Read now.
Last updated: Jun 18, 2026
How to query a REST API in Python
Read a JSON API response into a DataFrame with chDB. Use the pandas API you already know to filter and aggregate the response, running on ClickHouse's engine with no server to start.
Last updated: Jun 15, 2026
How to convert Parquet to ORC
Convert a Parquet file to ORC with one clickhouse-local command. The schema is read from the Parquet footer and the types carry into ORC, with no server and no upload.
Last updated: Jun 6, 2026
How to query a REST API with SQL
Run SQL straight against a live JSON or CSV HTTP endpoint with clickhouse local — fetch, filter, aggregate and join the response in one statement, with no download step.
Last updated: Jun 15, 2026
Parse url-encoded form data with SQL
Turn an application/x-www-form-urlencoded body like a=1&b=hello into a queryable row with clickhouse local, with no parsing code and no import step.
Last updated: Jun 15, 2026
What is Protobuf?
Protobuf (Protocol Buffers) is a schema-first, compact binary serialization format. This page explains its structure, the .proto schema requirement, and proves it by writing and reading a real Protobuf file.
Last updated: Jun 15, 2026
Run SQL across multiple CSV or Parquet files
Query a whole directory of CSV or Parquet files as one table with clickhouse-local. A glob path reads them all, aggregates run across every file, and a virtual column tells you which file each row came from.
Last updated: Jun 15, 2026
Read Avro from a schema registry with clickhouse-local
Read a plain .avro file with one command, and use the AvroConfluent format plus format_avro_schema_registry_url to read Confluent-framed Avro messages from a Kafka Schema Registry.
Last updated: Jun 15, 2026
How to query a mysqldump file without importing
Run SQL against the tables inside a mysqldump .sql file directly with clickhouse-local. No MySQL server, no restore step — the INSERT statements are read as a table.
Last updated: Jun 15, 2026
How to convert Avro to JSON
Convert an Avro file to JSON with one command using clickhouse-local. The schema is read from the Avro file itself, types are carried into JSON, and there is no upload and no server.
Last updated: Jun 15, 2026
How to read a RowBinary file
Read a ClickHouse RowBinary file from your terminal with one command, using clickhouse local. RowBinaryWithNamesAndTypes is self-describing; plain RowBinary needs an explicit structure.
Last updated: Jun 15, 2026
How to read a ClickHouse Native format file
Open and query a ClickHouse Native (.native) file from your terminal with one command, using clickhouse local. The file carries its own schema, so reads need no types and lose no precision.
Last updated: Jun 15, 2026
How to convert TSV to JSON
Convert a TSV file to JSON from your terminal with clickhouse-local — the header becomes the keys and types are inferred, with no upload and no server.
Last updated: Jun 15, 2026
How to convert NDJSON to CSV
Convert a line-delimited NDJSON file to CSV from your terminal with clickhouse-local — schema auto-inferred, nested objects flattened, no server and no upload.
Last updated: Jun 15, 2026
How to convert JSONL to JSON
Turn a line-delimited JSONL file into a single JSON array from your terminal with clickhouse-local. Schema auto-inferred, types preserved, no server and no upload.
Last updated: Jun 15, 2026
How to convert JSON to TSV
Convert a JSON file to a tab-separated file from your terminal with clickhouse-local — schema auto-inferred, types preserved, nested objects flattened, no upload and no server.
Last updated: Jun 15, 2026
How to convert JSON to JSONL
Turn a top-level JSON array into JSONL (one object per line) with one clickhouse-local command — no upload, schema auto-inferred, and nested fields preserved.
Last updated: Jun 15, 2026
How to convert CSV to NDJSON
Convert a CSV file to NDJSON (newline-delimited JSON) in one command with clickhouse-local — schema auto-detected, types carried into the JSON, no upload and no server.
Last updated: Jun 15, 2026
How to convert CSV to Avro
Convert a CSV file to Avro with one clickhouse-local command. The Avro schema is derived from the column types inferred from the CSV, with no server and no upload.
Last updated: Jun 15, 2026
How to convert Avro to CSV
Convert an Avro file to CSV with one clickhouse-local command. No upload, no server, and the Avro schema is read straight from the file so types carry over.
Last updated: Jun 15, 2026
What is an Avro file?
An Avro file is a row-oriented, binary data format that embeds its own JSON schema in the file header, built for schema evolution. This page explains its structure and proves it by reading the embedded schema with clickhouse local.
Last updated: Jun 15, 2026
How to read an Avro file in Python
Read an Avro file into a DataFrame with chDB and work with it using the pandas API you already know. The embedded schema is read for you, with no struct declarations and no reader loop.
Last updated: Jun 15, 2026
How to read an Avro file
Read and query an Avro file from your terminal with one command, using clickhouse local. The schema is embedded in the file, so there is no structure to declare and no import step.
Last updated: Jun 15, 2026
How to convert Parquet to Avro
Convert a Parquet file to Avro from your terminal with clickhouse-local — the schema is read from the Parquet footer and carried into Avro, with no server and no upload.
Last updated: Jun 15, 2026
How to convert ORC to JSON
Convert an ORC file to JSON from your terminal with one clickhouse-local command — the schema is read from the ORC footer, nested columns become nested JSON, with no upload and no server.
Last updated: Jun 15, 2026
How to convert JSONL to CSV
Convert a JSONL (NDJSON) file to CSV from your terminal with clickhouse-local — schema auto-inferred, nested objects and arrays flattened into columns, no upload and no server.
Last updated: Jun 15, 2026
How to convert CSV to JSONL
Convert a CSV file to JSONL (one JSON object per line) in a single command with clickhouse-local — the schema is auto-inferred and the column types are carried into the JSON, with no upload and no server.
Last updated: Jun 15, 2026
How to convert CSV to Arrow
Convert a CSV file to the Arrow IPC format in one command with clickhouse-local — the schema is inferred from the CSV and the column types are embedded in the Arrow file, with no server and no upload.
Last updated: Jun 15, 2026
How to convert Avro to Parquet
Convert an Avro file to Parquet with one clickhouse-local command. The schema embedded in the Avro file is read automatically and carried into Parquet, with no server and no upload.
Last updated: Jun 15, 2026
How to convert Arrow to CSV
Convert an Arrow (Arrow IPC / Feather) file to CSV with one clickhouse-local command. The schema is read from the file, types carry over, and there's no upload and no server.
Last updated: Jun 15, 2026
What is MessagePack?
MessagePack is a compact binary serialization format that encodes the same data model as JSON in far fewer bytes. This page explains how it is laid out and proves it by reading a real .msgpack file.
Last updated: Jun 15, 2026
What is an .npy file?
An .npy file is NumPy's simple binary format for storing a single numeric array: a small text header describing dtype and shape, followed by the raw bytes. This page explains its structure and proves it by reading one with clickhouse local.
Last updated: Jun 15, 2026
How to read an NPY file in Python
Read a NumPy .npy array into a DataFrame with chDB and work with it using the pandas API you already know, without loading the whole array into memory.
Last updated: Jun 15, 2026
How to read a .npy (NumPy) file with SQL
Query a NumPy .npy array file with SQL from your terminal using clickhouse local. The dtype is read from the file, with no Python and no import step.
Last updated: Jun 15, 2026
How to read a MessagePack file in Python
Read a MessagePack (.msgpack) file into a DataFrame with chDB. MsgPack carries no schema, so you pass the column list once and then use the pandas API you already know, running on ClickHouse's engine.
Last updated: Jun 15, 2026
How to read a MessagePack file
Read a MessagePack (.msgpack) file with SQL from your terminal using clickhouse local. MsgPack carries no schema, so you pass the column structure explicitly in one command.
Last updated: Jun 15, 2026
How to convert NPY to Parquet
Convert a NumPy .npy file to Parquet from your terminal with clickhouse-local — the array dtype is read straight from the .npy header and carried into typed Parquet, with no upload and no Python.
Last updated: Jun 15, 2026
How to convert NPY to CSV
Convert a NumPy .npy array to CSV from your terminal with clickhouse-local — the dtype is read from the .npy header, with no upload and no Python.
Last updated: Jun 15, 2026
How to convert MessagePack to JSON
Convert a MessagePack (.msgpack) file to JSON from your terminal with clickhouse-local — one command, no server and no upload, with the column types carried through into the output.
Last updated: Jun 15, 2026
How to convert MessagePack to CSV
Convert a MessagePack (.msgpack) file to CSV from your terminal with clickhouse-local — one command, no server, no upload, and types carried straight into the CSV.
Last updated: Jun 15, 2026
What is a BSON file?
A BSON file is MongoDB's binary, typed, length-prefixed encoding of JSON documents. This page explains its structure and proves it by reading a real .bson file with clickhouse local.
Last updated: Jun 15, 2026
How to read a Feather file in Python (faster than pandas)
Read a Feather file into a DataFrame with chDB, a drop-in replacement for pandas. Change one import line and your existing pandas code runs on ClickHouse's engine, so it stays fast as files grow.
Last updated: Jun 15, 2026
How to read a Feather file from the command line
Read a .feather file and query it with SQL from your terminal using clickhouse local. Feather is the Arrow IPC format, so you read it with FORMAT Arrow. No server, no schema declaration, no import step.
Last updated: Jun 15, 2026
How to read a BSON file in Python
Read a .bson export (as produced by mongodump) into a DataFrame with chDB. Work with it using the pandas API you already know, running on ClickHouse's engine, with no MongoDB server and no document-by-document decode loop.
Last updated: Jun 15, 2026
How to read an Arrow file in Python (faster than pandas)
Read an Arrow IPC file into a DataFrame with chDB, a drop-in replacement for pandas. Change one import line and your existing pandas code runs on ClickHouse's engine, so it stays fast as files grow.
Last updated: Jun 15, 2026
How to query a BSON file
Query a BSON file (MongoDB's binary JSON, the format mongodump produces) directly from your terminal with clickhouse local. The schema, including nested sub-documents, is auto-detected. No server, no import step.
Last updated: Jun 15, 2026
How to convert Parquet to Arrow
Convert a Parquet file to the Arrow IPC (Feather) format in one command with clickhouse-local — columnar in, columnar out, types preserved, no upload and no server.
Last updated: Jun 15, 2026
How to convert BSON to JSON
Convert a BSON file to JSON from your terminal with clickhouse-local — the document schema is auto-inferred and nested fields are preserved, with no upload and no server.
Last updated: Jun 15, 2026
Convert BSON to CSV
Convert a BSON file to CSV from your terminal with clickhouse-local — one command, no upload and no server, with nested sub-documents flattened into real columns.
Last updated: Jun 15, 2026
How to convert Arrow to Parquet
Convert an Arrow IPC file to Parquet with one clickhouse-local command — types carry across exactly, nothing is uploaded, and files larger than RAM stream straight through.
Last updated: Jun 15, 2026
How to read an ORC file in Python (faster than pandas)
Read an ORC file into a DataFrame with chDB, a drop-in replacement for pandas. Change one import line and your existing pandas code runs on ClickHouse's engine, so it stays fast as files grow.
Last updated: Jun 15, 2026
How to read an ORC file from the command line
Open and query an ORC file from your terminal with one command using clickhouse-local. The schema is read from the file's footer, so there's no server and no CREATE TABLE.
Last updated: Jun 6, 2026
How to parse a log file with regex in SQL
Turn unstructured log lines into typed columns and query them with SQL, using a regex and clickhouse local. One pattern maps the line layout, then ordinary SQL answers the questions.
Last updated: Jun 6, 2026
ORC file format
ORC (Optimized Row Columnar) is an open, columnar, compressed storage format for tabular data. This page explains its stripes, indexes and footer, and proves the structure by reading a real ORC file.
Last updated: Jun 6, 2026
How to convert ORC to Parquet
Convert an ORC file to Parquet with a single command using clickhouse-local — the schema is read from the ORC footer and carried into Parquet, with no server and no upload.
Last updated: Jun 6, 2026
How to convert ORC to CSV
Convert an ORC file to CSV with one clickhouse-local command — the schema is read from the ORC footer, no server and no upload, and it streams files larger than RAM.
Last updated: Jun 6, 2026
How to convert JSON to CSV
Convert JSON to CSV from your terminal with clickhouse-local — one command, schema auto-inferred, with the nested fields flattened into flat CSV columns.
Last updated: Jun 6, 2026
How to convert CSV to ORC
Convert a CSV file to columnar ORC with one clickhouse-local command — no upload and no server, the schema is inferred from the CSV and the types are carried into the ORC file.
Last updated: Jun 8, 2026
How to convert CSV to JSON
Convert a CSV file to JSON from your terminal with clickhouse-local — one object per line or a single JSON array, schema auto-inferred and types preserved, with no upload and no server.
Last updated: Jun 8, 2026
What is a Parquet file?
A Parquet file is an open, column-oriented, compressed storage format for tabular data, built for analytics. This page explains its internal structure and proves it by reading a real file's metadata.
Last updated: Jun 8, 2026
What is NDJSON / JSON Lines?
NDJSON (newline-delimited JSON), also called JSON Lines, stores one JSON object per line. This page explains the format, how it differs from a single JSON array, and proves it by reading a real file with clickhouse local.
Last updated: Jun 8, 2026
What is clickhouse-local? Run SQL on any file or data source, no server
clickhouse-local is a small, fast, standalone build of ClickHouse that runs full ClickHouse SQL over local and remote files (and external databases) from a single binary, with no server to install or start.
Last updated: Jun 8, 2026
What is chDB? In-process ClickHouse for Python
chDB is an in-process SQL OLAP engine for Python: an embedded build of ClickHouse you pip install and call directly, running full ClickHouse SQL over files, DataFrames, and remote sources with no server to start.
Last updated: Jun 8, 2026
What is a TSV file?
A TSV file is a plain-text table whose columns are separated by tab characters, optionally with a header row. This page explains the format, how it differs from CSV, and proves it by reading a real TSV with clickhouse local.
Last updated: Jun 8, 2026
How to run SQL on a JSONL file
Query a .jsonl file (one JSON object per line) with SQL straight from your terminal using clickhouse local. Keys become columns, types are inferred, no server and no import step.
Last updated: Jun 8, 2026
How to query a JSON file with SQL
Query a JSON file with SQL directly from your terminal using clickhouse local. Reads line-delimited JSON or a top-level array, infers nested objects and arrays, with no server and no import step.
Last updated: Jun 8, 2026
How to run SQL on a CSV file
Query a CSV file with SQL directly from your terminal using clickhouse-local — header and column types are auto-detected, with no database and no import step.
Last updated: Jun 8, 2026
How to read a TSV file in Python (faster than pandas)
Read a tab-separated file into a DataFrame with chDB, a drop-in replacement for pandas. Change one import line and your existing pandas code runs on ClickHouse's engine, so it stays fast as files grow.
Last updated: Jun 8, 2026
How to read a semicolon-separated file
Query a semicolon-delimited file (European-style CSV) with SQL from your terminal using clickhouse local — set the field delimiter to ";", with no database and no import step.
Last updated: Jun 8, 2026
How to read a pipe-delimited file
Query a pipe-delimited (|) file with SQL from your terminal using clickhouse local. Set the field delimiter to a pipe, let the header and types be detected, and run.
Last updated: Jun 8, 2026
How to read a Parquet file in Python (faster than pandas)
Read a Parquet file into a DataFrame with chDB, a drop-in replacement for pandas. Change one import line and your existing pandas code runs on ClickHouse's engine, so it stays fast as files grow.
Last updated: Jun 8, 2026
How to read an NDJSON file in Python (faster than pandas)
Read an NDJSON file into a DataFrame with chDB, a drop-in replacement for pandas. Change one import line and your existing pandas code runs on ClickHouse's engine, so it stays fast as files grow.
Last updated: Jun 8, 2026
How to read a JSONL file in Python (faster than pandas)
Read a JSONL file into a DataFrame with chDB, a drop-in replacement for pandas. Change one import line and your existing pandas code runs on ClickHouse's engine, so it stays fast as files grow.
Last updated: Jun 8, 2026
How to read a JSON file in Python (faster than pandas)
Read an NDJSON or JSON file into a DataFrame with chDB, a drop-in replacement for pandas. Change one import line and your existing pandas code runs on ClickHouse's engine, so it stays fast as files grow.
Last updated: Jun 8, 2026
How to read a file with a custom delimiter
Query a file with any field or row separator using clickhouse local and the CustomSeparated format. Set the delimiter, query with SQL, no import step and no preprocessing.
Last updated: Jun 8, 2026
How to read a CSV file in Python (faster than pandas)
Read a CSV file into a DataFrame with chDB, a drop-in replacement for pandas. Change one import line and your existing pandas code runs on ClickHouse's engine, so it stays fast as files grow.
Last updated: Jun 8, 2026
How to query nested JSON with SQL
Query deeply nested JSON with SQL from your terminal using clickhouse local — dot access into objects, ARRAY JOIN to explode arrays, and the JSON type for irregular keys, with no server and no import step.
Last updated: Jun 8, 2026
How to query an NDJSON file with SQL
Run SQL on an NDJSON file from your terminal with one command, using clickhouse local. The schema is inferred from the data, with no server and no import step.
Last updated: Jun 8, 2026
How to query a JSON Lines file
Query a JSON Lines (JSONL) file with SQL straight from your terminal using clickhouse local. One object per line, schema inferred, no server and no import step.
Last updated: Jun 8, 2026
How to query a compressed file (gzip, zstd) from the command line
Query a gzipped CSV or a zstd-compressed Parquet file directly with clickhouse local. The codec is detected from the file and the data is decompressed on the fly, with no unzip step and no flag.
Last updated: Jun 8, 2026
How to read a TSV file
Read and query a tab-separated (TSV) file with SQL from your terminal using clickhouse local. The header and column types are auto-detected, with no database and no import step.
Last updated: Jun 8, 2026
How to query a Parquet file from the command line
Open and query a Parquet file from your terminal with one command, using clickhouse local. No server, no schema declaration, no import step.
Last updated: Jun 8, 2026
How to flatten nested JSON in Python
Flatten JSON with nested objects and arrays-of-objects into a tidy pandas DataFrame using chDB. Change one import line and the pandas code you already write keeps working on large files.
Last updated: Jun 8, 2026
How to convert TSV to Parquet
Convert a tab-separated file to Parquet from your terminal with clickhouse-local — schema inferred from the TSV, types carried into Parquet, zstd compression by default. No upload, no server.
Last updated: Jun 8, 2026
How to convert TSV to CSV
Convert a tab-separated file to comma-separated from your terminal with clickhouse-local — one command, header preserved, types inferred, no upload and no row limit.
Last updated: Jun 8, 2026
How to convert Parquet to TSV
Convert a Parquet file to tab-separated values with one clickhouse-local command. Types are read from the Parquet footer, nothing is uploaded, and files larger than RAM stream through.
Last updated: Jun 8, 2026
How to convert Parquet to NDJSON
Convert a Parquet file to NDJSON from your terminal with clickhouse-local — one command, no server and no upload, with the Parquet schema read straight from the file footer.
Last updated: Jun 8, 2026
How to convert Parquet to JSONL
Convert a Parquet file to newline-delimited JSON (JSONL) with one clickhouse-local command — schema read from the Parquet footer, types carried into JSON, nested columns preserved. No server, no upload.
Last updated: Jun 8, 2026
How to convert Parquet to JSON
Convert a Parquet file to JSON from your terminal with clickhouse-local — the schema is read from the Parquet footer, and typed and nested columns carry straight into JSON. No server, no upload.
Last updated: Jun 8, 2026
How to convert Parquet to CSV
Convert a Parquet file to CSV from your terminal with clickhouse-local — one command, schema read from the file, types written as text, no upload and no server.
Last updated: Jun 8, 2026
How to convert NDJSON to Parquet
Convert an NDJSON (or JSONL) file to Parquet with one clickhouse-local command — the schema is auto-inferred and nested objects and arrays are carried straight into Parquet, with no upload and no server.
Last updated: Jun 8, 2026
How to convert JSONL to Parquet
Convert a JSONL file to Parquet with one command using clickhouse-local — column types are inferred automatically and nested objects become real Parquet structs, with no server and no upload.
Last updated: Jun 8, 2026
How to convert JSON to Parquet
Convert a JSON or NDJSON file to Parquet with one clickhouse-local command — schema is inferred automatically and types are carried across, including nested objects, with no server and no upload.
Last updated: Jun 8, 2026
How to convert CSV to TSV
Convert a CSV file to tab-separated values in one command with clickhouse-local. The header is kept, column types are carried over, and nothing is uploaded.
Last updated: Jun 8, 2026
How to convert CSV to Parquet
Convert a CSV file to Parquet with one command using clickhouse-local — types are inferred from the CSV, you choose the compression codec, and there is no upload and no server.
Last updated: Jun 8, 2026
How to analyze log files with SQL
Query raw nginx, Apache, or application logs in place with clickhouse-local. No server, no pipeline, no import step: one binary, SQL, and the file on disk.
Last updated: Jun 8, 2026
OPTIMIZE TABLE ... FINAL in ClickHouse: when to use it, when to avoid it, and how merges work
Stop running OPTIMIZE TABLE FINAL in ClickHouse. Learn how parts and background merges work, avoid part explosions, and fix duplicates safely. Read now.
Last updated: Jun 15, 2026
The 5 best AWS RDS alternatives in 2026 (pricing, performance, and migration)
Compare the top AWS RDS alternatives for 2026: Postgres managed by ClickHouse, Aurora, Crunchy, Neon and AlloyDB.
Last updated: Jun 10, 2026
6 Amazon Aurora alternatives for high-volume ingestion (2026 guide)
Explore the best Amazon Aurora alternatives for high-volume ingestion in 2026. Compare cost, throughput, and HA tradeoffs, then choose the right stack.
Last updated: Jun 9, 2026
How to choose a database for real-time analytics in 2026
In 2026, the gap between "Data Warehouse" and "Real-Time Serving" has become the primary bottleneck for data teams. For the last decade, we've operated on a split model: transactional databases (OLTP) handled application state, and data warehouses (DWH) handled nightly reporting.
Last updated: Mar 25, 2026
Unifying OLTP and OLAP: HTAP databases, zero-ETL, and best-of-breed architectures
A complete guide to unifying transactional and analytical workloads. Covers HTAP databases, zero-ETL integration, Postgres CDC, columnar databases, and how composable architectures beat converged systems.
Last updated: Mar 29, 2026
What is OLAP?
OLAP is a category of database systems and workloads built for analytical queries across millions to billions of rows. This is what OLAP is, how MOLAP, ROLAP, and HOLAP differ, and why OLAP databases in 2026 run on columnar engines rather than pre-built cubes.
Last updated: May 27, 2026
Real-time analytics platforms: a practical comparison for 2026
A practical 2026 comparison of real-time analytics platforms across ingestion, query latency, freshness, and concurrency — managed and open source.
Last updated: Apr 17, 2026
Columnar storage formats: Parquet, ORC, and Arrow explained
How Apache Parquet, ORC, and Arrow lay out columnar data, and why they coexist instead of converging on one format.
Last updated: May 8, 2026
Row-oriented vs column-oriented databases: a head-to-head comparison
Row stores win point lookups and per-row updates; column stores win wide aggregations and compress 5–10× tighter. A measured comparison with a decision rule.
Last updated: May 8, 2026
What is vectorized query execution?
Vectorized query execution processes batches of values per operator call instead of one row at a time. It is the dominant execution model for analytical databases.
Last updated: May 15, 2026
How columnar storage works
A mechanism-level look at how columnar storage works on disk (per-column blocks, zone maps, and predicate pushdown), with ClickHouse MergeTree and Apache Parquet internals side by side.
Last updated: Jun 1, 2026
Why columnar databases are fast
Columnar databases are 10-1000× faster than row stores on analytical queries because two principles compound: efficient execution (storage layout, vectorisation) and smart pruning (data skipping, late materialisation).
Last updated: May 8, 2026
Best columnar databases in 2026
Nine columnar databases in 2026, ranked using measured benchmark latency and compression rather than marketing copy.
Last updated: May 8, 2026
When should you use a columnar database?
Use a columnar database when queries scan many rows but few columns and the workload is aggregation-heavy. Skip it for point lookups, per-row updates, and 10K+ TPS write workloads.
Last updated: May 8, 2026
How to build real-time customer-facing analytics on Postgres (without slowing down OLTP)
Build real-time customer-facing analytics on Postgres without slowing OLTP. Learn the 2-stage path to sub-100ms dashboards with CDC + OLAP.
Last updated: May 11, 2026
How to architect multi-tenant SaaS on Postgres
Architect multi-tenant SaaS on Postgres with secure isolation, scaling patterns, and compliance-ready design. Get the blueprint and ship safely today.
Last updated: May 25, 2026
What is an OLAP cube?
An OLAP cube is a multidimensional data structure that pre-aggregates measures across hierarchical dimensions. Cubes were the standard analytics implementation from the 1990s through the early 2010s; columnar OLAP engines run the same workloads without the pre-aggregation step.
Last updated: May 27, 2026
What are OLAP operations?
OLAP operations are the canonical analytical actions performed on multidimensional data: drill-down, roll-up, slice, dice, and pivot. Each maps to a specific SQL pattern.
Last updated: May 27, 2026
What is an OLAP database?
OLAP databases are optimised for analytical queries over millions to billions of rows. Here is how they work, the five categories on the market in 2026, and how to choose between them.
Last updated: May 27, 2026
Is Postgres an OLAP database?
Postgres is an OLTP database, not OLAP. Its row-oriented storage and B-tree indexes are built for point lookups and small writes, not the wide aggregations OLAP workloads run. Here is how each Postgres-for-analytics extension actually performs, and when CDC to a columnar engine is the right call.
Last updated: May 27, 2026
The fastest OLAP databases in 2026 (ranked by ClickBench)
The fastest OLAP engines deliver sub-second median query latency on a 100 GB analytical dataset. Here are the OLAP databases ranked by their results on ClickBench, the public benchmark for analytical query latency.
Last updated: May 27, 2026
OLTP vs OLAP
OLTP handles transactional reads and writes in milliseconds; OLAP scans billions of rows for analytics in sub-second time. Architectures, decision rules, and where the line blurs.
Last updated: May 27, 2026
What is a columnar database?
A columnar database stores each column on disk separately to cut I/O and compress aggressively. Here is what that means in practice and when to use one.
Last updated: May 25, 2026
Database compression: encodings, codecs and ratios
Learn how databases compress data, the trade-offs between row and column storage, and why ClickHouse delivers industry-leading compression ratios.
Last updated: May 6, 2026
7 things to consider when choosing a cloud data warehouse
This guide examines seven critical factors that actually matter when evaluating data warehouses for production use. We'll compare how ClickHouse, Snowflake, BigQuery, and Redshift handle each consideration, drawing from documented capabilities, architectural designs, and real-world customer experiences.
Last updated: Nov 18, 2025
Open table formats
In this guide, we'll explore the Iceberg, Delta Lake, and Hudi open table formats.
Last updated: May 9, 2025
Data catalog
In this guide, we'll explore data catalogs for open table formats like Iceberg, Delta Lake, and Hudi, explaining how these metadata systems make modern data lakes more powerful and accessible.
Last updated: May 9, 2025
Apache Iceberg
Apache Iceberg transforms data lakes into robust lakehouse architectures with its high-performance table format. This article explores Iceberg's origins, key features like ACID transactions and schema evolution, and demonstrates how to query Iceberg tables in ClickHouse using both direct and catalog-based approaches.
Last updated: May 21, 2025
Top 5 cloud data warehouses in 2026: Architecture, cost, and open-source
Compare the top cloud data warehouses of 2026: Snowflake, BigQuery, Redshift, Databricks, and ClickHouse. Analyze cost efficiency, real-time performance, and open-source architecture.
Last updated: Apr 13, 2026
What is a data lakehouse?
The data lakehouse combines the best of data warehouses and data lakes into a unified architecture. We'll explore its key components, advantages, and how ClickHouse fits into this modern analytics platform.
Last updated: Apr 17, 2026
What is Real-Time Analytics? A Complete Guide (2026)
Discover what real-time analytics is, its key benefits, and its use cases, such as fraud detection. Learn how it differs from batch and what to look for in a database.
Last updated: Apr 14, 2026
Best real-time analytics database for star schema and fast joins (2026 guide)
Compare real-time analytics databases for star schema and fast cross-table joins. Learn what to test in 2026 and choose the right engine for your workload.
Last updated: Apr 8, 2026
Managed Postgres for AI and Real-Time Apps
Learn how managed Postgres on NVMe with native ClickHouse CDC and pg_clickhouse eliminates the ETL tax and scales from OLTP to real-time analytics and AI apps.
Last updated: Apr 7, 2026
The ultimate guide to Open Source Observability in 2026: From silos to stacks
Explore the top open source observability stacks for 2026. Compare ELK, LGTM, and unified observability solutions like ClickStack for cost, scale, and high-cardinality data.
Last updated: Nov 10, 2025
A practical guide to observability TCO and cost reduction
Tired of exploding observability bills? Learn how to calculate your true TCO and see how to cut architecture costs by 70-90% vs. ingest-based pricing.
Last updated: Nov 26, 2025
How to engineer cost-efficient open source observability with ClickHouse (ClickStack) - 2026 technical playbook
A technical guide for engineers to build a cost-efficient observability stack. Learn how to use ZSTD codecs, Materialized Views, and Tiered Storage with ClickHouse to reduce observability data footprint by 10x.
Last updated: Jan 7, 2026
Mastering Kubernetes observability: a guide to monitoring modern architectures
Learn why traditional Kubernetes monitoring fails at scale and how to master observability with a modern, cost-effective, database-first approach.
Aditya Somani • Last updated: Jan 21, 2026
What is observability in 2026? Why it's an analytics problem and why your database matters.
Observability in 2026 is an analytics problem. Learn why the traditional "three pillars" model fails and how a unified ClickHouse database solves cost and latency issues.
Aditya Somani • Last updated: Jan 7, 2026
8 top New Relic alternatives to break the pricing trap (2026 guide)
Escape New Relic's pricing trap and vendor lock-in. Our 2026 guide compares the top 8 observability alternatives—including Datadog, Grafana, and ClickHouse—for pricing, data ownership, and SQL support.
Aditya Somani • Last updated: Dec 15, 2025
The 8 best Datadog alternatives for observability at scale in 2026
Exploring Datadog alternatives? We compare the top competitors for cost-efficiency, performance at scale, and handling high-cardinality data. Find the right fit.
Aditya Somani • Last updated: Dec 8, 2025
Beyond sampling: managing petabyte-scale logs without losing data
Stop compromising on visibility. Learn how to manage petabyte-scale logs without sampling using ClickHouse’s cost-efficient, columnar architecture.
Aditya Somani • Last updated: Nov 27, 2025
The high-cardinality trap: why your observability platform is failing (and what to do about it)
Tired of slow queries & dropped fields? Learn why traditional observability tools fail with high-cardinality data & how a columnar database like ClickHouse solves it.
Aditya Somani • Last updated: Nov 21, 2025
Top 15 infrastructure monitoring tools in 2026: a performance and cost-based comparison
A complete guide to the best infrastructure monitoring tools. We compare 15 top solutions on features, performance, high-cardinality handling, and cost at scale.
Aditya Somani • Last updated: Nov 17, 2025
Best practices for storing OpenTelemetry Collector data
What are the best practices for storing your OTel Collector data?
Last updated: Nov 10, 2025
The definitive guide to ClickHouse query optimization (2026)
Master ClickHouse query optimization through architectural understanding. Learn why ORDER BY design can improve performance 100×, plus proven techniques for trillion-row millisecond queries.
Al Brown, Tom Schreiber, Lionel Palacin • Last updated: Jan 26, 2026
Top 10 OpenTelemetry Compatible Platforms for 2025
Explore the top 10 OpenTelemetry compatible platforms for 2025. Learn how to choose the right backend based on performance, cost at scale, and analytics.
Aditya Somani • Last updated: Nov 6, 2025
MCP and Data Warehouses: everything you need to know
This article explores the suitability of MCP with Data Warehouses, and discusses the business and technical details you need to know to succeed.
Al Brown • Last updated: Nov 29, 2025
What Is a Time-Series Database? Examples, Use Cases & ClickHouse Guide
A time-series database (TSDB) stores and queries data indexed by time, including metrics, events, and logs. Learn real-world examples, common architectures, and when ClickHouse fits time-series workloads.
Last updated: Nov 18, 2025
Real-time data visualization
This guide is all about real-time data visualization. We'll explore how it differs from normal visualization, see some examples, and learn about the tools we can use.
Last updated: Apr 11, 2025
What is a JSON database?
In this guide, we'll learn about JSON, the types of databases that can store JSON, and how to work with JSON data in ClickHouse.
Last updated: Apr 11, 2025
What is a data application?
In this guide, we'll learn all about data applications: what they are, what their main components are, and why you would want to create one.
Last updated: Apr 11, 2025
Avro vs Parquet
In this guide, we'll learn all about the Apache Avro and Apache Parquet big data formats.
Last updated: Apr 11, 2025
Structured, unstructured, and semi-structured data
In this guide, we explore the three main forms of data: structured data with rigid schemas like database tables, unstructured data like text and images with no predefined format, and semi-structured data like JSON that combines elements of both while maintaining flexibility.
Last updated: Apr 11, 2025
Build a dashboard in Python with ClickHouse and Streamlit
In this guide, you'll learn how to build a Python dashboard using ClickHouse and Streamlit. We'll create a real-world example that visualizes Bluesky social media data, walking through everything from basic setup to interactive visualizations. Perfect for data scientists and analysts who want to share their insights through custom dashboards.
Last updated: Jun 2, 2025
Log monitoring
Discover the fundamentals of log monitoring systems, exploring different log types, monitoring techniques, and modern tools, with practical insights into how organizations leverage solutions like ClickHouse to manage massive log volumes efficiently and cost-effectively.
Last updated: Apr 11, 2025
An intro to OpenTelemetry (OTel)
In this guide, we’ll explore OpenTelemetry (OTel), a framework for collecting and standardizing telemetry data—metrics, logs, and traces—enhancing observability and performance monitoring in modern software systems.
Last updated: Apr 11, 2025
Telemetry data explained
In this guide, we'll explore telemetry data - the vital information that helps us understand, monitor, and improve our software systems through the collection of metrics, logs, and traces.
Last updated: Apr 11, 2025
Observability
In this guide, we'll explore observability - the practice of understanding a system's internal state through its outputs, and how modern approaches are helping organizations gain deeper insights into their systems' behavior.
Last updated: Jun 25, 2025
Security Information and Event Management (SIEM)
In this guide, we'll explore SIEM (Security Information and Event Management) - the central security system that collects, analyzes, and responds to security threats across your organization's entire infrastructure.
Last updated: Apr 11, 2025
Understanding LLM Observability
In this guide, we'll explore how teams monitor and debug their LLM applications, helping them understand everything from response accuracy and token usage to the complex reasoning chains of AI agents.
Last updated: Oct 28, 2025
Application Performance Monitoring (APM)
In this guide, we'll explore Application Performance Monitoring (APM) - the practice of tracking and analyzing application behavior in real-time to ensure optimal performance and user experience.
Last updated: Apr 11, 2025
Network monitoring
In this guide, we'll explore how organizations implement network monitoring to gain visibility, troubleshoot issues, and ensure optimal performance across distributed infrastructures.
Last updated: Apr 11, 2025
Structured logging
In this guide, we'll explore how structured logging transforms traditional text-based logs into queryable data, enabling organizations to build powerful monitoring, analysis, and automation capabilities at scale.
Last updated: Apr 11, 2025
Top 5 Splunk Alternatives in 2025
What are the best alternatives to Splunk?
Last updated: Oct 29, 2025
Instrumenting OpenAI with OpenTelemetry (OTel)
In this guide, we’ll learn how to instrument the OpenAI client with OpenTelemetry (OTel) so that we can generate and collect observability data about our LLM calls.
Last updated: Aug 1, 2025
Setting up Apache Iceberg locally using PySpark
In this guide, we'll learn how to set up Apache Iceberg locally using PySpark.
Mark Needham • Last updated: Aug 5, 2025
Tracing LangChain apps with OpenLLMetry
In this guide, we'll learn about OpenLLMetry - the open-source observability framework for large language models.
Mark Needham • Last updated: Aug 29, 2025