ClickHouse engineering resources


  • How to scale vector search in Postgres (pgvector) for RAG and AI agents: memory limits, filtering, and when to go hybrid

    Scale vector search in Postgres with pgvector: avoid HNSW RAM limits, fix filtering recall, and know when to go hybrid. Read now.

    Last updated: Jun 18, 2026

  • How to query a REST API in Python

    Read a JSON API response into a DataFrame with chDB. Use the pandas API you already know to filter and aggregate the response, running on ClickHouse's engine with no server to start.

    Last updated: Jun 15, 2026

  • How to convert Parquet to ORC

    Convert a Parquet file to ORC with one clickhouse-local command. The schema is read from the Parquet footer and the types carry into ORC, with no server and no upload.

    Last updated: Jun 6, 2026

  • How to query a REST API with SQL

    Run SQL straight against a live JSON or CSV HTTP endpoint with clickhouse local — fetch, filter, aggregate and join the response in one statement, with no download step.

    Last updated: Jun 15, 2026

  • Parse url-encoded form data with SQL

    Turn an application/x-www-form-urlencoded body like a=1&b=hello into a queryable row with clickhouse local, with no parsing code and no import step.

    Last updated: Jun 15, 2026

  • What is Protobuf?

    Protobuf (Protocol Buffers) is a schema-first, compact binary serialization format. This page explains its structure, the .proto schema requirement, and proves it by writing and reading a real Protobuf file.

    Last updated: Jun 15, 2026

  • Run SQL across multiple CSV or Parquet files

    Query a whole directory of CSV or Parquet files as one table with clickhouse-local. A glob path reads them all, aggregates run across every file, and a virtual column tells you which file each row came from.

    Last updated: Jun 15, 2026

  • Read Avro from a schema registry with clickhouse-local

    Read a plain .avro file with one command, and use the AvroConfluent format plus format_avro_schema_registry_url to read Confluent-framed Avro messages from a Kafka Schema Registry.

    Last updated: Jun 15, 2026

  • How to query a mysqldump file without importing

    Run SQL against the tables inside a mysqldump .sql file directly with clickhouse-local. No MySQL server, no restore step — the INSERT statements are read as a table.

    Last updated: Jun 15, 2026

  • How to convert Avro to JSON

    Convert an Avro file to JSON with one command using clickhouse-local. The schema is read from the Avro file itself, types are carried into JSON, and there is no upload and no server.

    Last updated: Jun 15, 2026

  • How to read a RowBinary file

    Read a ClickHouse RowBinary file from your terminal with one command, using clickhouse local. RowBinaryWithNamesAndTypes is self-describing; plain RowBinary needs an explicit structure.

    Last updated: Jun 15, 2026

  • How to read a ClickHouse Native format file

    Open and query a ClickHouse Native (.native) file from your terminal with one command, using clickhouse local. The file carries its own schema, so reads need no types and lose no precision.

    Last updated: Jun 15, 2026

  • How to convert TSV to JSON

    Convert a TSV file to JSON from your terminal with clickhouse-local — the header becomes the keys and types are inferred, with no upload and no server.

    Last updated: Jun 15, 2026

  • How to convert NDJSON to CSV

    Convert a line-delimited NDJSON file to CSV from your terminal with clickhouse-local — schema auto-inferred, nested objects flattened, no server and no upload.

    Last updated: Jun 15, 2026

  • How to convert JSONL to JSON

    Turn a line-delimited JSONL file into a single JSON array from your terminal with clickhouse-local. Schema auto-inferred, types preserved, no server and no upload.

    Last updated: Jun 15, 2026

  • How to convert JSON to TSV

    Convert a JSON file to a tab-separated file from your terminal with clickhouse-local — schema auto-inferred, types preserved, nested objects flattened, no upload and no server.

    Last updated: Jun 15, 2026

  • How to convert JSON to JSONL

    Turn a top-level JSON array into JSONL (one object per line) with one clickhouse-local command — no upload, schema auto-inferred, and nested fields preserved.

    Last updated: Jun 15, 2026

  • How to convert CSV to NDJSON

    Convert a CSV file to NDJSON (newline-delimited JSON) in one command with clickhouse-local — schema auto-detected, types carried into the JSON, no upload and no server.

    Last updated: Jun 15, 2026

  • How to convert CSV to Avro

    Convert a CSV file to Avro with one clickhouse-local command. The Avro schema is derived from the column types inferred from the CSV, with no server and no upload.

    Last updated: Jun 15, 2026

  • How to convert Avro to CSV

    Convert an Avro file to CSV with one clickhouse-local command. No upload, no server, and the Avro schema is read straight from the file so types carry over.

    Last updated: Jun 15, 2026

  • What is an Avro file?

    An Avro file is a row-oriented, binary data format that embeds its own JSON schema in the file header, built for schema evolution. This page explains its structure and proves it by reading the embedded schema with clickhouse local.

    Last updated: Jun 15, 2026

  • How to read an Avro file in Python

    Read an Avro file into a DataFrame with chDB and work with it using the pandas API you already know. The embedded schema is read for you, with no struct declarations and no reader loop.

    Last updated: Jun 15, 2026

  • How to read an Avro file

    Read and query an Avro file from your terminal with one command, using clickhouse local. The schema is embedded in the file, so there is no structure to declare and no import step.

    Last updated: Jun 15, 2026

  • How to convert Parquet to Avro

    Convert a Parquet file to Avro from your terminal with clickhouse-local — the schema is read from the Parquet footer and carried into Avro, with no server and no upload.

    Last updated: Jun 15, 2026

  • How to convert ORC to JSON

    Convert an ORC file to JSON from your terminal with one clickhouse-local command. The schema is read from the ORC footer and nested columns become nested JSON, with no upload and no server.

    Last updated: Jun 15, 2026

  • How to convert JSONL to CSV

    Convert a JSONL (NDJSON) file to CSV from your terminal with clickhouse-local — schema auto-inferred, nested objects and arrays flattened into columns, no upload and no server.

    Last updated: Jun 15, 2026

  • How to convert CSV to JSONL

    Convert a CSV file to JSONL (one JSON object per line) in a single command with clickhouse-local — the schema is auto-inferred and the column types are carried into the JSON, with no upload and no server.

    Last updated: Jun 15, 2026

  • How to convert CSV to Arrow

    Convert a CSV file to the Arrow IPC format in one command with clickhouse-local — the schema is inferred from the CSV and the column types are embedded in the Arrow file, with no server and no upload.

    Last updated: Jun 15, 2026

  • How to convert Avro to Parquet

    Convert an Avro file to Parquet with one clickhouse-local command. The schema embedded in the Avro file is read automatically and carried into Parquet, with no server and no upload.

    Last updated: Jun 15, 2026

  • How to convert Arrow to CSV

    Convert an Arrow (Arrow IPC / Feather) file to CSV with one clickhouse-local command. The schema is read from the file, types carry over, and there's no upload and no server.

    Last updated: Jun 15, 2026

  • What is MessagePack?

    MessagePack is a compact binary serialization format that encodes the same data model as JSON in far fewer bytes. This page explains how it is laid out and proves it by reading a real .msgpack file.

    Last updated: Jun 15, 2026

  • What is an .npy file?

    An .npy file is NumPy's simple binary format for storing a single numeric array: a small text header describing dtype and shape, followed by the raw bytes. This page explains its structure and proves it by reading one with clickhouse local.

    Last updated: Jun 15, 2026

  • How to read an NPY file in Python

    Read a NumPy .npy array into a DataFrame with chDB and work with it using the pandas API you already know, without loading the whole array into memory.

    Last updated: Jun 15, 2026

  • How to read a .npy (NumPy) file with SQL

    Query a NumPy .npy array file with SQL from your terminal using clickhouse local. The dtype is read from the file, with no Python and no import step.

    Last updated: Jun 15, 2026

  • How to read a MessagePack file in Python

    Read a MessagePack (.msgpack) file into a DataFrame with chDB. MsgPack carries no schema, so you pass the column list once and then use the pandas API you already know, running on ClickHouse's engine.

    Last updated: Jun 15, 2026

  • How to read a MessagePack file

    Read a MessagePack (.msgpack) file with SQL from your terminal using clickhouse local. MsgPack carries no schema, so you pass the column structure explicitly in one command.

    Last updated: Jun 15, 2026

  • How to convert NPY to Parquet

    Convert a NumPy .npy file to Parquet from your terminal with clickhouse-local — the array dtype is read straight from the .npy header and carried into typed Parquet, with no upload and no Python.

    Last updated: Jun 15, 2026

  • How to convert NPY to CSV

    Convert a NumPy .npy array to CSV from your terminal with clickhouse-local — the dtype is read from the .npy header, with no upload and no Python.

    Last updated: Jun 15, 2026

  • How to convert MessagePack to JSON

    Convert a MessagePack (.msgpack) file to JSON from your terminal with clickhouse-local — one command, no server and no upload, with the column types carried through into the output.

    Last updated: Jun 15, 2026

  • How to convert MessagePack to CSV

    Convert a MessagePack (.msgpack) file to CSV from your terminal with clickhouse-local — one command, no server, no upload, and types carried straight into the CSV.

    Last updated: Jun 15, 2026

  • What is a BSON file?

    A BSON file is MongoDB's binary, typed, length-prefixed encoding of JSON documents. This page explains its structure and proves it by reading a real .bson file with clickhouse local.

    Last updated: Jun 15, 2026

  • How to read a Feather file in Python (faster than pandas)

    Read a Feather file into a DataFrame with chDB, a drop-in replacement for pandas. Change one import line and your existing pandas code runs on ClickHouse's engine, so it stays fast as files grow.

    Last updated: Jun 15, 2026

  • How to read a Feather file from the command line

    Read a .feather file and query it with SQL from your terminal using clickhouse local. Feather is the Arrow IPC format, so you read it with FORMAT Arrow. No server, no schema declaration, no import step.

    Last updated: Jun 15, 2026

  • How to read a BSON file in Python

    Read a .bson dump (as produced by mongodump) into a DataFrame with chDB. Work with it using the pandas API you already know, running on ClickHouse's engine, with no MongoDB server and no document-by-document decode loop.

    Last updated: Jun 15, 2026

  • How to read an Arrow file in Python (faster than pandas)

    Read an Arrow IPC file into a DataFrame with chDB, a drop-in replacement for pandas. Change one import line and your existing pandas code runs on ClickHouse's engine, so it stays fast as files grow.

    Last updated: Jun 15, 2026

  • How to query a BSON file

    Query a BSON file (MongoDB's binary JSON, the format mongodump produces) directly from your terminal with clickhouse local. The schema, including nested sub-documents, is auto-detected. No server, no import step.

    Last updated: Jun 15, 2026

  • How to convert Parquet to Arrow

    Convert a Parquet file to the Arrow IPC (Feather) format in one command with clickhouse-local — columnar in, columnar out, types preserved, no upload and no server.

    Last updated: Jun 15, 2026

  • How to convert BSON to JSON

    Convert a BSON file to JSON from your terminal with clickhouse-local — the document schema is auto-inferred and nested fields are preserved, with no upload and no server.

    Last updated: Jun 15, 2026

  • Convert BSON to CSV

    Convert a BSON file to CSV from your terminal with clickhouse-local — one command, no upload and no server, with nested sub-documents flattened into real columns.

    Last updated: Jun 15, 2026

  • How to convert Arrow to Parquet

    Convert an Arrow IPC file to Parquet with one clickhouse-local command — types carry across exactly, nothing is uploaded, and files larger than RAM stream straight through.

    Last updated: Jun 15, 2026

  • How to read an ORC file in Python (faster than pandas)

    Read an ORC file into a DataFrame with chDB, a drop-in replacement for pandas. Change one import line and your existing pandas code runs on ClickHouse's engine, so it stays fast as files grow.

    Last updated: Jun 15, 2026

  • How to read an ORC file from the command line

    Open and query an ORC file from your terminal with one command using clickhouse-local. The schema is read from the file's footer, so there's no server and no CREATE TABLE.

    Last updated: Jun 6, 2026

  • How to parse a log file with regex in SQL

    Turn unstructured log lines into typed columns and query them with SQL, using a regex and clickhouse local. One pattern maps the line layout, then ordinary SQL answers the questions.

    Last updated: Jun 6, 2026

  • ORC file format

    ORC (Optimized Row Columnar) is an open, columnar, compressed storage format for tabular data. This page explains its stripes, indexes and footer, and proves the structure by reading a real ORC file.

    Last updated: Jun 6, 2026

  • How to convert ORC to Parquet

    Convert an ORC file to Parquet with a single command using clickhouse-local — the schema is read from the ORC footer and carried into Parquet, with no server and no upload.

    Last updated: Jun 6, 2026

  • How to convert ORC to CSV

    Convert an ORC file to CSV with one clickhouse-local command. The schema is read from the ORC footer, there is no server and no upload, and files larger than RAM stream straight through.

    Last updated: Jun 6, 2026

  • How to convert JSON to CSV

    Convert JSON to CSV from your terminal with clickhouse-local: one command, schema auto-inferred, and nested fields flattened into CSV columns.

    Last updated: Jun 6, 2026

  • How to convert CSV to ORC

    Convert a CSV file to columnar ORC with one clickhouse-local command. The schema is inferred from the CSV and the types are carried into the ORC file, with no upload and no server.

    Last updated: Jun 8, 2026

  • How to convert CSV to JSON

    Convert a CSV file to JSON from your terminal with clickhouse-local — one object per line or a single JSON array, schema auto-inferred and types preserved, with no upload and no server.

    Last updated: Jun 8, 2026

  • What is a Parquet file?

    A Parquet file is an open, column-oriented, compressed storage format for tabular data, built for analytics. This page explains its internal structure and proves it by reading a real file's metadata.

    Last updated: Jun 8, 2026

  • What is NDJSON / JSON Lines?

    NDJSON (newline-delimited JSON), also called JSON Lines, stores one JSON object per line. This page explains the format, how it differs from a single JSON array, and proves it by reading a real file with clickhouse local.

    Last updated: Jun 8, 2026

  • What is clickhouse-local? Run SQL on any file or data source, no server

    clickhouse-local is a small, fast, standalone build of ClickHouse that runs full ClickHouse SQL over local and remote files (and external databases) from a single binary, with no server to install or start.

    Last updated: Jun 8, 2026

  • What is chDB? In-process ClickHouse for Python

    chDB is an in-process SQL OLAP engine for Python: an embedded build of ClickHouse you pip install and call directly, running full ClickHouse SQL over files, DataFrames, and remote sources with no server to start.

    Last updated: Jun 8, 2026

  • What is a TSV file?

    A TSV file is a plain-text table whose columns are separated by tab characters, optionally with a header row. This page explains the format, how it differs from CSV, and proves it by reading a real TSV with clickhouse local.

    Last updated: Jun 8, 2026

  • How to run SQL on a JSONL file

    Query a .jsonl file (one JSON object per line) with SQL straight from your terminal using clickhouse local. Keys become columns, types are inferred, no server and no import step.

    Last updated: Jun 8, 2026

  • How to query a JSON file with SQL

    Query a JSON file with SQL directly from your terminal using clickhouse local. Reads line-delimited JSON or a top-level array, infers nested objects and arrays, with no server and no import step.

    Last updated: Jun 8, 2026

  • How to run SQL on a CSV file

    Query a CSV file with SQL directly from your terminal using clickhouse-local — header and column types are auto-detected, with no database and no import step.

    Last updated: Jun 8, 2026

  • How to read a TSV file in Python (faster than pandas)

    Read a tab-separated file into a DataFrame with chDB, a drop-in replacement for pandas. Change one import line and your existing pandas code runs on ClickHouse's engine, so it stays fast as files grow.

    Last updated: Jun 8, 2026

  • How to read a semicolon-separated file

    Query a semicolon-delimited file (European-style CSV) with SQL from your terminal using clickhouse local — set the field delimiter to ";", with no database and no import step.

    Last updated: Jun 8, 2026

  • How to read a pipe-delimited file

    Query a pipe-delimited (|) file with SQL from your terminal using clickhouse local. Set the field delimiter to a pipe, let the header and types be detected, and run.

    Last updated: Jun 8, 2026

  • How to read a Parquet file in Python (faster than pandas)

    Read a Parquet file into a DataFrame with chDB, a drop-in replacement for pandas. Change one import line and your existing pandas code runs on ClickHouse's engine, so it stays fast as files grow.

    Last updated: Jun 8, 2026

  • How to read an NDJSON file in Python (faster than pandas)

    Read an NDJSON file into a DataFrame with chDB, a drop-in replacement for pandas. Change one import line and your existing pandas code runs on ClickHouse's engine, so it stays fast as files grow.

    Last updated: Jun 8, 2026

  • How to read a JSONL file in Python (faster than pandas)

    Read a JSONL file into a DataFrame with chDB, a drop-in replacement for pandas. Change one import line and your existing pandas code runs on ClickHouse's engine, so it stays fast as files grow.

    Last updated: Jun 8, 2026

  • How to read a JSON file in Python (faster than pandas)

    Read an NDJSON or JSON file into a DataFrame with chDB, a drop-in replacement for pandas. Change one import line and your existing pandas code runs on ClickHouse's engine, so it stays fast as files grow.

    Last updated: Jun 8, 2026

  • How to read a file with a custom delimiter

    Query a file with any field or row separator using clickhouse local and the CustomSeparated format. Set the delimiter, query with SQL, no import step and no preprocessing.

    Last updated: Jun 8, 2026

  • How to read a CSV file in Python (faster than pandas)

    Read a CSV file into a DataFrame with chDB, a drop-in replacement for pandas. Change one import line and your existing pandas code runs on ClickHouse's engine, so it stays fast as files grow.

    Last updated: Jun 8, 2026

  • How to query nested JSON with SQL

    Query deeply nested JSON with SQL from your terminal using clickhouse local — dot access into objects, ARRAY JOIN to explode arrays, and the JSON type for irregular keys, with no server and no import step.

    Last updated: Jun 8, 2026

  • How to query an NDJSON file with SQL

    Run SQL on an NDJSON file from your terminal with one command, using clickhouse local. The schema is inferred from the data, with no server and no import step.

    Last updated: Jun 8, 2026

  • How to query a JSON Lines file

    Query a JSON Lines (JSONL) file with SQL straight from your terminal using clickhouse local. One object per line, schema inferred, no server and no import step.

    Last updated: Jun 8, 2026

  • How to query a compressed file (gzip, zstd) from the command line

    Query a gzipped CSV or a zstd-compressed Parquet file directly with clickhouse local. The codec is detected from the file and the data is decompressed on the fly, with no unzip step and no flag.

    Last updated: Jun 8, 2026

  • How to read a TSV file

    Read and query a tab-separated (TSV) file with SQL from your terminal using clickhouse local. The header and column types are auto-detected, with no database and no import step.

    Last updated: Jun 8, 2026

  • How to query a Parquet file from the command line

    Open and query a Parquet file from your terminal with one command, using clickhouse local. No server, no schema declaration, no import step.

    Last updated: Jun 8, 2026

  • How to flatten nested JSON in Python

    Flatten JSON with nested objects and arrays-of-objects into a tidy pandas DataFrame using chDB. Change one import line and the pandas code you already write keeps working on large files.

    Last updated: Jun 8, 2026

  • How to convert TSV to Parquet

    Convert a tab-separated file to Parquet from your terminal with clickhouse-local — schema inferred from the TSV, types carried into Parquet, zstd compression by default. No upload, no server.

    Last updated: Jun 8, 2026

  • How to convert TSV to CSV

    Convert a tab-separated file to comma-separated from your terminal with clickhouse-local — one command, header preserved, types inferred, no upload and no row limit.

    Last updated: Jun 8, 2026

  • How to convert Parquet to TSV

    Convert a Parquet file to tab-separated values with one clickhouse-local command. Types are read from the Parquet footer, nothing is uploaded, and files larger than RAM stream through.

    Last updated: Jun 8, 2026

  • How to convert Parquet to NDJSON

    Convert a Parquet file to NDJSON from your terminal with clickhouse-local — one command, no server and no upload, with the Parquet schema read straight from the file footer.

    Last updated: Jun 8, 2026

  • How to convert Parquet to JSONL

    Convert a Parquet file to newline-delimited JSON (JSONL) with one clickhouse-local command — schema read from the Parquet footer, types carried into JSON, nested columns preserved. No server, no upload.

    Last updated: Jun 8, 2026

  • How to convert Parquet to JSON

    Convert a Parquet file to JSON from your terminal with clickhouse-local — the schema is read from the Parquet footer, and typed and nested columns carry straight into JSON. No server, no upload.

    Last updated: Jun 8, 2026

  • How to convert Parquet to CSV

    Convert a Parquet file to CSV from your terminal with clickhouse-local — one command, schema read from the file, types written as text, no upload and no server.

    Last updated: Jun 8, 2026

  • How to convert NDJSON to Parquet

    Convert an NDJSON (or JSONL) file to Parquet with one clickhouse-local command — the schema is auto-inferred and nested objects and arrays are carried straight into Parquet, with no upload and no server.

    Last updated: Jun 8, 2026

  • How to convert JSONL to Parquet

    Convert a JSONL file to Parquet with one command using clickhouse-local — column types are inferred automatically and nested objects become real Parquet structs, with no server and no upload.

    Last updated: Jun 8, 2026

  • How to convert JSON to Parquet

    Convert a JSON or NDJSON file to Parquet with one clickhouse-local command. The schema is inferred automatically and types are carried across, including nested objects, with no server and no upload.

    Last updated: Jun 8, 2026

  • How to convert CSV to TSV

    Convert a CSV file to tab-separated values in one command with clickhouse-local. The header is kept, column types are carried over, and nothing is uploaded.

    Last updated: Jun 8, 2026

  • How to convert CSV to Parquet

    Convert a CSV file to Parquet with one command using clickhouse-local — types are inferred from the CSV, you choose the compression codec, and there is no upload and no server.

    Last updated: Jun 8, 2026

  • How to analyze log files with SQL

    Query raw nginx, Apache, or application logs in place with clickhouse-local. No server, no pipeline, no import step: one binary, SQL, and the file on disk.

    Last updated: Jun 8, 2026

  • OPTIMIZE TABLE ... FINAL in ClickHouse: when to use it, when to avoid it, and how merges work

    Stop running OPTIMIZE TABLE FINAL in ClickHouse. Learn how parts and background merges work, avoid part explosions, and fix duplicates safely. Read now.

    Last updated: Jun 15, 2026

  • The 5 best AWS RDS alternatives in 2026 (pricing, performance, and migration)

    Compare the top AWS RDS alternatives for 2026: Postgres managed by ClickHouse, Aurora, Crunchy, Neon and AlloyDB.

    Last updated: Jun 10, 2026

  • 6 Amazon Aurora alternatives for high-volume ingestion (2026 guide)

    Explore the best AWS Aurora alternatives for high-volume ingestion in 2026. Compare cost, throughput, and HA tradeoffs, and choose the right stack now.

    Last updated: Jun 9, 2026

  • How to choose a database for real-time analytics in 2026

    In 2026, the gap between "Data Warehouse" and "Real-Time Serving" has become the primary bottleneck for data teams. For the last decade, we've operated on a split model: transactional databases (OLTP) handled application state, and data warehouses (DWH) handled nightly reporting.

    Last updated: Mar 25, 2026

  • Unifying OLTP and OLAP: HTAP databases, zero-ETL, and best-of-breed architectures

    A complete guide to unifying transactional and analytical workloads. Covers HTAP databases, zero-ETL integration, Postgres CDC, columnar databases, and how composable architectures beat converged systems.

    Last updated: Mar 29, 2026

  • What is OLAP?

    OLAP is a category of database systems and workloads built for analytical queries across millions to billions of rows. This page explains what OLAP is, how MOLAP, ROLAP, and HOLAP differ, and why OLAP databases in 2026 run on columnar engines rather than pre-built cubes.

    Last updated: May 27, 2026

  • Real-time analytics platforms: a practical comparison for 2026

    A practical 2026 comparison of real-time analytics platforms across ingestion, query latency, freshness, and concurrency — managed and open source.

    Last updated: Apr 17, 2026

  • Columnar storage formats: Parquet, ORC, and Arrow explained

    How Apache Parquet, ORC, and Arrow lay out columnar data, and why they coexist instead of converging on one format.

    Last updated: May 8, 2026

  • Row-oriented vs column-oriented databases: a head-to-head comparison

    Row stores win point lookups and per-row updates; column stores win wide aggregations and compress 5–10× tighter. A measured comparison with a decision rule.

    Last updated: May 8, 2026

  • What is vectorized query execution?

    Vectorized query execution processes batches of values per operator call instead of one row at a time. It is the dominant execution model for analytical databases.

    Last updated: May 15, 2026

  • How columnar storage works

    A mechanism-level look at how columnar storage works on disk (per-column blocks, zone maps, and predicate pushdown), with ClickHouse MergeTree and Apache Parquet internals side by side.

    Last updated: Jun 1, 2026

  • Why columnar databases are fast

    Columnar databases are 10–1000× faster than row stores on analytical queries because two principles compound: efficient execution (storage layout, vectorization) and smart pruning (data skipping, late materialization).

    Last updated: May 8, 2026

  • Best columnar databases in 2026

    Nine columnar databases in 2026, ranked using measured benchmark latency and compression rather than marketing copy.

    Last updated: May 8, 2026

  • When should you use a columnar database?

    Use a columnar database when queries scan many rows but few columns and the workload is aggregation-heavy. Skip it for point lookups, per-row updates, and 10K+ TPS write workloads.

    Last updated: May 8, 2026

  • How to build real-time customer-facing analytics on Postgres (without slowing down OLTP)

    Build real-time customer-facing analytics on Postgres without slowing OLTP. Learn the 2-stage path to sub-100ms dashboards with CDC + OLAP.

    Last updated: May 11, 2026

  • How to architect multi-tenant SaaS on Postgres

    Architect multi-tenant SaaS on Postgres with secure isolation, scaling patterns, and compliance-ready design. Get the blueprint and ship safely today.

    Last updated: May 25, 2026

  • What is an OLAP cube?

    An OLAP cube is a multidimensional data structure that pre-aggregates measures across hierarchical dimensions. Cubes were the standard analytics implementation from the 1990s through the early 2010s; columnar OLAP engines run the same workloads without the pre-aggregation step.

    Last updated: May 27, 2026

  • What are OLAP operations?

    OLAP operations are the canonical analytical actions performed on multidimensional data: drill-down, roll-up, slice, dice, and pivot. Each maps to a specific SQL pattern.

    Last updated: May 27, 2026

  • What is an OLAP database?

    OLAP databases are optimized for analytical queries over millions to billions of rows. Here is how they work, the five categories on the market in 2026, and how to choose between them.

    Last updated: May 27, 2026

  • Is Postgres an OLAP database?

    Postgres is an OLTP database, not OLAP. Its row-oriented storage and B-tree indexes are built for point lookups and small writes, not the wide aggregations OLAP workloads run. Here is how each Postgres-for-analytics extension actually performs, and when CDC to a columnar engine is the right call.

    Last updated: May 27, 2026

  • The fastest OLAP databases in 2026 (ranked by ClickBench)

    The fastest OLAP engines deliver sub-second median query latency on a 100 GB analytical dataset. Here are the OLAP databases ranked by their results on ClickBench, the public benchmark for analytical query latency.

    Last updated: May 27, 2026

  • OLTP vs OLAP

    OLTP handles transactional reads and writes in milliseconds; OLAP scans billions of rows for analytics in sub-second time. Architectures, decision rules, and where the line blurs.

    Last updated: May 27, 2026

  • What is a columnar database?

    A columnar database stores each column on disk separately to cut I/O and compress aggressively. Here is what that means in practice and when to use one.

    Last updated: May 25, 2026

  • Database compression: encodings, codecs and ratios

    Learn how databases compress data, the trade-offs between row and column storage, and why ClickHouse delivers industry-leading compression ratios.

    Last updated: May 6, 2026

  • 7 things to consider when choosing a cloud data warehouse

    This guide examines seven critical factors that actually matter when evaluating data warehouses for production use. We'll compare how ClickHouse, Snowflake, BigQuery, and Redshift handle each consideration, drawing from documented capabilities, architectural designs, and real-world customer experiences.

    Last updated: Nov 18, 2025

  • Open table formats

    In this guide, we'll explore the Iceberg, Delta Lake, and Hudi open table formats.

    Last updated: May 9, 2025

  • Data catalog

    In this guide, we'll explore data catalogs for open table formats like Iceberg, Delta Lake, and Hudi, explaining how these metadata systems make modern data lakes more powerful and accessible.

    Last updated: May 9, 2025

  • Apache Iceberg

    Apache Iceberg transforms data lakes into robust lakehouse architectures with its high-performance table format. This article explores Iceberg's origins, key features like ACID transactions and schema evolution, and demonstrates how to query Iceberg tables in ClickHouse using both direct and catalog-based approaches.

    Last updated: May 21, 2025

  • Top 5 cloud data warehouses in 2026: Architecture, cost, and open-source

    Compare the top cloud data warehouses of 2026: Snowflake, BigQuery, Redshift, Databricks, and ClickHouse. Analyze cost efficiency, real-time performance, and open-source architecture.

    Last updated: Apr 13, 2026

  • What is a data lakehouse?

    The data lakehouse combines the best of data warehouses and data lakes into a unified architecture. We'll explore its key components, advantages, and how ClickHouse fits into this modern analytics platform.

    Last updated: Apr 17, 2026

  • What is Real-Time Analytics? A Complete Guide (2026)

    Discover what real-time analytics is, its key benefits, and its use cases, such as fraud detection. Learn how it differs from batch analytics and what to look for in a database.

    Last updated: Apr 14, 2026

  • Best real-time analytics database for star schema and fast joins (2026 guide)

    Compare real-time analytics databases for star schema and fast cross-table joins. Learn what to test in 2026 and choose the right engine for your workload.

    Last updated: Apr 8, 2026

  • Managed Postgres for AI and Real-Time Apps

    Learn how managed Postgres on NVMe, with native ClickHouse CDC and pg_clickhouse, eliminates the ETL tax and scales from OLTP to real-time analytics and AI apps.

    Last updated: Apr 7, 2026

  • The ultimate guide to Open Source Observability in 2026: From silos to stacks

    Explore the top open source observability stacks for 2026. Compare ELK, LGTM, and unified observability solutions like ClickStack for cost, scale, and high-cardinality data.

    Last updated: Nov 10, 2025

  • A practical guide to observability TCO and cost reduction

    Tired of exploding observability bills? Learn how to calculate your true TCO and see how to cut architecture costs by 70-90% vs. ingest-based pricing.

    Last updated: Nov 26, 2025

  • How to engineer cost-efficient open source observability with ClickHouse (ClickStack) - 2026 technical playbook

    A technical guide for engineers to build a cost-efficient observability stack. Learn how to use ZSTD codecs, Materialized Views, and Tiered Storage with ClickHouse to reduce observability data footprint by 10x.

    Last updated: Jan 7, 2026

  • Mastering Kubernetes observability: a guide to monitoring modern architectures

    Learn why traditional Kubernetes monitoring fails at scale and how to master observability with a modern cost-effective database-first approach.

    Aditya Somani • Last updated: Jan 21, 2026

  • What is observability in 2026? Why it's an analytics problem and why your database matters

    Observability in 2026 is an analytics problem. Learn why the traditional "three pillars" model fails and how a unified ClickHouse database solves cost and latency issues.

    Aditya Somani • Last updated: Jan 7, 2026

  • 8 top New Relic alternatives to break the pricing trap (2026 guide)

    Escape New Relic's pricing trap and vendor lock-in. Our 2026 guide compares the top 8 observability alternatives—including Datadog, Grafana, and ClickHouse—for pricing, data ownership, and SQL support.

    Aditya Somani • Last updated: Dec 15, 2025

  • The 8 best Datadog alternatives for observability at scale in 2026

    Exploring Datadog alternatives? We compare the top competitors for cost-efficiency, performance at scale, and handling high-cardinality data. Find the right fit.

    Aditya Somani • Last updated: Dec 8, 2025

  • Beyond sampling: managing petabyte-scale logs without losing data

    Stop compromising on visibility. Learn how to manage petabyte-scale logs without sampling using ClickHouse’s cost-efficient, columnar architecture.

    Aditya Somani • Last updated: Nov 27, 2025

  • The high-cardinality trap: why your observability platform is failing (and what to do about it)

    Tired of slow queries and dropped fields? Learn why traditional observability tools fail with high-cardinality data and how a columnar database like ClickHouse solves it.

    Aditya Somani • Last updated: Nov 21, 2025

  • Top 15 infrastructure monitoring tools in 2026: a performance and cost-based comparison

    A complete guide to the best infrastructure monitoring tools. We compare 15 top solutions on features, performance, high-cardinality handling, and cost at scale.

    Aditya Somani • Last updated: Nov 17, 2025

  • Best practices for storing OpenTelemetry Collector data

    What are the best practices for storing your OTel Collector data?

    Last updated: Nov 10, 2025

  • The definitive guide to ClickHouse query optimization (2026)

    Master ClickHouse query optimization through architectural understanding. Learn why ORDER BY design can improve performance 100×, plus proven techniques for trillion-row millisecond queries.

    Al Brown, Tom Schreiber, Lionel Palacin • Last updated: Jan 26, 2026

  • Top 10 OpenTelemetry Compatible Platforms for 2025

    Explore the top 10 OpenTelemetry compatible platforms for 2025. Learn how to choose the right backend based on performance, cost at scale, and analytics.

    Aditya Somani • Last updated: Nov 6, 2025

  • MCP and Data Warehouses: everything you need to know

    This article explores how well MCP (Model Context Protocol) suits data warehouses and covers the business and technical details you need to know to succeed.

    Al Brown • Last updated: Nov 29, 2025

  • What Is a Time-Series Database? Examples, Use Cases & ClickHouse Guide

    A time-series database (TSDB) stores and queries data indexed by time, including metrics, events, and logs. Learn real-world examples, common architectures, and when ClickHouse fits time-series workloads.

    Last updated: Nov 18, 2025

  • Real-time data visualization

    This guide is all about real-time data visualization. We'll explore how it differs from normal visualization, see some examples, and learn about the tools we can use.

    Last updated: Apr 11, 2025

  • What is a JSON database?

    In this guide, we'll learn about JSON, the types of databases that can store JSON, and how to work with JSON data in ClickHouse.

    Last updated: Apr 11, 2025

  • What is a data application?

    In this guide, we'll learn all about data applications: what they are, what their main components are, and why you would want to build one.

    Last updated: Apr 11, 2025

  • Avro vs Parquet

    In this guide, we'll learn all about the Apache Avro and Apache Parquet big data formats.

    Last updated: Apr 11, 2025

  • Structured, unstructured, and semi-structured data

    In this guide, we explore the three main forms of data: structured data with rigid schemas like database tables, unstructured data like text and images with no predefined format, and semi-structured data like JSON that combines elements of both while maintaining flexibility.

    Last updated: Apr 11, 2025

  • Build a dashboard in Python with ClickHouse and Streamlit

    In this guide, you'll learn how to build a Python dashboard using ClickHouse and Streamlit. We'll create a real-world example that visualizes Bluesky social media data, walking through everything from basic setup to interactive visualizations. Perfect for data scientists and analysts who want to share their insights through custom dashboards.

    Last updated: Jun 2, 2025

  • Log monitoring

    Discover the fundamentals of log monitoring systems, exploring different log types, monitoring techniques, and modern tools, with practical insights into how organizations leverage solutions like ClickHouse to manage massive log volumes efficiently and cost-effectively.

    Last updated: Apr 11, 2025

  • An intro to OpenTelemetry (OTel)

    In this guide, we’ll explore OpenTelemetry (OTel), a framework for collecting and standardizing telemetry data—metrics, logs, and traces—enhancing observability and performance monitoring in modern software systems.

    Last updated: Apr 11, 2025

  • Telemetry data explained

    In this guide, we'll explore telemetry data - the vital information that helps us understand, monitor, and improve our software systems through the collection of metrics, logs, and traces.

    Last updated: Apr 11, 2025

  • Observability

    In this guide, we'll explore observability - the practice of understanding a system's internal state through its outputs, and how modern approaches are helping organizations gain deeper insights into their systems' behavior.

    Last updated: Jun 25, 2025

  • Security Information and Event Management (SIEM)

    In this guide, we'll explore SIEM (Security Information and Event Management) - the central security system that collects, analyzes, and responds to security threats across your organization's entire infrastructure.

    Last updated: Apr 11, 2025

  • Understanding LLM Observability

    In this guide, we'll explore how teams monitor and debug their LLM applications, helping them understand everything from response accuracy and token usage to the complex reasoning chains of AI agents.

    Last updated: Oct 28, 2025

  • Application Performance Monitoring (APM)

    In this guide, we'll explore Application Performance Monitoring (APM): the practice of tracking and analyzing application behavior in real time to ensure optimal performance and user experience.

    Last updated: Apr 11, 2025

  • Network monitoring

    In this guide, we'll explore how organizations implement network monitoring to gain visibility, troubleshoot issues, and ensure optimal performance across distributed infrastructures.

    Last updated: Apr 11, 2025

  • Structured logging

    In this guide, we'll explore how structured logging transforms traditional text-based logs into queryable data, enabling organizations to build powerful monitoring, analysis, and automation capabilities at scale.

    Last updated: Apr 11, 2025

  • Top 5 Splunk Alternatives in 2025

    What are the best alternatives to Splunk?

    Last updated: Oct 29, 2025

  • Instrumenting OpenAI with OpenTelemetry (OTel)

    In this guide, we’ll learn how to instrument the OpenAI client with OpenTelemetry (OTel) so that we can generate and collect observability data about our LLM calls.

    Last updated: Aug 1, 2025

  • Setting up Apache Iceberg locally using PySpark

    In this guide, we'll learn how to set up Apache Iceberg locally using PySpark.

    Mark Needham • Last updated: Aug 5, 2025

  • Tracing LangChain apps with OpenLLMetry

    In this guide, we'll learn about OpenLLMetry, an open-source observability framework for large language models.

    Mark Needham • Last updated: Aug 29, 2025