To query a BSON file, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
Then point it at the file with the file() function:
clickhouse local -q "SELECT * FROM file('events.bson') LIMIT 10"
BSON is MongoDB's binary JSON, the format mongoexport writes, and it carries its own field names and types. ClickHouse reads it as BSONEachRow and infers the schema straight from the file, so you query it immediately with no import step.
A BSON document records the name and type of every field, so you never write CREATE TABLE. DESCRIBE prints what ClickHouse read out of the file:
clickhouse local -q "DESCRIBE file('events.bson')"
_id Nullable(Int64)
user Nullable(String)
event Nullable(String)
amount Nullable(Float64)
geo Tuple(country Nullable(String), sessions Nullable(Int32))
Scalars map to the obvious ClickHouse types. The interesting one is geo: it was a nested sub-document in the BSON, and it comes back as a Tuple with its keys preserved as named fields. Columns are Nullable because a document collection can always omit a field.
MongoDB documents nest. That's the whole point of them, and it's where flattening a BSON export to CSV first loses you information. Here ClickHouse kept the structure, so you address a sub-document field by key with dot notation (geo.country), no unpacking step:
clickhouse local -q "
SELECT _id, user, event, geo.country AS country, geo.sessions AS sessions
FROM file('events.bson')
LIMIT 5
FORMAT PrettyCompact"
┌─_id─┬─user──┬─event────┬─country─┬─sessions─┐
1. │ 0 │ alice │ view │ GB │ 1 │
2. │ 1 │ bob │ click │ US │ 2 │
3. │ 2 │ carol │ purchase │ DE │ 3 │
4. │ 3 │ dave │ refund │ FR │ 1 │
5. │ 4 │ alice │ view │ IN │ 2 │
└─────┴───────┴──────────┴─────────┴──────────┘
A nested field reads just like a top-level one. (If a sub-document holds values that are all the same type, BSON serialises it as an array and ClickHouse infers Array(...) instead of a named Tuple; you'd then index it positionally as geo[1]. Mixed-type sub-documents keep their keys.)
Filter, group and aggregate directly on the BSON
The file is a SQL source, so the full ClickHouse dialect works on it (WHERE, GROUP BY, aggregates, joins) with no load step in between, and the nested key works inside the aggregation too:
clickhouse local -q "
SELECT geo.country AS country,
count() AS events,
round(sum(amount), 2) AS amount
FROM file('events.bson')
WHERE event = 'purchase'
GROUP BY country
ORDER BY amount DESC
FORMAT PrettyCompact"
┌─country─┬─events─┬─amount─┐
1. │ FR │ 1 │ 23.18 │
2. │ IN │ 1 │ 19.14 │
3. │ GB │ 1 │ 15.1 │
4. │ US │ 1 │ 11.06 │
5. │ DE │ 1 │ 7.02 │
└─────────┴────────┴────────┘
Convert BSON to a columnar file once
BSON is a row format built for document storage and transport, not for repeated analytical scans: every query walks each document field by field. If you'll query the same export more than once, convert it to Parquet once and read that instead. It's a single command — SELECT from the BSON, INTO OUTFILE as Parquet:
clickhouse local -q "SELECT * FROM file('events.bson') INTO OUTFILE 'events.parquet' TRUNCATE FORMAT Parquet"
The nested geo tuple maps cleanly to a Parquet struct, so the structure survives the round-trip. Read it back with the same file() call. See how to query a Parquet file and the convert BSON to JSON guide for other targets.
The conversion you skipped was the slow part. On a 1,300,000-row, ~140 MB BSON file (events_large.bson), a filter-and-group-by over every document completes in:
clickhouse local --time -q "
SELECT geo.country, count(), round(sum(amount), 2)
FROM file('events_large.bson')
WHERE event = 'purchase'
GROUP BY geo.country ORDER BY 3 DESC
FORMAT Null"
~0.14 seconds of query execution (--time prints it), best of three with the file warm in the OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM; clickhouse local 26.6.1.117). That decodes every BSON document from scratch on each run; there's no cached table. Measured end to end including process startup, run.sh reports wall-clock at about 0.28s. Convert the same data to Parquet and the scan gets faster again, because Parquet is columnar and reads only the fields a query touches.
The query you just ran on a laptop file is the same SQL you would run on a ClickHouse server or in ClickHouse Cloud. Nothing about SELECT ... WHERE ... GROUP BY changes. You swap file('events.bson') for a table name and the rest stays put. Prototype against a mongoexport dump on your machine, ship the identical logic to production.
The complete, runnable example lives in the ClickHouse examples repo. It has generate.sh (builds the demo document and the ~140 MB perf file), run.sh (every command above), and expected_output.txt:
github.com/ClickHouse/examples → local-analytics/clickhouse-local-bson
git clone https://github.com/ClickHouse/examples
cd examples/local-analytics/clickhouse-local-bson
./generate.sh && ./run.sh