To convert BSON to JSON, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
Then point it at the file and write back out as JSON:
clickhouse local -q "SELECT * FROM file('users.bson') INTO OUTFILE 'users.jsonl' FORMAT JSONEachRow"
{"_id":1,"name":"Ada","city":"NYC","address":{"street":"1 Market St","zip":"00000"},"tags":["mongo"],"active":false}
{"_id":2,"name":"Lin","city":"SF","address":{"street":"8 Market St","zip":"00137"},"tags":["mongo","etl"],"active":true}
{"_id":3,"name":"Tor","city":"LDN","address":{"street":"15 Market St","zip":"00274"},"tags":["mongo","etl","beta"],"active":false}
That .bson file is the shape MongoDB writes: mongodump produces one *.bson per collection, and mongoexport --type=bson does the same. ClickHouse reads it with the BSONEachRow format, where each row is one BSON document, and the file is read in place with no import step first.
You don't supply a schema. BSONEachRow reads the type tags inside each BSON document, so a 64-bit int stays a number, a boolean stays a boolean, an embedded document becomes a nested JSON object, and a BSON array becomes a JSON array. Check what was detected with DESCRIBE:
clickhouse local -q "DESCRIBE file('users.bson')"
_id Nullable(Int64)
name Nullable(String)
city Nullable(String)
address Map(String, Nullable(String))
tags Array(Nullable(String))
active Nullable(Bool)
The embedded address document came through as a Map, the tags array as an Array, and active as a real Bool. JSONEachRow writes them back as a nested object, a JSON array, and true/false — not as strings. This is the part upload-based converter sites get wrong: numbers and booleans arrive quoted, and nested documents are flattened or dropped. Here they survive the round trip.
If you want a single JSON array with a metadata header instead of line-delimited objects, use FORMAT JSON:
clickhouse local -q "SELECT * FROM file('users.bson') INTO OUTFILE 'users.json' FORMAT JSON"
{
"meta":
[
{ "name": "_id", "type": "Nullable(Int64)" },
...
],
"data":
[
{
"_id": 1,
"name": "Ada",
"city": "NYC",
"address": { "street": "1 Market St", "zip": "00000" },
"tags": ["mongo"],
"active": false
},
...
]
}
JSONEachRow (NDJSON) is the right choice for piping into another tool or loading row by row; JSON is the right choice when something downstream wants one array. Both come from the same SELECT, only the FORMAT changes.
Because the BSON file is just a table, you can filter and flatten in the same pass rather than converting everything and cleaning up afterwards. Here we keep only active users and lift the nested zip out of address:
clickhouse local -q "
SELECT _id, name, address['zip'] AS zip, tags
FROM file('users.bson')
WHERE active
INTO OUTFILE 'active.jsonl' FORMAT JSONEachRow"
{"_id":2,"name":"Lin","zip":"00137","tags":["mongo","etl"]}
{"_id":4,"name":"Mae","zip":"00411","tags":["mongo"]}
{"_id":6,"name":"Yuki","zip":"00685","tags":["mongo","etl","beta"]}
The full SQL surface works here — WHERE, GROUP BY, JOIN, window functions — so a conversion and a transformation are one command.
- Output target. Drop
INTO OUTFILE to stream the JSON to stdout (pipe it into jq, head, or another process). Add it to write a file; TRUNCATE overwrites an existing one.
- Compress on the way out. A
.gz, .zst, or .xz suffix on the output path is compressed automatically: INTO OUTFILE 'users.jsonl.zst'. No flag needed, and reading it back is just as transparent.
- NDJSON vs array.
JSONEachRow → .jsonl/.ndjson (one object per line, streamable). JSON → a single array with a typed meta header.
- Querying instead of converting. If you only need to read the BSON, skip the output file entirely. See how to query a BSON file.
On a 2,000,000-document, ~162 MB events.bson, converting the whole file to NDJSON:
clickhouse local -q "SELECT * FROM file('events.bson') INTO OUTFILE 'events.jsonl' FORMAT JSONEachRow"
~0.29 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). That is parse-BSON-and-write-JSON end to end, no cached table. The result is 2,000,000 lines of NDJSON. A browser upload converter would still be uploading the file.
clickhouse-local reads the .bson directly.
If you live in Python, chDB is the same ClickHouse engine in-process. The SQL is identical — SELECT ... INTO OUTFILE ... FORMAT JSONEachRow — you just call it through chdb.query:
import chdb
chdb.query(
"SELECT * FROM file('users.bson') "
"INTO OUTFILE 'users.jsonl' TRUNCATE FORMAT JSONEachRow"
)
print(open("users.jsonl").read())
{"_id":1,"name":"Ada","city":"NYC","address":{"street":"1 Market St","zip":"00000"},"tags":["mongo"],"active":false}
{"_id":2,"name":"Lin","city":"SF","address":{"street":"8 Market St","zip":"00137"},"tags":["mongo","etl"],"active":true}
...
Same schema inference, same nested-document handling, no server. For reading BSON straight into a DataFrame, see how to read a BSON file in Python.
The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample BSON (including the ~162 MB file used for the timing above), run.sh with every command on this page, run.py/run.ipynb for the chDB version, and expected_output.txt.
github.com/ClickHouse/examples/tree/main/local-analytics/convert-bson-to-json
The same SQL scales from a file on your laptop to a ClickHouse server to ClickHouse Cloud — when a one-off mongodump grows into a recurring pipeline, nothing about the query changes.