How to convert BSON to JSON

To convert BSON to JSON, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

1curl https://clickhouse.com/cli | sh   # install clickhousectl
2clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then point it at the file and write back out as JSON:

1clickhouse local -q "SELECT * FROM file('users.bson') INTO OUTFILE 'users.jsonl' FORMAT JSONEachRow"

1{"_id":1,"name":"Ada","city":"NYC","address":{"street":"1 Market St","zip":"00000"},"tags":["mongo"],"active":false}
2{"_id":2,"name":"Lin","city":"SF","address":{"street":"8 Market St","zip":"00137"},"tags":["mongo","etl"],"active":true}
3{"_id":3,"name":"Tor","city":"LDN","address":{"street":"15 Market St","zip":"00274"},"tags":["mongo","etl","beta"],"active":false}

That .bson file is the shape MongoDB writes: mongodump produces one *.bson per collection, and mongoexport --type=bson does the same. ClickHouse reads it with the BSONEachRow format, where each row is one BSON document, and the file is read in place with no import step first.

The schema is inferred, types are carried into JSON

You don't supply a schema. BSONEachRow reads the type tags inside each BSON document, so a 64-bit int stays a number, a boolean stays a boolean, an embedded document becomes a nested JSON object, and a BSON array becomes a JSON array. Check what was detected with DESCRIBE:

1clickhouse local -q "DESCRIBE file('users.bson')"

1_id      Nullable(Int64)
2name     Nullable(String)
3city     Nullable(String)
4address  Map(String, Nullable(String))
5tags     Array(Nullable(String))
6active   Nullable(Bool)

The embedded address document came through as a Map, the tags array as an Array, and active as a real Bool. JSONEachRow writes them back as a nested object, a JSON array, and true/false — not as strings. This is the part upload-based converter sites get wrong: numbers and booleans arrive quoted, and nested documents are flattened or dropped. Here they survive the round trip.

If you want a single JSON array with a metadata header instead of line-delimited objects, use FORMAT JSON:

1clickhouse local -q "SELECT * FROM file('users.bson') INTO OUTFILE 'users.json' FORMAT JSON"

1{
2	"meta":
3	[
4		{ "name": "_id", "type": "Nullable(Int64)" },
5		...
6	],
7	"data":
8	[
9		{
10			"_id": 1,
11			"name": "Ada",
12			"city": "NYC",
13			"address": { "street": "1 Market St", "zip": "00000" },
14			"tags": ["mongo"],
15			"active": false
16		},
17		...
18	]
19}

JSONEachRow (NDJSON) is the right choice for piping into another tool or loading row by row; JSON is the right choice when something downstream wants one array. Both come from the same SELECT, only the FORMAT changes.

Reshape while you convert

Because the BSON file is just a table, you can filter and flatten in the same pass rather than converting everything and cleaning up afterwards. Here we keep only active users and lift the nested zip out of address:

1clickhouse local -q "
2SELECT _id, name, address['zip'] AS zip, tags
3FROM file('users.bson')
4WHERE active
5INTO OUTFILE 'active.jsonl' FORMAT JSONEachRow"

1{"_id":2,"name":"Lin","zip":"00137","tags":["mongo","etl"]}
2{"_id":4,"name":"Mae","zip":"00411","tags":["mongo"]}
3{"_id":6,"name":"Yuki","zip":"00685","tags":["mongo","etl","beta"]}

The full SQL surface works here — WHERE, GROUP BY, JOIN, window functions — so a conversion and a transformation are one command.

Options worth knowing

Output target. Drop INTO OUTFILE to stream the JSON to stdout (pipe it into jq, head, or another process). Add it to write a file; TRUNCATE overwrites an existing one.
Compress on the way out. A .gz, .zst, or .xz suffix on the output path is compressed automatically: INTO OUTFILE 'users.jsonl.zst'. No flag needed, and reading it back is just as transparent.
NDJSON vs array. JSONEachRow → .jsonl/.ndjson (one object per line, streamable). JSON → a single array with a typed meta header.
Querying instead of converting. If you only need to read the BSON, skip the output file entirely. See how to query a BSON file.

How fast is it?

On a 2,000,000-document, ~162 MB events.bson, converting the whole file to NDJSON:

1clickhouse local -q "SELECT * FROM file('events.bson') INTO OUTFILE 'events.jsonl' FORMAT JSONEachRow"

~0.29 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). That is parse-BSON-and-write-JSON end to end, no cached table. The result is 2,000,000 lines of NDJSON. A browser upload converter would still be uploading the file.

clickhouse-local reads the .bson directly.

chDB: the same conversion in Python

If you live in Python, chDB is the same ClickHouse engine in-process. The SQL is identical — SELECT ... INTO OUTFILE ... FORMAT JSONEachRow — you just call it through chdb.query:

1import chdb
2
3chdb.query(
4    "SELECT * FROM file('users.bson') "
5    "INTO OUTFILE 'users.jsonl' TRUNCATE FORMAT JSONEachRow"
6)
7print(open("users.jsonl").read())

1{"_id":1,"name":"Ada","city":"NYC","address":{"street":"1 Market St","zip":"00000"},"tags":["mongo"],"active":false}
2{"_id":2,"name":"Lin","city":"SF","address":{"street":"8 Market St","zip":"00137"},"tags":["mongo","etl"],"active":true}
3...

Same schema inference, same nested-document handling, no server. For reading BSON straight into a DataFrame, see how to read a BSON file in Python.

Run it yourself

The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample BSON (including the ~162 MB file used for the timing above), run.sh with every command on this page, run.py/run.ipynb for the chDB version, and expected_output.txt.

github.com/ClickHouse/examples/tree/main/local-analytics/convert-bson-to-json

New to the format? What is BSON? and what is NDJSON?
Want CSV instead of JSON? Convert BSON to CSV (note: nested documents flatten to strings in CSV — JSON keeps the structure).
Just reading the data? Query a BSON file and run SQL on a JSONL file.

The same SQL scales from a file on your laptop to a ClickHouse server to ClickHouse Cloud — when a one-off mongodump grows into a recurring pipeline, nothing about the query changes.

The schema is inferred, types are carried into JSON

Reshape while you convert

Options worth knowing

How fast is it?

chDB: the same conversion in Python

Run it yourself

Subscribe to our newsletter

More like this

How to engineer cost-efficient open source observability with ClickHouse (ClickStack) - 2026 technical playbook

Build a dashboard in Python with ClickHouse and Streamlit

Instrumenting OpenAI with OpenTelemetry (OTel)

Real-time data visualization

How to convert BSON to JSON

The schema is inferred, types are carried into JSON #

Reshape while you convert #

Options worth knowing #

How fast is it? #

chDB: the same conversion in Python #

Run it yourself #

Related #

Subscribe to our newsletter

More like this

How to engineer cost-efficient open source observability with ClickHouse (ClickStack) - 2026 technical playbook

Build a dashboard in Python with ClickHouse and Streamlit

Instrumenting OpenAI with OpenTelemetry (OTel)

Real-time data visualization

The schema is inferred, types are carried into JSON

Reshape while you convert

Options worth knowing

How fast is it?

chDB: the same conversion in Python

Run it yourself

Related