To convert BSON to CSV, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
Then point it at the file and write CSV out:
clickhouse local -q "SELECT * FROM file('events.bson') INTO OUTFILE 'events.csv' FORMAT CSVWithNames"
"event_id","event_type","geo","amount"
1,"signup","{'city':'London','country':'GB'}",5
2,"purchase","{'city':'Berlin','country':'DE'}",6.01
3,"login","{'city':'Paris','country':'FR'}",7.02
ClickHouse reads the BSON documents in place, infers the schema from the field names and types each document carries, and streams CSV out without an import step or an upload. Files larger than RAM convert the same way.
That works, but look at the geo column. It is a whole sub-document crammed into one CSV cell. BSON nests; CSV is flat. The interesting part of this conversion is what happens to nested fields, so let's handle that properly.
BSON (Binary JSON) is the format MongoDB stores and mongoexport emits. Each document carries its own field names and types, so clickhouse-local can read the structure without you describing it. Check what it found with DESCRIBE:
clickhouse local -q "DESCRIBE file('events.bson')"
event_id Nullable(Int64)
event_type Nullable(String)
geo Map(String, Nullable(String))
amount Nullable(Float64)
Scalar fields map cleanly: an integer becomes Int64, a string becomes String, a double becomes Float64. The nested geo sub-document becomes a Map(String, ...). That is correct and lossless inside ClickHouse, but CSV has no concept of a map. When you write it out, the whole map is serialized into a single quoted text cell, as you saw above.
Flatten nested sub-documents into real columns
A flat CSV with one value per column is almost always what you want downstream. Pull the keys you care about out of the map and project them as their own columns in the SELECT:
clickhouse local -q "
SELECT
event_id,
event_type,
geo['city'] AS city,
geo['country'] AS country,
amount
FROM file('events.bson')
INTO OUTFILE 'events_flat.csv' FORMAT CSVWithNames"
"event_id","event_type","city","country","amount"
1,"signup","London","GB",5
2,"purchase","Berlin","DE",6.01
3,"login","Paris","FR",7.02
4,"refund","Austin","US",8.03
5,"signup","London","GB",9.04
city and country are now first-class columns. The flat file reads back with a clean, fully typed schema, ready for any tool that speaks CSV:
event_id Nullable(Int64)
event_type Nullable(String)
city Nullable(String)
country Nullable(String)
amount Nullable(Float64)
The same approach handles arrays inside a document: index them (tags[1]), join them into a string (arrayStringConcat(tags, '|')), or arrayJoin them to fan one document out into multiple CSV rows. You decide how the nesting collapses, instead of getting a serialized blob you have to re-parse later.
This is where a scriptable converter beats a drag-and-drop website. You control the output, not a fixed template.
FORMAT CSVWithNames gives comma-separated output with a header row. Drop the header with FORMAT CSV, switch to tab-separated with FORMAT TSVWithNames, or override the separator with settings like format_csv_delimiter when a downstream tool is fussy.
- Because it is one SQL statement, a
WHERE event_type = 'purchase', a GROUP BY, or a renamed column costs nothing extra. You filter and reshape during the conversion rather than cleaning up in a second pass.
- Inference reads a sample of documents. If a field that should stay a string (a ZIP code, a leading-zero id) gets read as a number, pass an explicit structure:
file('events.bson', 'BSONEachRow', 'event_id String, ...').
- The conversion streams documents through rather than building the whole dataset in memory, so a BSON export bigger than your RAM still converts.
- An
events.bson.gz input is decompressed on the fly from the extension, and an events.csv.gz output is gzipped on the way out, with no separate step.
Reversing the conversion is the same idea in the other direction: SELECT from the CSV and write FORMAT BSONEachRow.
On a 1,400,000-row BSON export (events_large.bson, ~145 MB), flattening the nested geo document and writing the full CSV completes in:
clickhouse local -q "
SELECT event_id, event_type, geo['city'] AS city, geo['country'] AS country, amount
FROM file('events_large.bson')
INTO OUTFILE 'events_large.csv' FORMAT CSVWithNames"
~1.64 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). That parses every BSON document and writes 1.4M CSV rows from scratch; there is no cached table. An upload-required website would still be uploading.
If you live in Python, chDB is the same ClickHouse engine in-process. The SQL is identical, so the conversion and the flatten work exactly as above:
import chdb
chdb.query("""
SELECT
event_id,
event_type,
geo['city'] AS city,
geo['country'] AS country,
amount
FROM file('events.bson')
INTO OUTFILE 'events_flat.csv' FORMAT CSVWithNames
""")
"event_id","event_type","city","country","amount"
1,"signup","London","GB",5
2,"purchase","Berlin","DE",6.01
3,"login","Paris","FR",7.02
To keep the result in memory instead of on disk, ask chDB for a DataFrame: chdb.query(sql, "DataFrame") (see reading a BSON file in Python with chDB).
The complete, runnable example lives in the ClickHouse examples repo: generate.sh creates the sample BSON files (including the ~145 MB file used for the timing above), run.sh and run.py run every command on this page, and expected_output.txt is the captured output.
github.com/ClickHouse/examples/tree/main/local-analytics/convert-bson-to-csv
The same SQL that converts a file on your laptop runs unchanged against a directory of files, a remote object store, or a ClickHouse server and ClickHouse Cloud when the data outgrows one machine. To query the BSON in place instead of converting it, see how to query a BSON file; to keep the data as documents, see convert BSON to JSON; and for the CSV side, how to run SQL on a CSV file.