To convert a CSV file to JSON, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
Then select from the CSV and write the result with FORMAT JSONEachRow:
clickhouse local -q "SELECT * FROM file('orders.csv') INTO OUTFILE 'orders.jsonl' FORMAT JSONEachRow"
{"order_date":"2026-01-01","order_id":1,"country":"GB","product":"widget","revenue":5,"quantity":1}
{"order_date":"2026-01-02","order_id":2,"country":"US","product":"gadget","revenue":6.01,"quantity":2}
{"order_date":"2026-01-03","order_id":3,"country":"DE","product":"gizmo","revenue":7.02,"quantity":3}
The CSV is read in place with no import step: the header supplies column names, the data supplies types, and those types carry straight into the JSON values.
This is the choice that trips people up. "JSON" can mean two different file shapes, and CSV-to-JSON converters often pick one without telling you. ClickHouse exposes both, so you pick.
One object per line (NDJSON, also called JSON Lines) uses FORMAT JSONEachRow. Each row is an independent JSON object on its own line. There is no enclosing array, so a reader never has to hold the whole file in memory:
clickhouse local -q "SELECT * FROM file('orders.csv') INTO OUTFILE 'orders.jsonl' FORMAT JSONEachRow"
A single JSON array of objects uses FORMAT JSON. ClickHouse emits one document: a data array of row objects, plus a meta block describing the columns and a statistics footer.
clickhouse local -q "SELECT * FROM file('orders.csv') FORMAT JSON" > orders.json
{
"meta":
[
{
"name": "order_date",
"type": "Nullable(Date)"
},
...
],
"data":
[
{
"order_date": "2026-01-01",
"order_id": 1,
"country": "GB",
"product": "widget",
"revenue": 5,
"quantity": 1
},
...
],
"rows": 2,
...
}
Use JSONEachRow for anything streaming or large, and for APIs that ingest line-delimited records. Use JSON when a downstream tool expects one self-describing document. If you want a bare array of arrays with no metadata wrapper, FORMAT JSONCompactEachRow writes one array per row:
["2026-01-01", 1, "GB", "widget", 5, 1]
["2026-01-02", 2, "US", "gadget", 6.01, 2]
For more on the line-delimited shape, see what is NDJSON.
This is the part a generic converter usually gets wrong: it reads every CSV cell as text and writes every JSON value as a quoted string. clickhouse-local infers the type of each column first, then writes the matching JSON type. Check what it inferred with DESCRIBE:
clickhouse local -q "DESCRIBE file('orders.csv')"
order_date Nullable(Date)
order_id Nullable(Int64)
country Nullable(String)
product Nullable(String)
revenue Nullable(Float64)
quantity Nullable(Int64)
In the resulting JSON, order_id and quantity are bare integers and revenue is a float. No quotes, no manual cleanup downstream:
{"order_date":"2026-01-01","order_id":1,"country":"GB","product":"widget","revenue":5,"quantity":1}
A web-based converter takes a file, returns a file, and gives you no control over the two things that actually bite you. From the command line you control both.
Keep a column as a string when inference guesses wrong. A zero-padded ID like 007 or a US ZIP code looks numeric, and an automatic converter will silently turn it into 7, dropping the leading zeros. Pin the column type by passing the format and an explicit schema as the second and third arguments to file():
clickhouse local -q "
SELECT * FROM file('orders.csv', 'CSVWithNames',
'order_date Date, order_id String, country String, product String, revenue Float64, quantity UInt8')
LIMIT 1 FORMAT JSONEachRow"
{"order_date":"2026-01-01","order_id":"1","country":"GB","product":"widget","revenue":5,"quantity":1}
order_id is now a quoted string, so a value like "007" survives intact.
Transform on the way out. Because the CSV is a table, you can filter, rename, compute, and reshape in the same SELECT that does the conversion. Rename columns with AS, drop ones you do not need, add a computed field, or WHERE out the rows you do not want. The conversion and the cleanup are one pass.
Going the other way, from JSON back to CSV, is the mirror image: SELECT from the JSON file and write FORMAT CSVWithNames. See how to convert JSON to CSV.
If your pipeline is Python, chDB is the same engine in-process. Run the same SQL with the same FORMAT and write to a file:
import chdb
# One object per line (NDJSON).
chdb.query(
"SELECT * FROM file('orders.csv') "
"INTO OUTFILE 'orders.jsonl' TRUNCATE FORMAT JSONEachRow"
)
# Or capture a single JSON array document as a string.
doc = chdb.query("SELECT * FROM file('orders.csv') FORMAT JSON")
For reading JSON into a DataFrame from Python, see how to read a JSON file in Python with chDB.
On a 2,000,000-row, ~84 MB CSV (orders_large.csv), converting the whole file to line-delimited JSON:
clickhouse local -q "SELECT * FROM file('orders_large.csv') INTO OUTFILE 'orders_large.jsonl' FORMAT JSONEachRow"
~0.29 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). That parses every CSV row and writes 2,000,000 JSON objects, no cached table involved. The JSON output is larger than the CSV (here ~222 MB vs ~84 MB) because every field repeats its key on every row; pipe it through a .gz filename to compress on the fly.
The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample CSVs (including the ~84 MB file used for the timing), run.sh with every command on this page, a chDB run.py / run.ipynb, and expected_output.txt.
github.com/ClickHouse/examples/tree/main/local-analytics/convert-csv-to-json
The same SQL scales from a file on your laptop to a ClickHouse server to ClickHouse Cloud with no rewrite. Want to query the CSV in place rather than convert it? See how to run SQL on a CSV file.