To convert a .npy file to CSV, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.
Install it with clickhousectl:
curl https://clickhouse.com/cli | sh # install clickhousectl
clickhousectl local use latest # download ClickHouse and put it on your PATH
Then point it at the file and write the result out as CSV:
clickhouse local -q "SELECT * FROM file('signal.npy') INTO OUTFILE 'signal.csv' FORMAT CSVWithNames"
"array"
0
47.9426
84.1471
99.7495
90.9297
59.8472
14.112
clickhouse local reads the dtype from the .npy header, so a numeric array converts straight to CSV with no import step and no schema. The file is streamed rather than loaded, so even an array larger than RAM converts without being held in memory all at once.
NPY is numeric, and it carries its own dtype
The .npy format stores one NumPy array per file, and every element shares a single dtype, in the typical case a number. There are no mixed-type columns to reconcile. The good news is that the array's dtype is written into the file header, so you never supply a schema. DESCRIBE shows what was read:
clickhouse local -q "DESCRIBE file('signal.npy')"
A one-dimensional float64 array becomes a single Float64 column named array, one element per row. An int32 array becomes Int32, and so on. The values go to the CSV without passing through an intermediate cast, so a plain 1D conversion is not lossy.
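The header is small and self-describing: a magic string, a version, a little-endian header length, and a Python dict literal holding descr (the dtype, for example '<f8' for little-endian float64), fortran_order, and shape. The descr string is the information a reader needs to pick a column type. Below is a minimal stdlib-only sketch of writing and reading a version 1.0 header, based on NumPy's documented .npy layout. build_npy_header and read_npy_header are hypothetical helpers for illustration, not part of ClickHouse or chDB.

```python
import ast
import struct

NPY_MAGIC = b"\x93NUMPY"


def build_npy_header(descr, shape):
    """Build a version 1.0 .npy header: magic, version, length, then the dict."""
    header = repr({"descr": descr, "fortran_order": False, "shape": tuple(shape)})
    # Pad with spaces plus a final newline so the data starts on a 64-byte boundary
    # (10 bytes of magic + version + length precede the dict).
    header += " " * (-(10 + len(header) + 1) % 64) + "\n"
    return NPY_MAGIC + bytes([1, 0]) + struct.pack("<H", len(header)) + header.encode("latin1")


def read_npy_header(data):
    """Parse the header of .npy bytes into a dict with descr, fortran_order, shape."""
    if data[:6] != NPY_MAGIC:
        raise ValueError("not a .npy file")
    major = data[6]
    if major == 1:
        # Version 1.0: 2-byte little-endian header length.
        (header_len,) = struct.unpack("<H", data[8:10])
        start, encoding = 10, "latin1"
    else:
        # Versions 2.0 and 3.0: 4-byte length; 3.0 allows UTF-8 in the header.
        (header_len,) = struct.unpack("<I", data[8:12])
        start, encoding = 12, ("utf8" if major >= 3 else "latin1")
    text = data[start:start + header_len].decode(encoding)
    # The header is a Python dict literal, so literal_eval parses it safely.
    return ast.literal_eval(text)


header = read_npy_header(build_npy_header("<f8", (8,)))
print(header["descr"], header["shape"])  # prints: <f8 (8,)
```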
The gotcha: a 2D array is one Array column
This is where .npy differs from a flat CSV. A two-dimensional array doesn't arrive as several columns. It arrives as a single column whose type is an array:
clickhouse local -q "DESCRIBE file('matrix.npy')"
clickhouse local -q "SELECT * FROM file('matrix.npy')"
array Array(Int32)
[0,1,2]
[3,4,5]
[6,7,8]
[9,10,11]
[12,13,14]
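Why one array per row? The data section of a .npy file is a single flat run of elements, and the header's shape says how to cut it up. In C (row-major) order, NumPy's default, each row is one contiguous slice, which is exactly what the Array(Int32) rows above contain. A minimal sketch, assuming C order and a 2D shape; rows_from_flat is a hypothetical helper, not ClickHouse's implementation.

```python
def rows_from_flat(flat, shape):
    """Split a C-order (row-major) 2D payload into one list per row."""
    n_rows, n_cols = shape
    if len(flat) != n_rows * n_cols:
        raise ValueError("element count does not match the header shape")
    # Row r is the contiguous run of elements [r * n_cols, (r + 1) * n_cols).
    return [list(flat[r * n_cols:(r + 1) * n_cols]) for r in range(n_rows)]


# The matrix above holds 0..14 with shape (5, 3).
for row in rows_from_flat(list(range(15)), (5, 3)):
    print(row)
```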
If you convert that as-is, each row of CSV holds one quoted list, which is rarely what you want in a spreadsheet or another tool:
clickhouse local -q "SELECT * FROM file('matrix.npy') INTO OUTFILE 'matrix_naive.csv' FORMAT CSV"
"[0,1,2]"
"[3,4,5]"
"[6,7,8]"
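Whatever a CSV writer's quoting policy, a field like [0,1,2] has to be quoted, because the list text contains the comma delimiter. A minimal sketch with Python's standard csv module, which by default quotes only when it must, produces the same shape of output; naive_rows_to_csv is a hypothetical helper for illustration.

```python
import csv
import io


def naive_rows_to_csv(rows):
    """Write each 2D-array row as a single text field, like the naive query."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")  # default QUOTE_MINIMAL
    for row in rows:
        # The whole row becomes one field such as [0,1,2]; its commas force quoting.
        field = "[" + ",".join(str(x) for x in row) + "]"
        writer.writerow([field])
    return buf.getvalue()


print(naive_rows_to_csv([[0, 1, 2], [3, 4, 5], [6, 7, 8]]), end="")
```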
To get real, separate CSV columns, index into the array (ClickHouse arrays are 1-based) and name each element:
clickhouse local -q "
SELECT array[1] AS c0, array[2] AS c1, array[3] AS c2
FROM file('matrix.npy')
INTO OUTFILE 'matrix.csv' FORMAT CSVWithNames"
"c0","c1","c2"
0,1,2
3,4,5
6,7,8
9,10,11
12,13,14
Now each element of the inner array is its own column. The CSVWithNames format writes a header row; use plain CSV if you don't want one.
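For a wider matrix, typing each array[i] AS cN by hand gets tedious. A small Python helper can generate the same SELECT for any column count, and the string can then be passed to clickhouse local -q or chdb.query. This is a sketch: expand_array_sql is a hypothetical name, n_cols is assumed to equal the inner array length (the second entry of the shape), and file names are not escaped.

```python
def expand_array_sql(src, dst, n_cols, prefix="c", fmt="CSVWithNames"):
    """Build a SELECT that splits the Array column into n_cols output columns."""
    # ClickHouse arrays are 1-based, so element i + 1 becomes column {prefix}{i}.
    cols = ", ".join(f"array[{i + 1}] AS {prefix}{i}" for i in range(n_cols))
    return f"SELECT {cols} FROM file('{src}') INTO OUTFILE '{dst}' FORMAT {fmt}"


print(expand_array_sql("matrix.npy", "matrix.csv", 3))
```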
Because the conversion runs on your machine and you control the SQL, you get options that an upload-based converter site can't offer:
- No upload. The array stays local. For experiment outputs, embeddings, or model weights, that's the difference between a usable workflow and a non-starter.
- Pick the column name. SELECT array AS value renames the single 1D column from the default array to anything you like.
- Filter and transform on the way out. It's a full SELECT, so WHERE array > 0, round(array, 2), or ORDER BY array all apply during the conversion, in one pass.
- Headers are optional: CSVWithNames for a header row, CSV for raw values, TSV if you'd rather have tabs.
- Larger than RAM. The file streams through, so converting a multi-gigabyte .npy doesn't require multi-gigabyte memory.
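Streaming works because every element of a .npy array has the same fixed width: a reader can parse the header once, then convert the data in fixed-size chunks, so memory use is bounded by the chunk size rather than the file size. Below is a minimal stdlib-only sketch of that idea for a 1D little-endian float64 ('<f8') array. npy_to_csv_stream and chunk_items are hypothetical names; this is not how ClickHouse implements it, and Python's float formatting differs slightly from ClickHouse's (it writes 0.0 where the output above shows 0).

```python
import ast
import csv
import struct


def npy_to_csv_stream(src, dst, column="array", chunk_items=65536):
    """Convert a 1D little-endian float64 .npy stream to CSV, one chunk at a time."""
    if src.read(6) != b"\x93NUMPY":
        raise ValueError("not a .npy file")
    major, _minor = src.read(2)
    len_fmt = "<H" if major == 1 else "<I"  # header length: 2 bytes in v1.0, 4 later
    (header_len,) = struct.unpack(len_fmt, src.read(struct.calcsize(len_fmt)))
    header = ast.literal_eval(src.read(header_len).decode("latin1"))
    if header["descr"] != "<f8" or len(header["shape"]) != 1:
        raise ValueError("this sketch only handles 1D '<f8' arrays")

    writer = csv.writer(dst, lineterminator="\n")
    writer.writerow([column])  # a header row, as CSVWithNames writes
    remaining = header["shape"][0]
    while remaining:
        n = min(chunk_items, remaining)
        buf = src.read(n * 8)  # float64 is 8 bytes, so a chunk is a fixed byte count
        if len(buf) != n * 8:
            raise ValueError("file is shorter than its header says")
        for (value,) in struct.iter_unpack("<d", buf):
            writer.writerow([value])
        remaining -= n
```

To use it on a real file, open the .npy in binary mode ("rb") for src and the CSV as text with newline="" for dst.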
Just exploring? You don't have to convert at all: query an NPY file directly with SQL. If you want a typed, compressed result instead of text, convert NPY to Parquet.
On a 3,000,000-element float64 array (signal_large.npy, ~24 MB), this command converts the whole file to CSV:
clickhouse local -q "SELECT array AS value FROM file('signal_large.npy') INTO OUTFILE 'signal_large.csv' FORMAT CSVWithNames"
It completes in ~0.18 seconds (best of three, warm OS page cache, Apple M4 Pro laptop with 14 cores and 24 GB RAM) and produces a 3,000,000-row CSV. Expect some variation under concurrent load, but reading a self-describing binary array and writing text is about as cheap as conversions get.
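Dividing the reported figures gives a rough throughput. This uses only the numbers above (the payload size ignores the small .npy header) and is a back-of-the-envelope estimate, not an additional measurement.

```python
N_ELEMENTS = 3_000_000  # elements in signal_large.npy
ITEMSIZE = 8            # bytes per float64 element
SECONDS = 0.18          # best-of-three wall time reported above

payload_bytes = N_ELEMENTS * ITEMSIZE
rows_per_second = N_ELEMENTS / SECONDS
mb_per_second = payload_bytes / SECONDS / 1e6

print(f"payload: {payload_bytes / 1e6:.0f} MB")                            # payload: 24 MB
print(f"{rows_per_second / 1e6:.1f} M rows/s, {mb_per_second:.0f} MB/s read")  # 16.7 M rows/s, 133 MB/s read
```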
chDB is the same ClickHouse engine embedded in Python, so the conversion is the identical SQL with no server:
import chdb
# Write a CSV file from the .npy:
chdb.query("SELECT * FROM file('signal.npy') INTO OUTFILE 'signal.csv' FORMAT CSVWithNames")
# Or get the CSV back as a string, no file:
print(chdb.query("SELECT * FROM file('signal.npy') LIMIT 3 FORMAT CSV", "CSV"))
Install it with pip install chdb. This is handy when the .npy is already part of a Python pipeline alongside NumPy and pandas.
The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample .npy files (including the 3,000,000-element file used for the timing above), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.
github.com/ClickHouse/examples/tree/main/local-analytics/convert-npy-to-csv
The same SQL that converts one file on your laptop runs unchanged against a ClickHouse server or ClickHouse Cloud when the data outgrows it. To learn the format itself, see what is an NPY file; for the wider workflow, run SQL on a CSV file.