How to convert NPY to CSV

Al Brown
Last updated: Jun 15, 2026

To convert a .npy file to CSV, use clickhouse local. It runs SQL directly on files from the command line, with no server to install. It's part of ClickHouse, so the same query scales to billions of rows when you outgrow your laptop.

Install it with clickhousectl:

curl https://clickhouse.com/cli | sh   # install clickhousectl
clickhousectl local use latest         # download ClickHouse and put it on your PATH

Then point it at the file and write the result out as CSV:

clickhouse local -q "SELECT * FROM file('signal.npy') INTO OUTFILE 'signal.csv' FORMAT CSVWithNames"

"array"
0
47.9426
84.1471
99.7495
90.9297
59.8472
14.112

clickhouse local reads the dtype from the .npy header directly, so a numeric array converts straight to CSV with no import step. The file streams through, so an array larger than RAM converts without being loaded into memory all at once.

NPY is numeric, and it carries its own dtype

The .npy format stores one NumPy array per file, and in the usual case that array is purely numeric, so there are no mixed column types to untangle. The good news is the array's dtype is written into the file header, so you never supply a schema. DESCRIBE shows what was read:

clickhouse local -q "DESCRIBE file('signal.npy')"

array	Float64

A one-dimensional float64 array becomes a single Float64 column named array, one element per row. An int32 array becomes Int32, and so on. The type carries through to the CSV unchanged, so there is no lossy cast on a plain 1D conversion.
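To make "the dtype is written into the file header" concrete, here is a minimal sketch that builds a tiny version 1.0 .npy file in memory and reads its header back using only the Python standard library. The helper names (build_npy_v1, read_npy_header) and the two-entry type table are illustrative assumptions, not part of ClickHouse or NumPy; the table only covers the two mappings stated above.

```python
import ast
import struct

MAGIC = b"\x93NUMPY"


def build_npy_v1(values):
    """Build the bytes of a version 1.0 .npy file holding a 1D float64 array."""
    header = "{'descr': '<f8', 'fortran_order': False, 'shape': (%d,), }" % len(values)
    # Pad with spaces and end with a newline so the data starts on a 64-byte boundary.
    unpadded = len(MAGIC) + 2 + 2 + len(header) + 1
    header += " " * (-unpadded % 64) + "\n"
    return (
        MAGIC
        + bytes([1, 0])                    # format version 1.0
        + struct.pack("<H", len(header))   # little-endian header length
        + header.encode("latin1")
        + struct.pack("<%dd" % len(values), *values)
    )


def read_npy_header(blob):
    """Return (header dict, offset of the first data byte) for a v1.x file."""
    if blob[:6] != MAGIC:
        raise ValueError("not an .npy file")
    if blob[6] != 1:
        raise ValueError("this sketch only handles format version 1.x")
    (header_len,) = struct.unpack("<H", blob[8:10])
    header = ast.literal_eval(blob[10:10 + header_len].decode("latin1"))
    return header, 10 + header_len


# Illustrative subset of the dtype mapping: only the two cases named in the text.
CLICKHOUSE_TYPES = {"<f8": "Float64", "<i4": "Int32"}

blob = build_npy_v1([0.0, 47.9426, 84.1471])
header, data_offset = read_npy_header(blob)
print(header["descr"], "->", CLICKHOUSE_TYPES[header["descr"]])
print("shape:", header["shape"], "data starts at byte", data_offset)
```

The header is a short Python-literal dictionary, which is why a reader can recover the element type and shape without any external schema.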

The gotcha: a 2D array is one Array column

This is where .npy differs from a flat CSV. A two-dimensional array doesn't arrive as several columns. It arrives as a single column whose type is an array:

clickhouse local -q "DESCRIBE file('matrix.npy')"
clickhouse local -q "SELECT * FROM file('matrix.npy')"

array	Array(Int32)
[0,1,2]
[3,4,5]
[6,7,8]
[9,10,11]
[12,13,14]

If you convert that as-is, each row of CSV holds one quoted list, which is rarely what you want in a spreadsheet or another tool:

clickhouse local -q "SELECT * FROM file('matrix.npy') INTO OUTFILE 'matrix_naive.csv' FORMAT CSV"

"[0,1,2]"
"[3,4,5]"
"[6,7,8]"

To get real, separate CSV columns, index into the array (ClickHouse arrays are 1-based) and name each element:

clickhouse local -q "
SELECT array[1] AS c0, array[2] AS c1, array[3] AS c2
FROM file('matrix.npy')
INTO OUTFILE 'matrix.csv' FORMAT CSVWithNames"

"c0","c1","c2"
0,1,2
3,4,5
6,7,8
9,10,11
12,13,14

Now each element of the inner array is its own column. The CSVWithNames format writes a header row; use plain CSV if you don't want one.
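Typing array[1], array[2], ... by hand gets tedious for a wide matrix. As a rough sketch, the SELECT list can be generated from the column count. The function name build_flatten_query and the c prefix are hypothetical, and the paths are interpolated without escaping, so this assumes file names that contain no single quotes.

```python
def build_flatten_query(npy_path, csv_path, n_columns, prefix="c"):
    """Build a query that spreads array[1..n_columns] across named CSV columns."""
    if n_columns < 1:
        raise ValueError("n_columns must be at least 1")
    # ClickHouse arrays are 1-based, so element i + 1 gets the zero-based name c{i}.
    select_list = ", ".join(
        f"array[{i + 1}] AS {prefix}{i}" for i in range(n_columns)
    )
    return (
        f"SELECT {select_list} "
        f"FROM file('{npy_path}') "
        f"INTO OUTFILE '{csv_path}' FORMAT CSVWithNames"
    )


query = build_flatten_query("matrix.npy", "matrix.csv", 3)
print(query)
# SELECT array[1] AS c0, array[2] AS c1, array[3] AS c2 FROM file('matrix.npy') INTO OUTFILE 'matrix.csv' FORMAT CSVWithNames
```

The resulting string can be passed to clickhouse local -q or to chdb.query. The column count is the second entry of the array's shape.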

Options worth knowing

These are the conversion details that an upload-required converter site can't give you, because the conversion runs on your machine and you control the SQL:

  • No upload. The array stays local. For experiment outputs, embeddings, or model weights that's the difference between a usable workflow and a non-starter.
  • Pick the column name with SELECT array AS value, which renames the single 1D column from the default array to anything you like.
  • Filter and transform on the way out. It's a full SELECT, so WHERE array > 0, round(array, 2), or ORDER BY array all apply during the conversion, in one pass.
  • Headers are optional: CSVWithNames for a header row, CSV for raw values, TSV if you'd rather have tabs.
  • Larger than RAM. The file streams through, so converting a multi-gigabyte .npy doesn't require multi-gigabyte memory.
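To illustrate what "streams through" means in the last point, here is a standard-library sketch that converts a 1D float64 .npy stream to CSV one fixed-size chunk at a time, so memory use depends on the chunk size rather than the file size. It is not how ClickHouse implements its reader; the function name npy_to_csv_stream and the chunk_rows parameter are assumptions for illustration, and Python's float formatting differs slightly from ClickHouse's (0.0 rather than 0).

```python
import ast
import io
import struct


def npy_to_csv_stream(src, dst, column="array", chunk_rows=65536):
    """Copy a 1D little-endian float64 .npy stream to CSV, one chunk at a time."""
    if src.read(6) != b"\x93NUMPY":
        raise ValueError("not an .npy file")
    major, _minor = src.read(2)
    # Version 1.x stores the header length in 2 bytes, later versions in 4.
    size_fmt = "<H" if major == 1 else "<I"
    (header_len,) = struct.unpack(size_fmt, src.read(struct.calcsize(size_fmt)))
    header = ast.literal_eval(src.read(header_len).decode("latin1"))
    if header["descr"] != "<f8" or len(header["shape"]) != 1:
        raise ValueError("this sketch only handles 1D little-endian float64")

    dst.write(f'"{column}"\n')
    remaining = header["shape"][0]
    while remaining:
        n = min(remaining, chunk_rows)
        # Only this chunk is held in memory; it is written out and then dropped.
        values = struct.unpack(f"<{n}d", src.read(8 * n))
        dst.write("".join(f"{v!r}\n" for v in values))
        remaining -= n


# Demo on an in-memory file so the sketch runs without NumPy or ClickHouse.
values = [0.0, 47.9426, 84.1471]
header = "{'descr': '<f8', 'fortran_order': False, 'shape': (3,), }"
header += " " * (-(10 + len(header) + 1) % 64) + "\n"
blob = b"\x93NUMPY\x01\x00" + struct.pack("<H", len(header)) + header.encode("latin1")
blob += struct.pack("<3d", *values)

out = io.StringIO()
npy_to_csv_stream(io.BytesIO(blob), out, chunk_rows=2)
print(out.getvalue())
```

With real files, src would be open(path, "rb") and dst a text file opened for writing; the loop body stays the same.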

Reverse direction, or just exploring? You don't have to convert at all. Query an NPY file directly with SQL. If you want a typed, compressed result instead of text, convert NPY to Parquet.

How fast is it?

On a 3,000,000-element float64 array (signal_large.npy, ~24 MB), converting the whole file to CSV completes in:

clickhouse local -q "SELECT array AS value FROM file('signal_large.npy') INTO OUTFILE 'signal_large.csv' FORMAT CSVWithNames"

~0.18 seconds, best of three with a warm OS page cache, on an Apple M4 Pro laptop (14 cores, 24 GB RAM). The result is a 3,000,000-row CSV. The number may wobble slightly under concurrent load, but reading a self-describing binary array and writing text is about as cheap as conversions get.
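A "best of three" measurement like this can be reproduced with a small harness. The sketch below is one possible way to do it, not the script used for the number above: best_of, convert, and remove_output are hypothetical helper names, and convert assumes clickhouse is on your PATH and signal_large.npy is in the working directory. Taking the minimum of several runs means the earlier runs warm the OS page cache for the later ones.

```python
import subprocess
import time
from pathlib import Path

QUERY = (
    "SELECT array AS value FROM file('signal_large.npy') "
    "INTO OUTFILE 'signal_large.csv' FORMAT CSVWithNames"
)


def best_of(run, setup=None, repeats=3):
    """Call run() repeats times and return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        if setup is not None:
            setup()  # untimed preparation before each run
        start = time.perf_counter()
        run()
        timings.append(time.perf_counter() - start)
    return min(timings)


def remove_output():
    # INTO OUTFILE does not overwrite an existing file by default, so clear it first.
    Path("signal_large.csv").unlink(missing_ok=True)


def convert():
    # Assumes clickhouse is on PATH and signal_large.npy is in the working directory.
    subprocess.run(["clickhouse", "local", "-q", QUERY], check=True)


# Stand-in workload so the sketch runs anywhere; swap in
# best_of(convert, setup=remove_output) to time the real conversion.
elapsed = best_of(lambda: sum(range(100_000)))
print(f"best of 3: {elapsed:.4f} s")
```

Note that this timing includes process startup for clickhouse local, which is part of what a command-line user experiences.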

The same in Python (chDB)

chDB is the same ClickHouse engine embedded in Python, so the conversion is the identical SQL with no server:

import chdb

# Write a CSV file from the .npy:
chdb.query("SELECT * FROM file('signal.npy') INTO OUTFILE 'signal.csv' FORMAT CSVWithNames")

# Or get the CSV back as a string, no file:
print(chdb.query("SELECT * FROM file('signal.npy') LIMIT 3 FORMAT CSV", "CSV"))

0
47.9426
84.1471

Install it with pip install chdb. This is handy when the .npy is already part of a Python pipeline alongside NumPy and pandas.

Run it yourself

The complete, runnable example lives in the ClickHouse examples repo: generate.sh to create the sample .npy files (including the 3,000,000-element file used for the timing above), run.sh with every command on this page, run.py / run.ipynb for the chDB version, and expected_output.txt.

github.com/ClickHouse/examples/tree/main/local-analytics/convert-npy-to-csv

The same SQL that converts one file on your laptop runs unchanged against a ClickHouse server or ClickHouse Cloud when the data outgrows it. To learn the format itself, see what is an NPY file; for the wider workflow, run SQL on a CSV file.
