What is an .npy file?

Al Brown
Last updated: Jun 15, 2026

An .npy file is NumPy's binary format for storing a single numeric array on disk. It is a short text header recording the element type (dtype), the memory layout (C or Fortran order), and the array's shape, followed by the array's raw bytes laid out contiguously. To read one without Python, point clickhouse local at it:

```bash
clickhouse local -q "SELECT * FROM file('readings.npy', Npy)"
```

That returns the array as a one-column table you can run SQL over.

At a glance

| Property | .npy |
|---|---|
| Holds | Exactly one array, usually numeric (not a table) |
| Layout | Text header + raw contiguous buffer |
| Schema | dtype and shape stored in the header |
| Compression | None (use .npz via np.savez_compressed, a deflated zip of arrays) |
| Endianness | Encoded in the dtype string (< little, > big) |
| Typical use | Saving and loading arrays for NumPy, ML pipelines, scientific code |
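The endianness row matters in practice: the byte-order marker is the only thing telling a reader how to assemble each element's bytes. Below is a minimal sketch that uses Python's standard struct module and covers only a hand-picked subset of dtype strings. The decode helper and its lookup table are hypothetical and not part of NumPy.

```python
import struct

# Hypothetical lookup (not a NumPy API): (kind, itemsize) -> struct format code.
# Only a few common dtypes are covered; '|' (byte order not applicable) is not.
_STRUCT_CODES = {
    ("f", 4): "f", ("f", 8): "d",
    ("i", 4): "i", ("i", 8): "q",
    ("u", 4): "I", ("u", 8): "Q",
}


def decode(raw: bytes, descr: str) -> list:
    """Decode back-to-back elements using the byte order and type in descr."""
    order, kind, itemsize = descr[0], descr[1], int(descr[2:])
    if order not in "<>":
        raise ValueError(f"byte-order marker {order!r} not handled in this sketch")
    code = _STRUCT_CODES[(kind, itemsize)]
    count = len(raw) // itemsize
    return list(struct.unpack(f"{order}{count}{code}", raw))


# Eight bytes holding 18.0 as a little-endian Float64.
raw = struct.pack("<d", 18.0)
print(decode(raw, "<f8"))  # correct marker: [18.0]
print(decode(raw, ">f8"))  # wrong marker: a meaningless value near zero
```

The bytes themselves never change. Only the marker decides whether they mean 18.0 or garbage, which is why it travels inside the dtype string.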

Internal structure

An .npy file is deliberately minimal. From the start of the file:

  • Magic string: the six bytes \x93NUMPY. This identifies the file as an .npy.
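  • Version: two bytes, major and minor (e.g. 1.0, 2.0, 3.0). Version 2.0 widens the header-length field so the header can exceed 64 KiB, and version 3.0 allows UTF-8 in the header.
  • Header length: a little-endian unsigned integer giving the size of the header that follows. It is 2 bytes wide in version 1.0 and 4 bytes wide in versions 2.0 and 3.0.
  • Header dictionary: an ASCII string (UTF-8 in version 3.0) holding a Python dict literal with three keys: descr (the dtype, like '<f8' for little-endian Float64), fortran_order (True/False), and shape (a tuple). It is padded with spaces and terminated by a newline so that the raw data starts on a 64-byte boundary.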
  • Raw data: every array element, back to back, with no per-value framing. The header alone tells a reader how to slice it.
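The five fields above are enough to write a reader by hand. Below is a minimal sketch in Python using only the standard library. It assumes the whole file fits in memory. The name parse_npy_header is hypothetical, and the sketch skips checks a real loader would make, such as validating the dict's keys. It builds a tiny version 1.0 file in memory and parses it back.

```python
import ast
import struct

MAGIC = b"\x93NUMPY"


def parse_npy_header(buf: bytes):
    """Return ((major, minor), header_dict, data_offset) for an in-memory .npy file."""
    if buf[:6] != MAGIC:
        raise ValueError("not an .npy file: bad magic string")
    major, minor = buf[6], buf[7]
    if major == 1:
        # Version 1.0: 2-byte little-endian unsigned header length.
        (header_len,) = struct.unpack_from("<H", buf, 8)
        start = 10
    elif major in (2, 3):
        # Versions 2.0 and 3.0: 4-byte little-endian unsigned header length.
        (header_len,) = struct.unpack_from("<I", buf, 8)
        start = 12
    else:
        raise ValueError(f"unsupported .npy version {major}.{minor}")
    encoding = "utf-8" if major == 3 else "latin1"
    header_text = buf[start:start + header_len].decode(encoding)
    # literal_eval accepts Python literals only, so no code in the file is executed.
    header = ast.literal_eval(header_text)
    return (major, minor), header, start + header_len


# Build a tiny version 1.0 file by hand: shape (2,), two little-endian Float64s.
text = "{'descr': '<f8', 'fortran_order': False, 'shape': (2,), }"
pad = -(10 + len(text) + 1) % 64  # spaces so the prefix is a multiple of 64
header_bytes = (text + " " * pad + "\n").encode("latin1")
sample = MAGIC + bytes([1, 0]) + struct.pack("<H", len(header_bytes)) + header_bytes
sample += struct.pack("<2d", 1.5, -2.0)

version, header, offset = parse_npy_header(sample)
values = struct.unpack_from("<2d", sample, offset)
print(version, header, offset, values)
```

Once the data offset is known, decoding the elements is just an unpack over the rest of the buffer, which is why the header "alone tells a reader how to slice it".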

That self-describing header is the whole trick. Let's write a real file and read the header straight back out.

clickhouse local can do both. It's part of ClickHouse; install it with clickhousectl, the ClickHouse CLI for local and cloud:

```bash
curl https://clickhouse.com/cli | sh   # install clickhousectl
clickhousectl local use latest         # download ClickHouse and put it on your PATH
```

Write a 20-element array of Float64 readings to readings.npy:

```bash
clickhouse local -q "
SELECT round(18 + 8 * sin(number / 3.0), 2) AS array
FROM numbers(20)
INTO OUTFILE 'readings.npy'
FORMAT Npy"
```
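For comparison, here is a minimal sketch that builds an equivalent file from Python with only the standard library. It assumes a 1-D little-endian Float64 array and writes format version 1.0 with NumPy-style header spacing, so its header bytes will not match ClickHouse's output exactly. The function npy_bytes_f8 and the readings_py.npy filename are hypothetical.

```python
import math
import struct


def npy_bytes_f8(values) -> bytes:
    """Serialize a 1-D sequence of floats as a version 1.0 .npy file ('<f8', C order)."""
    header = "{'descr': '<f8', 'fortran_order': False, 'shape': (%d,), }" % len(values)
    # Pad with spaces and end with '\n' so that magic (6) + version (2)
    # + length field (2) + header is a multiple of 64 bytes.
    pad = -(10 + len(header) + 1) % 64
    header_bytes = (header + " " * pad + "\n").encode("latin1")
    prefix = b"\x93NUMPY" + bytes([1, 0]) + struct.pack("<H", len(header_bytes))
    data = struct.pack("<%dd" % len(values), *values)  # raw little-endian doubles
    return prefix + header_bytes + data


# The same 20 readings as the SQL above: round(18 + 8 * sin(n / 3.0), 2).
readings = [round(18 + 8 * math.sin(n / 3.0), 2) for n in range(20)]
npy = npy_bytes_f8(readings)

# To save it: open("readings_py.npy", "wb").write(npy)  (hypothetical filename)
```

The data section is nothing more than the 20 doubles packed back to back. All of the structure lives in the header.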

Now read the first bytes of the file as a string. The magic string, the version, and the header dictionary are all sitting there in plain sight:

```bash
clickhouse local -q "
SELECT
    substring(file('readings.npy'), 1, 6)                     AS magic,
    reinterpretAsUInt8(substring(file('readings.npy'), 7, 1)) AS version_major,
    reinterpretAsUInt8(substring(file('readings.npy'), 8, 1)) AS version_minor,
    -- versions 2.0 and 3.0 use a 4-byte header length (bytes 9-12), so the dict starts at byte 13
    trim(substring(file('readings.npy'), 13, 60))             AS header_dict
FROM numbers(1)
FORMAT Vertical"
```

```text
Row 1:
──────
magic:         �NUMPY
version_major: 3
version_minor: 0
header_dict:   {'descr':'<f8','fortran_order':False,'shape':(20,),}
```

The first byte (0x93, shown as a replacement character above) plus NUMPY is the magic string. The header dict reports '<f8', a little-endian 8-byte float (Float64), in C order, with shape (20,). That single line is everything a reader needs to interpret the bytes that follow.
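Because the header is a Python dict literal, a Python reader can parse it safely with ast.literal_eval instead of eval. Below is a minimal sketch that assumes the header text printed above. The helper header_start is hypothetical; it just adds up the field widths from the structure list. The size of the data section then follows from the shape and the item size in descr.

```python
import ast
import math

# The header dict exactly as printed by the query above.
header = ast.literal_eval("{'descr':'<f8','fortran_order':False,'shape':(20,),}")


def header_start(major: int) -> int:
    """0-based offset of the header dict: magic (6) + version (2) + length field."""
    return 6 + 2 + (2 if major == 1 else 4)


itemsize = int(header["descr"][2:])                 # '<f8' -> 8 bytes per element
data_bytes = math.prod(header["shape"]) * itemsize  # 20 elements * 8 bytes
print(header, header_start(3), data_bytes)
```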

How the dtype maps to a column type

Because the dtype is in the header, clickhouse local infers the column type from it. No structure argument is needed:

```bash
clickhouse local -q "DESCRIBE file('readings.npy', Npy)"
```

```text
array	Float64
```

An .npy holds one array, so ClickHouse exposes it as a single column named array. The '<f8' dtype becomes Float64; an integer array written as '<i4' would come back as Int32. Read the values out and they are just rows:

```bash
clickhouse local -q "
SELECT array AS reading
FROM file('readings.npy', Npy)
LIMIT 8
FORMAT Pretty"
```

```text
   ┏━━━━━━━━━┓
   ┃ reading ┃
   ┡━━━━━━━━━┩
1. │      18 │
2. │   20.62 │
3. │   22.95 │
4. │   24.73 │
5. │   25.78 │
6. │   25.96 │
7. │   25.27 │
8. │   23.78 │
   └─────────┘
```

From here it's ordinary SQL:

```bash
clickhouse local -q "
SELECT count()              AS n,
       min(array)           AS min_reading,
       max(array)           AS max_reading,
       round(avg(array), 3) AS avg_reading
FROM file('readings.npy', Npy)
FORMAT Vertical"
```

```text
Row 1:
──────
n:           20
min_reading: 10.01
max_reading: 25.96
avg_reading: 18.011
```
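The dtype-to-column mapping in this section follows a visible naming pattern: the dtype's kind letter picks the family, and its byte count times eight gives the bit width. The sketch below is hypothetical and illustrates only that pattern. It is not ClickHouse's code and makes no claim about which dtypes ClickHouse actually accepts.

```python
# Hypothetical illustration of the naming pattern, not ClickHouse's implementation.
_FAMILY = {"i": "Int", "u": "UInt", "f": "Float"}


def column_type_name(descr: str) -> str:
    """Map an .npy dtype string such as '<f8' to a ClickHouse-style type name."""
    kind, itemsize = descr[1], int(descr[2:])
    if kind not in _FAMILY:
        raise ValueError(f"dtype kind {kind!r} is not covered by this sketch")
    return f"{_FAMILY[kind]}{itemsize * 8}"


print(column_type_name("<f8"), column_type_name("<i4"))  # Float64 Int32
```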

What an .npy is not

An .npy is a single array, not a table. It has no column names, carries one element type, and cannot nest. If you need several arrays in one file, NumPy's .npz format is a zip archive of .npy entries. For typed, columnar, compressed tabular data with embedded statistics, reach for Parquet instead. The trade-off is deliberate: .npy gives all of that up for a format so simple you can parse the header by eye. The payoff is speed. When the file's byte order matches the machine's, there is nothing to decode, so a scan is essentially a memcpy.
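Python's standard library can illustrate the memcpy point. array.frombytes copies the data section straight into a typed array, and the only possible extra pass is a byte swap when the file's byte order differs from the machine's. Below is a minimal sketch that assumes a 1-D '<f8' or '>f8' payload whose header has already been stripped. The name load_f8 is hypothetical, and the sketch says nothing about how ClickHouse implements its reader.

```python
import struct
import sys
from array import array


def load_f8(raw: bytes, descr: str) -> array:
    """Copy a raw '<f8' or '>f8' data section into an array of doubles."""
    values = array("d")    # C double: 8 bytes on mainstream platforms
    values.frombytes(raw)  # bulk byte copy, no per-value parsing
    file_is_little = descr[0] == "<"
    if file_is_little != (sys.byteorder == "little"):
        values.byteswap()  # extra pass only when byte orders differ
    return values


# Four readings stored little-endian, as in readings.npy.
vals = load_f8(struct.pack("<4d", 18.0, 20.62, 22.95, 24.73), "<f8")
summary = (len(vals), min(vals), max(vals), round(sum(vals) / len(vals), 3))
print(summary)
```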

Reading the larger array shipped with the example (3,000,000 Float64 values, a file of about 24 MB, or 23 MiB) and aggregating it runs in 0.032s (best of three, warm cache, on an Apple M4 Pro with 14 cores and 24 GB RAM; the figure may shift slightly under concurrent load):

```text
Row 1:
──────
n:           3000000
avg_reading: 18.0053
```

Read one yourself

You don't need a Python environment or a database server to inspect an .npy file: clickhouse local reads it in place from the command line. If NumPy is already installed, np.load reads it just as directly, so use whichever tool is already in front of you. ClickHouse uses the same SQL whether you point it at a file on your laptop, a server, or ClickHouse Cloud, so the query you write against a local .npy scales up unchanged.

For a step-by-step walkthrough, including writing arrays back out, see how to read an .npy file. Working in Python? See read an .npy file in Python. Need it as a table for other tools? Convert .npy to CSV or convert .npy to Parquet.

Run it yourself: the data generator and every command above live in local-analytics/what-is-an-npy-file in the ClickHouse examples repo.
