What is an .npy file?

An .npy file is NumPy's binary format for storing a single numeric array on disk. It consists of a short header recording the element type (dtype), the memory layout (C or Fortran order), and the array's shape, followed by the array's raw bytes laid out contiguously. To read one without Python, point clickhouse local at it:

clickhouse local -q "SELECT * FROM file('readings.npy', Npy)"

That returns the array as a one-column table you can run SQL over.

At a glance

Property	.npy
Holds	Exactly one numeric array (not a table)
Layout	Header (text) + raw contiguous buffer
Schema	`dtype` + `shape` stored in the header
Compression	None (use `.npz`, a zip of arrays, for that)
Endianness	Encoded in the dtype string (`<` little, `>` big)
Typical use	Saving/loading arrays for NumPy, ML pipelines, scientific code
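
The endianness row is easy to see with Python's standard library. This is a small illustrative sketch, not part of the original walkthrough: it packs one Float64 value both ways, the way a '<f8' and a '>f8' file would store it.

```python
import struct

value = 18.0
little = struct.pack("<d", value)  # byte layout a '<f8' file stores
big = struct.pack(">d", value)     # byte layout a '>f8' file stores

print(little.hex())
print(big.hex())
# The two encodings hold the same eight bytes in opposite order.
print(little == big[::-1])
```

A reader only needs the first character of the dtype string to know which of the two layouts to expect.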

Internal structure

An .npy file is deliberately minimal. From the start of the file:

Magic string: the six bytes \x93NUMPY. This identifies the file as an .npy.
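Version: two bytes, major and minor (e.g. 1.0, 2.0, 3.0). Version 2.0 widens the header-length field so the header can exceed 64 KiB; version 3.0 additionally allows UTF-8 in the header.
Header length: a little-endian unsigned integer giving the size of the header that follows. It is 2 bytes in version 1.0 and 4 bytes in versions 2.0 and 3.0.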
Header dictionary: an ASCII string holding a Python dict literal with three keys: descr (the dtype, like '<f8' for little-endian Float64), fortran_order (True/False), and shape (a tuple). It is padded with spaces so the raw data starts on an aligned boundary.
Raw data: every array element, back to back, with no per-value framing. The header alone tells a reader how to slice it.

That self-describing header is the whole trick. Let's write a real file and read the header straight back out.

clickhouse local can do both. It's part of ClickHouse; install it with clickhousectl, the ClickHouse CLI for local and cloud:

curl https://clickhouse.com/cli | sh   # install clickhousectl
clickhousectl local use latest         # download ClickHouse and put it on your PATH

Write a 20-element array of Float64 readings to readings.npy:

clickhouse local -q "
SELECT round(18 + 8 * sin(number / 3.0), 2) AS array
FROM numbers(20)
INTO OUTFILE 'readings.npy'
FORMAT Npy"

Now read the first bytes of the file as a string. The magic string, the version, and the header dictionary are all sitting there in plain sight:

clickhouse local -q "
SELECT
    substring(file('readings.npy'), 1, 6)               AS magic,
    reinterpretAsUInt8(substring(file('readings.npy'), 7, 1)) AS version_major,
    reinterpretAsUInt8(substring(file('readings.npy'), 8, 1)) AS version_minor,
    trim(substring(file('readings.npy'), 11, 60))        AS header_dict
FROM numbers(1)
FORMAT Vertical"

Row 1:
──────
magic:         �NUMPY
version_major: 3
version_minor: 0
header_dict:     {'descr':'<f8','fortran_order':False,'shape':(20,),}

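The first byte (0x93, shown as a replacement character above) plus NUMPY is the magic string. The header dict reports '<f8', a little-endian 8-byte float (Float64), in C order, with shape (20,). That single line is everything a reader needs to interpret the bytes that follow. One detail: in versions 2.0 and 3.0 the header-length field is four bytes, so the dict itself begins at byte 13. Reading from byte 11 also picks up the two high bytes of that field, which are zero here and most likely account for the extra blank space before the opening brace.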
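
The same walk through the header can be sketched in Python with only the standard library. This is a minimal illustration, not NumPy's or ClickHouse's reader: `parse_npy_header` is a hypothetical helper name, and the sample bytes are built by hand as a version 1.0 file so the block runs on its own.

```python
import ast
import struct

MAGIC = b"\x93NUMPY"


def parse_npy_header(blob: bytes):
    """Return (version, header dict, data offset) for an .npy byte string."""
    if blob[:6] != MAGIC:
        raise ValueError("not an .npy file: bad magic string")
    major, minor = blob[6], blob[7]
    # Version 1.0 stores the header length in 2 bytes; 2.0 and 3.0 use 4.
    if major == 1:
        (header_len,) = struct.unpack_from("<H", blob, 8)
        dict_start = 10
    else:
        (header_len,) = struct.unpack_from("<I", blob, 8)
        dict_start = 12
    data_offset = dict_start + header_len
    encoding = "utf-8" if major >= 3 else "latin-1"
    text = blob[dict_start:data_offset].decode(encoding)
    # The header is a Python dict literal, so literal_eval parses it safely.
    header = ast.literal_eval(text.strip())
    return (major, minor), header, data_offset


# Hand-built sample: a version 1.0 file holding three little-endian Float64 values.
header_text = "{'descr': '<f8', 'fortran_order': False, 'shape': (3,), }"
# Pad with spaces and end with a newline so the data starts on a 64-byte boundary.
pad = 64 - (10 + len(header_text) + 1) % 64
header_bytes = (header_text + " " * pad + "\n").encode("latin-1")
sample = MAGIC + bytes([1, 0]) + struct.pack("<H", len(header_bytes)) + header_bytes
sample += struct.pack("<3d", 18.0, 20.62, 22.95)

version, header, data_offset = parse_npy_header(sample)
print(version, header, data_offset)
```

To try it on a real file, pass `open('readings.npy', 'rb').read()` instead of `sample`; the version branch picks the right length-field width.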

How the dtype maps to a column type

Because the dtype is in the header, clickhouse local infers the column type from it. No structure argument is needed:

clickhouse local -q "DESCRIBE file('readings.npy', Npy)"

array	Float64

An .npy holds one array, so ClickHouse exposes it as a single column named array. The '<f8' dtype becomes Float64; an integer array written as '<i4' would come back as Int32. Read the values out and they are just rows:

clickhouse local -q "
SELECT array AS reading
FROM file('readings.npy', Npy)
LIMIT 8
FORMAT Pretty"

   ┏━━━━━━━━━┓
   ┃ reading ┃
   ┡━━━━━━━━━┩
1. │      18 │
2. │   20.62 │
3. │   22.95 │
4. │   24.73 │
5. │   25.78 │
6. │   25.96 │
7. │   25.27 │
8. │   23.78 │
   └─────────┘

From here it's ordinary SQL:

clickhouse local -q "
SELECT count() AS n,
       min(array)        AS min_reading,
       max(array)        AS max_reading,
       round(avg(array), 3) AS avg_reading
FROM file('readings.npy', Npy)
FORMAT Vertical"

Row 1:
──────
n:           20
min_reading: 10.01
max_reading: 25.96
avg_reading: 18.011
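
To make the dtype-to-column mapping in this section concrete, here is a hedged Python sketch that splits a descr string into byte order, kind, and item size, then derives a ClickHouse-style type name. Only the '<f8' to Float64 and '<i4' to Int32 pairs come from the text above. The unsigned case follows the same naming pattern by assumption, and `describe_descr` is a hypothetical helper, not ClickHouse's inference code.

```python
import re

# Kind letter -> type-name prefix. 'f' and 'i' are confirmed by the examples in
# this article; 'u' is an assumption that follows the same naming pattern.
KIND_PREFIX = {"f": "Float", "i": "Int", "u": "UInt"}
BYTE_ORDER = {"<": "little-endian", ">": "big-endian", "|": "not applicable", "=": "native"}


def describe_descr(descr: str):
    """Split a simple descr such as '<f8' into (byte order, type name)."""
    match = re.fullmatch(r"([<>|=])([fiu])(\d+)", descr)
    if match is None:
        raise ValueError(f"unsupported descr in this sketch: {descr!r}")
    order, kind, size = match.group(1), match.group(2), int(match.group(3))
    bits = size * 8  # descr counts bytes, the type name counts bits
    return BYTE_ORDER[order], f"{KIND_PREFIX[kind]}{bits}"


for descr in ("<f8", "<i4", "|u1"):
    print(descr, describe_descr(descr))
```

String, boolean, and half-precision dtypes fall outside this sketch; check the ClickHouse Npy format documentation for how those are handled.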

What an .npy is not

An .npy is a single array, not a table. It has no column names, carries one element type, and cannot nest. If you need several arrays in one file, NumPy's .npz format is a zip archive of .npy entries. For typed, columnar, compressed tabular data with embedded statistics, reach for Parquet instead. The trade-off is that .npy gives all of that up in exchange for a format so simple you can parse the header by eye, and the payoff is speed: there is no decoding to do, so a scan is essentially a memcpy.

Reading the larger array shipped with the example (3,000,000 Float64 values, a 23 MB file) and aggregating it runs in 0.032s (best of three, warm cache, on an Apple M4 Pro, 14 cores, 24 GB RAM; figure may shift slightly under concurrent load):

Row 1:
──────
n:           3000000
avg_reading: 18.0053
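
The "no decoding" point can be illustrated with the standard library alone. In this sketch, which assumes a C-order '<f8' array in a version 1.0 file, the payload is copied straight into an `array.array('d')` with no per-value parsing. The tiny file is built by hand in memory for illustration. This is not how clickhouse local is implemented, and it says nothing about the timing quoted above.

```python
import array
import struct
import sys

values = [18.0, 20.62, 22.95, 24.73]

# Build a tiny version 1.0 .npy in memory: magic, version, header length, header, data.
header = "{'descr': '<f8', 'fortran_order': False, 'shape': (%d,), }" % len(values)
header = header.ljust(117) + "\n"  # 10 + 118 = 128, so the data starts 64-byte aligned
blob = b"\x93NUMPY\x01\x00" + struct.pack("<H", len(header)) + header.encode("latin-1")
blob += struct.pack("<%dd" % len(values), *values)

# Reading: find where the header ends, then bulk-copy everything after it.
(header_len,) = struct.unpack_from("<H", blob, 8)
data_offset = 10 + header_len

readings = array.array("d")             # native 8-byte floats
readings.frombytes(blob[data_offset:])  # one bulk copy, no per-element decoding
if sys.byteorder == "big":
    readings.byteswap()                 # the file is little-endian ('<f8')

print(list(readings), sum(readings) / len(readings))
```

The only work beyond the copy is the optional byte swap on a big-endian host, which is why the header's dtype string matters.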

Read one yourself

You don't need NumPy or a server to inspect an .npy file: clickhouse local reads it in place from the command line. NumPy reads it too, of course, so use whichever tool is already in front of you. ClickHouse uses the same SQL whether you point it at a file on your laptop, a server, or ClickHouse Cloud, so the query you write against a local .npy scales up unchanged.

For a step-by-step walkthrough, including writing arrays back out, see how to read an .npy file. Working in Python? See read an .npy file in Python. Need it as a table for other tools? Convert .npy to CSV or convert .npy to Parquet.

Run it yourself: the data generator and every command above live in local-analytics/what-is-an-npy-file in the ClickHouse examples repo.
