WebAssembly User-Defined Functions
ClickHouse supports creating user-defined functions (UDFs) written in WebAssembly. This allows you to execute custom logic written in languages like Rust, C, C++, or others by compiling them to WebAssembly modules.
Overview
A WebAssembly module is a compiled binary file that contains one or more functions that can be called from ClickHouse. Think of a module as a library or shared object that you load once and reuse many times.
A WebAssembly module containing UDFs can be written in any language that compiles to WebAssembly, such as Rust, C, or C++.
Code compiled to WebAssembly (the "guest") is executed by ClickHouse (the "host") in a sandboxed environment with access only to its own dedicated memory space.
Guest code exports functions that ClickHouse can invoke. These include the functions that implement your custom logic (used to define UDFs) as well as support functions required for memory management and data exchange between ClickHouse and the WebAssembly code.
Your code should be compiled to "freestanding" WebAssembly (aka wasm32-unknown-unknown), without any dependencies on an operating system or standard library. Only the default 32-bit WebAssembly target is supported (the wasm64 extension is not).
The module must follow one of the supported communication protocols (ABIs) for interacting with ClickHouse.
Once compiled, the module's binary code is loaded into ClickHouse by inserting it into the system.webassembly_modules table.
After that, you can create UDFs that reference functions exported by the module using the CREATE FUNCTION ... LANGUAGE WASM statement.
Prerequisites
Enable WebAssembly support in your ClickHouse configuration:
Available Engine Implementations:
Quick Start
This example demonstrates the complete workflow of creating a WebAssembly UDF by implementing a Collatz conjecture step counter.
We'll write the code in WebAssembly Text format (WAT), which is a human-readable representation of WebAssembly, so no programming language toolchain is required at this stage.
ClickHouse requires the module to be in binary format, so we'll use a conversion tool to turn WAT into WASM.
To perform this conversion, you can use wat2wasm from the WebAssembly Binary Toolkit (WABT) or the parse command from wasm-tools.
In the snippet above, we pipe the binary WASM code directly into the ClickHouse client, using FORMAT RawBlob to insert it into the system.webassembly_modules table.
Then we define the UDF that references the steps function exported by the module:
Note that we specify the function name from the module after ::, because it differs from the UDF name.
Now we can use the collatz_steps function in our queries:
The number column is explicitly cast to UInt32, because WebAssembly functions expect argument types to exactly match the signature specified in the CREATE FUNCTION statement.
The result is the sequence of Collatz step counts for the numbers from 1 to 100, corresponding to sequence A006577 in the OEIS.
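For readers more comfortable with C than WAT, the logic behind the steps export can be sketched as follows. This is a minimal reference version, assuming the function counts the halving and tripling steps needed to reach 1, as in OEIS A006577; it is not the module's actual source.

```c
#include <stdint.h>

/* Number of Collatz steps needed to reach 1 from n (0 for n <= 1).
   Assumption: this mirrors what the module's exported `steps` computes. */
uint32_t steps(uint32_t n) {
    uint64_t x = n;       /* widened to reduce overflow risk in 3x + 1 */
    uint32_t count = 0;
    while (x > 1) {
        x = (x % 2 == 0) ? x / 2 : 3 * x + 1;
        ++count;
    }
    return count;
}
```

The first values (0, 1, 7, 2, 5, 8, 16 for n = 1 through 7) can be used to cross-check the query output.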
Manage WASM modules via system table
WebAssembly modules are stored in the system.webassembly_modules table, which has the following columns:
- name (String): Module name. Non-empty, word characters only.
- code (String): Raw binary WASM code. Write-only; reads return an empty string.
- hash (UInt256): SHA256 of the module binary (zero if the module is present on disk but not yet loaded).
Module management happens through standard SQL operations on this table:
Insert a module
Optionally, provide an integrity hash:
If the provided hash does not match the computed SHA256 of the module code, the insertion fails. This is useful when loading modules from external sources such as S3 or HTTP.
List modules
Delete a module
Deletion is performed with the DELETE FROM system.webassembly_modules WHERE name = '...' statement.
Each statement can delete only a single module, identified by its exact name.
If any existing UDFs reference the module, the deletion fails, so you must drop those UDFs first.
Create a WebAssembly UDF
Syntax:
Parameters:
function_name: Name of the function in ClickHouse. May be different from the exported function name in the module.
FROM 'module_name' :: 'source_function_name': Name of the loaded WASM module and, optionally, the name of the exported function in that module to use (defaults to function_name)
ARGUMENTS: List of argument names and types (names are optional and are used by serialization formats that support named fields)
ABI: Application Binary Interface version
- ROW_DIRECT: Direct type mapping, row-by-row processing
- BUFFERED_V1: Block-based processing with serialization
SHA256_HASH: Expected module hash for verification (auto-filled if omitted). It can be used to ensure that the correct WASM module is loaded across different replicas.
SETTINGS: Per-function settings
- max_fuel (UInt64): Instruction fuel per instance. Default: 100000.
- max_memory (UInt64): Maximum memory usage per instance, in bytes. Range: 64 KiB to 4 GiB. Default: 104857600 (100 MiB).
- serialization_format (String): Serialization format, for ABIs that require one. Default: MsgPack.
- max_input_block_size (UInt64): If specified, limits the maximum input block size, in rows, for ABIs that use block-based processing. Default: 0 (unlimited).
- max_instances (UInt64): Maximum number of parallel instances per function in a single query. Default: 128.
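The max_memory bounds line up with WebAssembly's linear memory model, which grows in fixed 64 KiB pages. The sketch below only illustrates that arithmetic; the helper name is hypothetical, and whether ClickHouse rounds or rejects values that are not a multiple of the page size is not stated here.

```c
#include <stdint.h>

#define WASM_PAGE_SIZE   65536ULL      /* 64 KiB, fixed by the WebAssembly spec */
#define MAX_MEMORY_LIMIT (4ULL << 30)  /* 4 GiB upper bound from the settings list */

/* Hypothetical helper: number of whole 64 KiB pages that fit in max_memory,
   or 0 if the value is outside the documented 64 KiB to 4 GiB range. */
uint64_t max_memory_to_pages(uint64_t max_memory_bytes) {
    if (max_memory_bytes < WASM_PAGE_SIZE || max_memory_bytes > MAX_MEMORY_LIMIT)
        return 0;
    return max_memory_bytes / WASM_PAGE_SIZE;
}
```

Under this reading, the default of 104857600 bytes corresponds to 1600 pages, and the lower bound of 64 KiB is exactly one page.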
ABI Versions
To interact with ClickHouse, WebAssembly modules must adhere to one of the supported ABIs (Application Binary Interfaces).
ROW_DIRECT: Direct type mapping (primitive types Int32, UInt32, Int64, UInt64, Float32, Float64 only)
BUFFERED_V1: Complex types with serialization
ABI ROW_DIRECT
Calls an exported WASM function directly per row.
- Argument and return types must be numeric: Int32/UInt32/Int64/UInt64/Float32/Float64/Int128/UInt128.
- Strings are not supported in this ABI.
- Signatures must match the WASM export (i32/i64/f32/f64/v128).
- The module is not required to export any support functions.
For example, a function with the signature:
can be created as:
WebAssembly does not distinguish between signed and unsigned arguments; instead, it uses different instructions to interpret the values. Thus, the size of each argument must match exactly, while signedness is determined by the operations inside the function.
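The point about signedness can be illustrated with two C functions that compile to the same WASM signature, (i32) -> i32, but treat the same 32 bits differently. The function names are hypothetical and chosen only for this sketch.

```c
#include <stdint.h>

/* Both functions have the WASM signature (i32) -> i32.
   Only the division instruction chosen by the compiler differs. */

/* Interprets the argument as unsigned (unsigned division). */
uint32_t half_unsigned(uint32_t x) {
    return x / 2u;
}

/* Interprets the argument as signed (signed division, truncating toward zero). */
int32_t half_signed(int32_t x) {
    return x / 2;
}
```

The bit pattern 0xFFFFFFFF yields 0x7FFFFFFF from the first function and 0 from the second (where it means -1), so the type declared in CREATE FUNCTION should agree with how the guest code interprets the value.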
ABI BUFFERED_V1
This ABI is experimental and subject to change in future releases.
Processes entire blocks at once, (de)serializing data through WASM memory. Supports any argument and return types.
Serialized data is copied into WASM memory and passed to the UDF as a pointer to a buffer (a structure consisting of a pointer to the data and the size of the data), along with the number of rows in the input. Thus, on the WASM side, the user-defined function always accepts two i32 arguments and returns a single i32 value.
Guest code processes the data and returns a pointer to the result buffer, which holds the serialized result data.
The guest code must provide two functions to create and destroy these buffers.
Example C definitions:
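A minimal sketch of what such guest-side definitions might look like is given below. All names (udf_buffer, udf_create_buffer, udf_destroy_buffer, udf_passthrough) and the exact struct layout are assumptions made for illustration, not the export names ClickHouse expects. A toy arena stands in for an allocator, since freestanding code has no malloc.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical buffer layout: pointer to the data plus its size in bytes.
   On wasm32 a pointer is 32 bits wide, so a buffer pointer travels as an i32. */
typedef struct {
    uint8_t *data;
    uint32_t size;
} udf_buffer;

/* Toy arena allocator; reset once every live buffer has been destroyed. */
#define ARENA_SIZE (64u * 1024u)
static uint64_t arena_words[ARENA_SIZE / 8];  /* uint64_t gives 8-byte alignment */
static uint32_t arena_used = 0;
static uint32_t live_buffers = 0;

static void *arena_alloc(uint32_t size) {
    uint32_t aligned = (size + 7u) & ~7u;      /* round up to 8 bytes */
    if (aligned < size || aligned > ARENA_SIZE - arena_used)
        return NULL;                           /* overflow or out of space */
    void *p = (uint8_t *)arena_words + arena_used;
    arena_used += aligned;
    return p;
}

/* Creates a buffer with room for `size` bytes of serialized data. */
udf_buffer *udf_create_buffer(uint32_t size) {
    uint32_t used_before = arena_used;
    udf_buffer *buf = arena_alloc((uint32_t)sizeof(udf_buffer));
    if (!buf)
        return NULL;
    buf->data = arena_alloc(size);
    if (!buf->data) {
        arena_used = used_before;              /* roll back the header allocation */
        return NULL;
    }
    buf->size = size;
    ++live_buffers;
    return buf;
}

/* Destroys a buffer; the arena is reclaimed when no buffers remain. */
void udf_destroy_buffer(udf_buffer *buf) {
    if (buf && live_buffers > 0 && --live_buffers == 0)
        arena_used = 0;
}

/* A UDF in the shape described above: (buffer pointer, row count) -> buffer pointer.
   It returns the serialized input unchanged; error handling is out of scope here. */
udf_buffer *udf_passthrough(udf_buffer *input, uint32_t num_rows) {
    (void)num_rows;
    udf_buffer *out = udf_create_buffer(input->size);
    if (!out)
        return NULL;
    for (uint32_t i = 0; i < input->size; ++i)
        out->data[i] = input->data[i];
    return out;
}
```

A real function would deserialize the input according to the configured serialization_format, compute one result per row, and serialize the results into the output buffer.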
Host API available to modules
The following host functions may be imported and used by modules:
clickhouse_server_version() -> i64: returns the ClickHouse server version as an integer (e.g. 25011001 for v25.11.1.1).
clickhouse_terminate(ptr: i32, size: i32): throws an error with the provided message. Accepts a pointer to the memory location containing the error message string and the size of the string.
clickhouse_log(ptr: i32, size: i32): logs a message to the ClickHouse server text log.
clickhouse_random(ptr: i32, size: i32): fills memory with random bytes.
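As an illustration of consuming the value returned by clickhouse_server_version(), the sketch below splits the integer into components. It assumes the encoding is major * 1000000 + minor * 1000 + patch, which is inferred from the single example above (25011001 for v25.11.1.1) rather than from a stated rule; the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical holder for the decoded version components. */
typedef struct {
    uint32_t ver_major;
    uint32_t ver_minor;
    uint32_t ver_patch;
} ch_version;

/* Assumed encoding: major * 1000000 + minor * 1000 + patch.
   In a module, the input would come from the imported host function, e.g.
   extern int64_t clickhouse_server_version(void); */
ch_version decode_server_version(int64_t v) {
    ch_version out;
    out.ver_major = (uint32_t)(v / 1000000);
    out.ver_minor = (uint32_t)((v / 1000) % 1000);
    out.ver_patch = (uint32_t)(v % 1000);
    return out;
}
```

A module could use such a check to refuse to run on server versions older than the one it was built against.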