# varSamp

This page contains information on the varSamp and varSampStable ClickHouse functions.

## varSamp

Calculate the sample variance of a data set.

Syntax

varSamp(expr)

Parameters

• expr: An expression representing the data set for which you want to calculate the sample variance. Type: Expression.

Returned value

Returns a Float64 value representing the sample variance of the input data set.

Implementation details

The varSamp() function calculates the sample variance using the following formula:

∑(x - mean(x))^2 / (n - 1)

Where:

• x is each individual data point in the data set.
• mean(x) is the arithmetic mean of the data set.
• n is the number of data points in the data set.
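
As a sanity check on the formula, here is a minimal Python sketch that applies it directly to the five values used in the example on this page. It is not ClickHouse code; the function name `sample_variance` and the choice to return NaN for fewer than two points are assumptions of this sketch.

```python
def sample_variance(values):
    """Apply the formula directly: sum((x - mean(x))^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        # Choice made for this sketch only; not a statement about ClickHouse.
        return float("nan")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


data = [10.5, 12.3, 9.8, 11.2, 10.7]
# mean = 10.9, squared deviations sum to 3.46, and 3.46 / 4 = 0.865
print(sample_variance(data))
```

The printed value should agree with 0.865 up to floating-point rounding.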

The function assumes that the input data set represents a sample from a larger population. If you want to calculate the variance of the entire population (when you have the complete data set), you should use the varPop() function instead.
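
To make the sample versus population distinction concrete, the following Python sketch computes both quantities for the same data. The helper names are hypothetical and the code only mirrors the formulas; it does not call ClickHouse.

```python
def population_variance(values):
    """Divide by n: the data is treated as the whole population."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def sample_variance(values):
    """Divide by n - 1: the data is treated as a sample."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


data = [10.5, 12.3, 9.8, 11.2, 10.7]
n = len(data)
pop = population_variance(data)
samp = sample_variance(data)

# The two results differ only by the factor n / (n - 1).
print(pop, samp, pop * n / (n - 1))
```

The sample variance is always the larger of the two, and the gap shrinks as n grows.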

This function uses a numerically unstable algorithm. If you need numerical stability in calculations, use the slower but more stable varSampStable function.
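
The Python sketch below illustrates what numerical instability can look like in a variance calculation. This page does not say which algorithm varSamp uses, so the one-pass "sum of squares" formula here is an assumption, chosen because it is a common textbook example of an unstable method. The function names are hypothetical.

```python
def naive_sample_variance(values):
    """One-pass textbook formula: (sum(x^2) - sum(x)^2 / n) / (n - 1)."""
    n = len(values)
    s = sum(values)
    s2 = sum(x * x for x in values)
    # Subtracting two huge, nearly equal numbers loses most significant digits.
    return (s2 - s * s / n) / (n - 1)


def two_pass_sample_variance(values):
    """Compute the mean first, then sum the squared deviations."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Small spread around a very large offset; the true sample variance is 30.
data = [1e9 + x for x in (4.0, 7.0, 13.0, 16.0)]

print(naive_sample_variance(data))     # can be far from 30.0
print(two_pass_sample_variance(data))  # 30.0
```

The squares are around 1e18, which is beyond the range where a Float64 can represent every integer, so the small true difference is lost in rounding.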

Example

Query:

CREATE TABLE example_table
(
    id UInt64,
    value Float64
)
ENGINE = MergeTree
ORDER BY id;

INSERT INTO example_table VALUES (1, 10.5), (2, 12.3), (3, 9.8), (4, 11.2), (5, 10.7);

SELECT varSamp(value) FROM example_table;

Response:

0.8650000000000091

## varSampStable

Calculate the sample variance of a data set using a numerically stable algorithm.

Syntax

varSampStable(expr)

Parameters

• expr: An expression representing the data set for which you want to calculate the sample variance. Type: Expression.

Returned value

The varSampStable function returns a Float64 value representing the sample variance of the input data set.

Implementation details

The varSampStable function calculates the sample variance using the same formula as the varSamp function:

∑(x - mean(x))^2 / (n - 1)

Where:

• x is each individual data point in the data set.
• mean(x) is the arithmetic mean of the data set.
• n is the number of data points in the data set.

varSampStable differs from varSamp only in the algorithm used. It trades some speed for an algorithm that minimizes the accumulation of floating-point rounding errors, which matters most for large data sets or data with a wide range of values.
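
For intuition about what a numerically stable variance algorithm can look like, here is a minimal Python sketch of Welford's online algorithm. This page does not name the algorithm behind varSampStable, so treat this as one well-known stable approach and not as a description of the ClickHouse implementation. The function name and the NaN return for fewer than two points are assumptions of this sketch.

```python
def welford_sample_variance(values):
    """Welford's online algorithm (illustrative, not ClickHouse source)."""
    count = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in values:
        count += 1
        delta = x - mean
        mean += delta / count
        # Uses the deviation from the old mean and from the updated mean.
        m2 += delta * (x - mean)
    if count < 2:
        # Choice made for this sketch only.
        return float("nan")
    return m2 / (count - 1)


# Small spread around a very large offset; the true sample variance is 30.
data = [1e9 + x for x in (4.0, 7.0, 13.0, 16.0)]
print(welford_sample_variance(data))  # 30.0
```

The algorithm only ever accumulates small deviations from the running mean, so it never has to subtract two huge, nearly equal sums.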

Like varSamp, the varSampStable function assumes that the input data set represents a sample from a larger population. If you want to calculate the variance of the entire population (when you have the complete data set), you should use the varPopStable function instead.

Example

Query:

CREATE TABLE example_table
(
    id UInt64,
    value Float64
)
ENGINE = MergeTree
ORDER BY id;

INSERT INTO example_table VALUES (1, 10.5), (2, 12.3), (3, 9.8), (4, 11.2), (5, 10.7);

SELECT varSampStable(value) FROM example_table;

Response:

0.865

This query calculates the sample variance of the value column in example_table using the varSampStable function. The sample variance of the values [10.5, 12.3, 9.8, 11.2, 10.7] is 0.865. For the same data, the varSamp example above returns 0.8650000000000091; the small difference comes from floating-point rounding error that varSampStable avoids.