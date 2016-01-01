Function-Level Configuration
DataStore allows fine-grained control over execution at the function level, including per-function engine selection and dtype correction.
Function Engine Configuration
Override the execution engine for specific functions.
Setting Function Engines
When to Use
Force chdb for:
- Functions with better ClickHouse performance
- Functions that benefit from SQL optimization
- Large-scale string/datetime operations
Force pandas for:
- Functions with pandas-specific behavior
- When exact pandas compatibility is required
- Custom string operations
Example
Overlapping Functions
More than 159 functions are available in both the chdb and pandas engines, including:
| Category | Functions |
|---|---|
| String | length, upper, lower, trim, ltrim, rtrim, concat, substring, replace, reverse, contains, startswith, endswith |
| Math | abs, round, floor, ceil, exp, log, log10, sqrt, pow, sin, cos, tan |
| DateTime | year, month, day, hour, minute, second, dayofweek, dayofyear, quarter |
| Aggregation | sum, avg, min, max, count, std, var, median |
For overlapping functions, the engine is chosen by the first of the following that applies:
- Explicit function configuration (if set)
- Global execution_engine setting
- Auto-selection based on context
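Reading the list above as a precedence chain, the selection logic can be sketched in plain Python. This is a minimal illustration, not DataStore's implementation: `resolve_engine`, `auto_select`, the `overrides` mapping, the `"auto"` value, and the row-count threshold are all hypothetical names and assumptions introduced here.

```python
# Hypothetical stand-ins for illustration; none of these names are DataStore APIs.
AUTO = "auto"  # assumed value meaning "no global engine forced"


def auto_select(row_count: int) -> str:
    """Assumed heuristic: prefer chdb for large inputs, pandas otherwise."""
    return "chdb" if row_count >= 100_000 else "pandas"


def resolve_engine(
    function: str,
    overrides: dict[str, str],
    global_engine: str = AUTO,
    row_count: int = 0,
) -> str:
    """Pick an engine for a function that both engines implement."""
    # 1. An explicit per-function setting wins if one exists.
    if function in overrides:
        return overrides[function]
    # 2. Otherwise fall back to the global execution_engine setting.
    if global_engine != AUTO:
        return global_engine
    # 3. Otherwise choose from context (here: input size only).
    return auto_select(row_count)


overrides = {"upper": "pandas"}
forced = resolve_engine("upper", overrides, global_engine="chdb")
inherited = resolve_engine("length", overrides, global_engine="chdb")
automatic = resolve_engine("length", overrides, row_count=1_000_000)
```

In this sketch `forced` is decided by the per-function override, `inherited` by the global setting, and `automatic` by the size heuristic, because no earlier rule applied.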
chdb-Only Functions
Some functions are only available through ClickHouse:
| Category | Functions |
|---|---|
| Array | arraySum, arrayAvg, arraySort, arrayDistinct, groupArray, arrayElement |
| JSON | JSONExtractString, JSONExtractInt, JSONExtractFloat, JSONHas |
| URL | domain, path, protocol, extractURLParameter |
| IP | IPv4StringToNum, IPv4NumToString, isIPv4String |
| Geo | greatCircleDistance, geoDistance, geoToH3 |
| Hash | cityHash64, xxHash64, sipHash64, MD5, SHA256 |
| Conditional | sumIf, countIf, avgIf, minIf, maxIf |
These functions always run on the chdb engine, regardless of configuration.
pandas-Only Functions
Some functions are only available through pandas:
| Category | Functions |
|---|---|
| Apply | Custom lambda functions, user-defined functions |
| Complex Pivot | Pivot tables with custom aggregations |
| Stack/Unstack | Complex reshaping operations |
| Interpolate | Time series interpolation methods |
These functions always run on the pandas engine, regardless of configuration.
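The rule that engine-exclusive functions ignore configuration can be sketched as a simple lookup. The code below is an illustration under stated assumptions, not DataStore code: `engine_for` is a hypothetical helper, the sets are deliberately partial, and the lowercase pandas names (`apply`, `interpolate`, `stack`, `unstack`) are assumed spellings of the categories in the table above.

```python
# Partial, illustrative sets; the chdb names come from the chdb-only table,
# the pandas names are assumed spellings of the pandas-only categories.
CHDB_ONLY = {"arraySum", "JSONExtractString", "cityHash64", "sumIf"}
PANDAS_ONLY = {"apply", "interpolate", "stack", "unstack"}


def engine_for(function: str, configured: str) -> str:
    """Return the engine that runs `function`, given the configured engine."""
    if function in CHDB_ONLY:
        return "chdb"    # only ClickHouse implements it, so configuration is ignored
    if function in PANDAS_ONLY:
        return "pandas"  # only pandas implements it, so configuration is ignored
    return configured    # overlapping function: configuration applies


array_engine = engine_for("arraySum", configured="pandas")
reshape_engine = engine_for("interpolate", configured="chdb")
shared_engine = engine_for("upper", configured="pandas")
```

Only `shared_engine` follows the configured value here; the other two are fixed by which engine implements the function.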
Dtype Correction
Configure how DataStore corrects data types between engines.
Correction Levels
Correction Level Details
| Level | Description | Types Corrected |
|---|---|---|
| NONE | No automatic correction | None |
| CRITICAL | Essential corrections | NULL handling, boolean conversion |
| HIGH (default) | Common corrections | Integer/float precision, datetime, string encoding |
| MEDIUM | More corrections | Decimal precision, timezone handling |
| ALL | Maximum correction | All type differences |
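One plausible reading of the table is that the levels are cumulative: each level applies its own corrections plus those of every level listed before it, so MEDIUM covers more than HIGH. The sketch below models that reading in plain Python. It is an assumption, not a description of DataStore internals: `CorrectionLevel`, `CORRECTIONS`, `active_corrections`, and the numeric values are hypothetical.

```python
from enum import IntEnum


class CorrectionLevel(IntEnum):
    """Hypothetical ordering that mirrors the row order of the table."""
    NONE = 0
    CRITICAL = 1
    HIGH = 2      # marked as the default in the table
    MEDIUM = 3
    ALL = 4


# Correction kinds from the "Types Corrected" column, keyed by the
# first level assumed to enable them.
CORRECTIONS = {
    "null_handling": CorrectionLevel.CRITICAL,
    "boolean_conversion": CorrectionLevel.CRITICAL,
    "int_float_precision": CorrectionLevel.HIGH,
    "datetime": CorrectionLevel.HIGH,
    "string_encoding": CorrectionLevel.HIGH,
    "decimal_precision": CorrectionLevel.MEDIUM,
    "timezone_handling": CorrectionLevel.MEDIUM,
}


def active_corrections(level: CorrectionLevel = CorrectionLevel.HIGH) -> list[str]:
    """List the corrections enabled at `level` under the cumulative reading."""
    return [name for name, first in CORRECTIONS.items() if first <= level]


critical_only = active_corrections(CorrectionLevel.CRITICAL)
default_set = active_corrections()
```

Under this reading, `critical_only` contains just the NULL and boolean corrections, while `default_set` adds the three HIGH entries.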
When Types Need Correction
Type differences can occur when:
- ClickHouse → pandas: Different integer sizes (Int64 vs int64)
- pandas → ClickHouse: Python objects to SQL types
- NULL handling: pandas NA vs ClickHouse NULL
- Boolean: Different boolean representations
- DateTime: Timezone differences
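Two of these mismatches can be reproduced with the Python standard library alone, without either engine. The sketch below is illustrative only and does not show DataStore's correction logic: it demonstrates why a missing value stored as NaN behaves differently from one stored as None, and why naive and timezone-aware datetimes cannot be ordered against each other until one is converted.

```python
import math
from datetime import datetime, timezone

# NULL handling: NaN (a common float stand-in for a missing value) is not
# equal to itself, whereas None is, so comparisons and joins behave differently.
missing_as_nan = float("nan")
missing_as_none = None
nan_equals_itself = missing_as_nan == missing_as_nan      # False
none_equals_itself = missing_as_none == missing_as_none   # True
nan_detected = math.isnan(missing_as_nan)                 # explicit check needed

# DateTime: ordering a naive datetime against an aware one raises TypeError.
naive = datetime(2024, 1, 1, 12, 0)
aware = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
try:
    naive < aware
    comparable = True
except TypeError:
    comparable = False

# Attaching an explicit timezone makes the two values comparable again.
corrected = naive.replace(tzinfo=timezone.utc)
matches_after_correction = corrected == aware
```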
Example
Function Configuration API
function_config Object
Per-Call Override
Some methods support a per-call engine override.