Executable Pool dictionary source

Executable pool allows loading data from a pool of processes. This source does not work with dictionary layouts that need to load all data from the source.

Executable pool works if the dictionary is stored using one of the following layouts:

  • cache
  • complex_key_cache
  • ssd_cache
  • complex_key_ssd_cache
  • direct
  • complex_key_direct

Executable pool spawns a pool of processes with the specified command and keeps them running until they exit. The program should read data from STDIN while it is available and write the result to STDOUT. It can then wait for the next block of data on STDIN. ClickHouse does not close STDIN after processing a block of data; it pipes another chunk of data when needed. The executable script should be ready for this way of data processing: it should poll STDIN and flush data to STDOUT early.
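The read-write loop described above can be sketched in Python. This is a minimal illustration, not the reference implementation: the function name `serve` and the generated value text are hypothetical, and it assumes keys arrive one per line with results returned as TabSeparated `key<TAB>value` rows.

```python
import sys
from typing import TextIO


def serve(stdin: TextIO, stdout: TextIO) -> None:
    """Answer keys line by line until STDIN is closed (hypothetical helper)."""
    # readline() blocks until the next key arrives and returns "" at EOF,
    # so the process stays alive between blocks of data.
    for line in iter(stdin.readline, ""):
        key = line.rstrip("\n")
        stdout.write(f"{key}\tData for key {key}\n")
        # Flush early so the result is not held back in the output buffer
        # while the process waits for the next block.
        stdout.flush()


if __name__ == "__main__":
    serve(sys.stdin, sys.stdout)
```

Flushing after every row is the simplest way to make sure nothing stays buffered; a script that handles large blocks could flush less often, as long as it flushes before waiting on STDIN again.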

Example of settings:

SOURCE(EXECUTABLE_POOL(
    command 'while read key; do printf "$key\tData for key $key\n"; done'
    format 'TabSeparated'
    pool_size 10
    max_command_execution_time 10
    implicit_key false
))

Setting fields:

| Setting | Description |
|---|---|
| command | The absolute path to the executable file, or the file name (if the program directory is written to PATH). |
| format | The file format. All the formats described in Formats are supported. |
| pool_size | Size of the pool. If 0 is specified as pool_size, there are no pool size restrictions. Default value is 16. |
| command_termination_timeout | The executable script should contain a main read-write loop. After the dictionary is destroyed, the pipe is closed, and the executable has command_termination_timeout seconds to shut down before ClickHouse sends a SIGTERM signal to the child process. Specified in seconds. Default value is 10. Optional. |
| max_command_execution_time | Maximum execution time of the executable script command for processing a block of data. Specified in seconds. Default value is 10. Optional. |
| command_read_timeout | Timeout for reading data from the command's stdout, in milliseconds. Default value is 10000. Optional. |
| command_write_timeout | Timeout for writing data to the command's stdin, in milliseconds. Default value is 10000. Optional. |
| implicit_key | The executable source can return only values, and the correspondence to the requested keys is determined implicitly by the order of rows in the result. Default value is false. Optional. |
| execute_direct | If execute_direct = 1, the command is searched for inside the user_scripts folder specified by user_scripts_path. Additional script arguments can be specified using a whitespace separator. Example: script_name arg1 arg2. If execute_direct = 0, the command is passed as an argument to /bin/sh -c. Default value is 1. Optional. |
| send_chunk_header | Controls whether to send the row count before sending a chunk of data to the process. Default value is false. Optional. |
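
When send_chunk_header is enabled, the script has to consume the row count before reading the rows of each chunk. The sketch below shows one way to do that in Python. It assumes the count arrives as a decimal number on its own line ahead of each chunk, which the table above does not spell out; the function name `serve_chunks` and the generated values are hypothetical.

```python
import sys
from typing import TextIO


def serve_chunks(stdin: TextIO, stdout: TextIO) -> None:
    """Process chunks that are each preceded by a row-count line (assumed format)."""
    while True:
        header = stdin.readline()
        if not header:  # EOF: the pipe was closed, so stop the loop
            break
        row_count = int(header)  # assumption: header is a decimal row count
        for _ in range(row_count):
            key = stdin.readline().rstrip("\n")
            stdout.write(f"{key}\tData for key {key}\n")
        # Flush once per chunk so the whole result is sent before
        # the process waits for the next header.
        stdout.flush()


if __name__ == "__main__":
    serve_chunks(sys.stdin, sys.stdout)
```

Knowing the row count up front lets the script read exactly one chunk and flush once, instead of flushing after every row.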

This dictionary source can be configured only via XML configuration. Creating dictionaries with an executable source via DDL is disabled; otherwise, a database user would be able to execute arbitrary binaries on the ClickHouse node.