Date: 2016-01-01
RegExp Tree Dictionary
A Regexp Tree dictionary stores multiple trees of regular expressions with attributes. Users can look up strings in the dictionary. If a string matches the root of a regexp tree, we collect the attributes of the matched root and continue to walk its children. If any child matches the string, we collect its attributes too, overwriting earlier values where they conflict, and continue the traversal until we reach the leaf nodes.
Tip
If you are using a dictionary with ClickHouse Cloud, use the DDL query option to create your dictionaries, and create them as the user default. Also, verify the list of supported dictionary sources in the Cloud Compatibility guide.
Example of a DDL query for creating a Regexp Tree dictionary:
CREATE DICTIONARY regexp_dict
(
    regexp String,
    name String,
    version String
)
PRIMARY KEY(regexp)
SOURCE(YAMLRegExpTree(PATH '/var/lib/clickhouse/user_files/regexp_tree.yaml'))
LAYOUT(regexp_tree)
...
Only the YAMLRegExpTree source is allowed to work with the regexp_tree dictionary layout. If you want to use other sources, set regexp_dict_allow_other_sources to true.
Source
We introduce a source type called YAMLRegExpTree, which represents the structure of a Regexp Tree dictionary. An example of a valid YAML config:
- regexp: 'Linux/(\d+[\.\d]*).+tlinux'
  name: 'TencentOS'
  version: '\1'
- regexp: '\d+/tclwebkit(?:\d+[\.\d]*)'
  name: 'Andriod'
  versions:
    - regexp: '33/tclwebkit'
      version: '13'
    - regexp: '3[12]/tclwebkit'
      version: '12'
    - regexp: '30/tclwebkit'
      version: '11'
    - regexp: '29/tclwebkit'
      version: '10'
The key regexp represents the regular expression of a tree node. The name of this key is the same as the dictionary key. name and version are user-defined attributes in the dictionary. versions (which can be any name that does not appear among the attributes or the key) indicates the child nodes of this tree.
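The lookup described in the introduction can be sketched in Python. This is a minimal model of the documented behavior, not ClickHouse's implementation. The TREE literal mirrors the YAML config above, and the names TREE, ATTRIBUTES and lookup are hypothetical. The sketch assumes unanchored matching (re.search), visits nodes in listed order so that later matches overwrite earlier ones, and leaves back references unexpanded.

```python
import re

# The YAML tree from above, mirrored as plain Python data (illustration only).
TREE = [
    {"regexp": r"Linux/(\d+[\.\d]*).+tlinux", "name": "TencentOS", "version": r"\1"},
    {
        "regexp": r"\d+/tclwebkit(?:\d+[\.\d]*)",
        "name": "Andriod",
        "versions": [
            {"regexp": "33/tclwebkit", "version": "13"},
            {"regexp": "3[12]/tclwebkit", "version": "12"},
            {"regexp": "30/tclwebkit", "version": "11"},
            {"regexp": "29/tclwebkit", "version": "10"},
        ],
    },
]

# Attributes declared in the dictionary DDL; any other key holds child nodes.
ATTRIBUTES = ("name", "version")


def lookup(nodes, text, collected=None):
    """Walk the tree and collect attributes of every node whose regexp matches."""
    collected = {} if collected is None else collected
    for node in nodes:
        # Assumption: a node matches if its regexp is found anywhere in the text.
        if re.search(node["regexp"], text) is None:
            continue
        # Collect this node's attributes first, overwriting earlier values.
        for key in ATTRIBUTES:
            if key in node:
                collected[key] = node[key]
        # Then descend into the children, which may overwrite them again.
        for key, children in node.items():
            if key != "regexp" and key not in ATTRIBUTES:
                lookup(children, text, collected)
    return collected


print(lookup(TREE, "31/tclwebkit1024"))  # {'name': 'Andriod', 'version': '12'}
```

The root supplies name, and the matching child '3[12]/tclwebkit' supplies version, which is how one lookup can combine attributes from several levels of the tree.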
Back Reference
The value of an attribute can contain a back reference, which refers to a capture group of the matched regular expression. The reference number ranges from 1 to 9 and is written as $1 or \1. During query execution, the back reference in the value is replaced by the matched capture group.
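As an illustration of this substitution, here is a minimal Python sketch. It is not ClickHouse's implementation. The helper name expand_back_references and the input string are hypothetical, and leaving references to undefined groups untouched is an assumption, since this page does not say how that case is handled.

```python
import re

# Matches "\1".."\9" or "$1".."$9" inside an attribute value.
_BACKREF = re.compile(r"[\\$]([1-9])")


def expand_back_references(value, match):
    """Replace \\N or $N in `value` with capture group N of `match`."""

    def repl(ref):
        index = int(ref.group(1))
        # Assumption: a reference to a group the pattern does not define
        # is left as written.
        if index > match.re.groups:
            return ref.group(0)
        # A group that did not participate in the match becomes "".
        return match.group(index) or ""

    return _BACKREF.sub(repl, value)


# First node of the YAML config above; the input string is made up.
pattern = re.compile(r"Linux/(\d+[\.\d]*).+tlinux")
m = pattern.search("Linux/5.4.119-tlinux4")

print(expand_back_references(r"\1", m))  # 5.4.119
print(expand_back_references("$1", m))   # 5.4.119
```

Both spellings resolve to the same capture group, so an attribute such as version: '\1' would yield the version number extracted by the node's regular expression.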
Query
Because of the special structure of a Regexp Tree dictionary, only the functions dictGet, dictGetOrDefault and dictGetOrNull work with it.
Example:
SELECT dictGet('regexp_dict', ('name', 'version'), '31/tclwebkit1024');
Result:
┌─dictGet('regexp_dict', ('name', 'version'), '31/tclwebkit1024')─┐
│ ('Andriod','12')                                                │
└─────────────────────────────────────────────────────────────────┘