Schema inference cache

Contains information about all cached file schemas.

Columns:

storage (String) - Storage name: File, URL, S3, or HDFS.

source (String) - File source, such as a file path or URL.

format (String) - Format name.

additional_format_info (String) - Additional information required to identify the schema, for example, format-specific settings.

registration_time (DateTime) - Timestamp when the schema was added to the cache.

schema (String) - Cached schema.
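The reason additional_format_info is stored is that the same file read with different format settings can produce a different schema, so those settings have to be part of what identifies a cached entry. The sketch below is a toy model of that idea, not ClickHouse's implementation. It assumes, for illustration only, that storage, source, format, and additional_format_info together act as the lookup key. The class names CacheKey, CacheEntry, and SchemaCache are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Optional


# Hypothetical lookup key; the fields mirror the table columns above.
@dataclass(frozen=True)
class CacheKey:
    storage: str                 # File, URL, S3, or HDFS
    source: str                  # file path or URL
    format: str                  # e.g. JSONEachRow
    additional_format_info: str  # settings that can change the inferred schema


# Hypothetical value stored for each key.
@dataclass
class CacheEntry:
    registration_time: datetime
    schema: str


class SchemaCache:
    """Toy in-memory cache: one schema per (storage, source, format, settings)."""

    def __init__(self) -> None:
        self._entries: Dict[CacheKey, CacheEntry] = {}

    def get(self, key: CacheKey) -> Optional[str]:
        entry = self._entries.get(key)
        return entry.schema if entry is not None else None

    def add(self, key: CacheKey, schema: str) -> None:
        # Record when the schema was cached, like registration_time.
        self._entries[key] = CacheEntry(datetime.now(), schema)

    def __len__(self) -> int:
        return len(self._entries)


cache = SchemaCache()
path = "/home/droscigno/user_files/data.jsonl"
k_float = CacheKey("File", path, "JSONEachRow", "try_infer_integers=false")
k_int = CacheKey("File", path, "JSONEachRow", "try_infer_integers=true")
cache.add(k_float, "id Nullable(Float64)")
print(cache.get(k_float))  # id Nullable(Float64)
print(cache.get(k_int))    # None: different settings, so no cached entry
```

Because the settings string is part of the key, changing a setting such as try_infer_integers misses the cache instead of returning a schema that was inferred under different rules.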

Example

Let's say we have a file data.jsonl with this content:

{"id": 1, "age": 25, "name": "Josh", "hobbies": ["football", "cooking", "music"]}
{"id": 2, "age": 19, "name": "Alan", "hobbies": ["tennis", "art"]}
{"id": 3, "age": 32, "name": "Lana", "hobbies": ["fitness", "reading", "shopping"]}
{"id": 4, "age": 47, "name": "Brayan", "hobbies": ["movies", "skydiving"]}



Tip: Place data.jsonl in the user_files_path directory. You can find its location in your ClickHouse configuration files. The default is: <user_files_path>/var/lib/clickhouse/user_files/</user_files_path>



Open clickhouse-client and run the DESCRIBE query:

DESCRIBE file('data.jsonl') SETTINGS input_format_try_infer_integers=0;



┌─name────┬─type────────────────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ id      │ Nullable(Float64)       │              │                    │         │                  │                │
│ age     │ Nullable(Float64)       │              │                    │         │                  │                │
│ name    │ Nullable(String)        │              │                    │         │                  │                │
│ hobbies │ Array(Nullable(String)) │              │                    │         │                  │                │
└─────────┴─────────────────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
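With input_format_try_infer_integers=0, whole numbers such as 1 and 25 are inferred as Nullable(Float64) instead of an integer type, and the string arrays become Array(Nullable(String)). The sketch below is a deliberately simplified model of those rules, written only to show why the result above looks the way it does. It is not ClickHouse's inference algorithm: it ignores dates, booleans, nested objects, empty arrays, and most other cases. The functions infer_type and infer_schema are hypothetical.

```python
import json
from typing import Any, Dict, List


def infer_type(value: Any, try_infer_integers: bool) -> str:
    """Map one JSON value to a ClickHouse-style type name (toy rules only)."""
    if isinstance(value, list):
        if not value:
            raise ValueError("empty arrays are not handled in this sketch")
        # Toy rule: the element type comes from the first element.
        return f"Array({infer_type(value[0], try_infer_integers)})"
    if isinstance(value, bool):  # checked before int, since bool subclasses int
        raise ValueError("booleans are not handled in this sketch")
    if isinstance(value, int):
        return "Nullable(Int64)" if try_infer_integers else "Nullable(Float64)"
    if isinstance(value, float):
        return "Nullable(Float64)"
    if isinstance(value, str):
        return "Nullable(String)"
    raise ValueError(f"unsupported value: {value!r}")


def infer_schema(lines: List[str], try_infer_integers: bool) -> Dict[str, str]:
    """Infer one type per column across all JSONEachRow lines."""
    schema: Dict[str, str] = {}
    for line in lines:
        if not line.strip():
            continue
        for name, value in json.loads(line).items():
            t = infer_type(value, try_infer_integers)
            prev = schema.get(name)
            if prev is None or prev == t:
                schema[name] = t
            elif {prev, t} == {"Nullable(Int64)", "Nullable(Float64)"}:
                schema[name] = "Nullable(Float64)"  # widen integers to floats
            else:
                raise ValueError(f"conflicting types for {name}: {prev} vs {t}")
    return schema


rows = [
    '{"id": 1, "age": 25, "name": "Josh", "hobbies": ["football", "cooking", "music"]}',
    '{"id": 2, "age": 19, "name": "Alan", "hobbies": ["tennis", "art"]}',
    '{"id": 3, "age": 32, "name": "Lana", "hobbies": ["fitness", "reading", "shopping"]}',
    '{"id": 4, "age": 47, "name": "Brayan", "hobbies": ["movies", "skydiving"]}',
]
print(", ".join(f"{k} {v}" for k, v in infer_schema(rows, try_infer_integers=False).items()))
# id Nullable(Float64), age Nullable(Float64), name Nullable(String), hobbies Array(Nullable(String))
```

Passing try_infer_integers=True to this toy model yields Nullable(Int64) for id and age. This illustrates why the setting has to be recorded alongside the cached schema.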



Let's see the content of the system.schema_inference_cache table:

SELECT *
FROM system.schema_inference_cache
FORMAT Vertical



Row 1:
──────
storage: File
source: /home/droscigno/user_files/data.jsonl
format: JSONEachRow
additional_format_info: schema_inference_hints=, max_rows_to_read_for_schema_inference=25000, schema_inference_make_columns_nullable=true, try_infer_integers=false, try_infer_dates=true, try_infer_datetimes=true, try_infer_numbers_from_strings=true, read_bools_as_numbers=true, try_infer_objects=false
registration_time: 2022-12-29 17:49:52
schema: id Nullable(Float64), age Nullable(Float64), name Nullable(String), hobbies Array(Nullable(String))
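The additional_format_info value in this row is a comma-separated list of name=value pairs, and try_infer_integers=false reflects the SETTINGS clause of the DESCRIBE query. The sketch below splits such a string into a dictionary so that individual settings can be checked. The function parse_format_info is hypothetical. It assumes that no value contains a comma, which would not hold if, for example, schema_inference_hints listed several columns.

```python
from typing import Dict


def parse_format_info(info: str) -> Dict[str, str]:
    """Split 'name=value, name=value' into a dict; values stay as strings.

    Assumption: no value contains a comma (a multi-column
    schema_inference_hints value would break this simple split).
    """
    settings: Dict[str, str] = {}
    for part in info.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        settings[name] = value
    return settings


info = (
    "schema_inference_hints=, max_rows_to_read_for_schema_inference=25000, "
    "schema_inference_make_columns_nullable=true, try_infer_integers=false, "
    "try_infer_dates=true, try_infer_datetimes=true, "
    "try_infer_numbers_from_strings=true, read_bools_as_numbers=true, "
    "try_infer_objects=false"
)
settings = parse_format_info(info)
print(settings["try_infer_integers"])  # false
```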



