Log
The engine belongs to the family of Log engines. See the common properties of Log engines and their differences in the Log Engine Family article.
Log differs from TinyLog in that a small file of "marks" resides with the column files. These marks are written on every data block and contain offsets that indicate where to start reading the file in order to skip the specified number of rows. This makes it possible to read table data in multiple threads.
For concurrent data access, read operations can be performed simultaneously, while write operations block both reads and each other.
The Log engine does not support indexes. Similarly, if writing to a table fails, the table is broken, and reading from it returns an error. The Log engine is appropriate for temporary data, write-once tables, and for testing or demonstration purposes.
Creating a Table
CREATE TABLE [IF NOT EXISTS] [db.]table_name [ON CLUSTER cluster]
(
    column1_name [type1] [DEFAULT|MATERIALIZED|ALIAS expr1],
    column2_name [type2] [DEFAULT|MATERIALIZED|ALIAS expr2],
    ...
) ENGINE = Log
See the detailed description of the CREATE TABLE query.
Writing the Data
The Log engine efficiently stores data by writing each column to its own file. For every table, the Log engine writes the following files to the specified storage path:
- <column>.bin: a data file for each column, containing the serialized and compressed data.
- __marks.mrk: a marks file, storing offsets and row counts for each inserted data block. Marks are used to facilitate efficient query execution by allowing the engine to skip irrelevant data blocks during reads.
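One way to check where a Log table keeps these files is to query the system.tables system table. A minimal sketch, assuming a ClickHouse version whose system.tables exposes the data_paths column; log_table is the example table created later in this article:

-- Show the directory that holds the <column>.bin and __marks.mrk files.
-- Assumes log_table already exists (see the "Example of Use" section below).
SELECT data_paths
FROM system.tables
WHERE database = currentDatabase() AND name = 'log_table'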
Writing Process
When data is written to a Log table:
- Data is serialized and compressed into blocks.
- For each column, the compressed data is appended to its respective <column>.bin file.
- Corresponding entries are added to the __marks.mrk file to record the offset and row count of the newly inserted data.
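Because a block is formed per INSERT statement (for inserts smaller than the max_insert_block_size setting), how you batch rows determines how many blocks and marks are written. A minimal sketch using the log_table schema from the example below:

-- One INSERT with two rows: the rows are compressed into a single block,
-- and a single entry is appended to __marks.mrk.
INSERT INTO log_table VALUES
    (now(), 'REGULAR', 'batched message 1'),
    (now(), 'REGULAR', 'batched message 2')

-- Two separate INSERT statements: two blocks and two marks entries.
INSERT INTO log_table VALUES (now(), 'REGULAR', 'separate message 1')
INSERT INTO log_table VALUES (now(), 'REGULAR', 'separate message 2')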
Reading the Data
The file with marks allows ClickHouse to parallelize the reading of data. This means that a SELECT query returns rows in an unpredictable order. Use the ORDER BY clause to sort rows.
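For a quick look at the data in on-disk order without sorting, you can also restrict reading to a single thread via the general max_threads setting; note that only ORDER BY guarantees a sorted result:

-- Read with one thread: blocks are typically returned in the order they were written.
SELECT * FROM log_table SETTINGS max_threads = 1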
Example of Use
Creating a table:
CREATE TABLE log_table
(
    timestamp DateTime,
    message_type String,
    message String
)
ENGINE = Log
Inserting data:
INSERT INTO log_table VALUES (now(),'REGULAR','The first regular message')
INSERT INTO log_table VALUES (now(),'REGULAR','The second regular message'),(now(),'WARNING','The first warning message')
We used two INSERT queries to create two data blocks inside the <column>.bin files.
ClickHouse uses multiple threads when selecting data. Each thread reads a separate data block and returns resulting rows independently as it finishes. As a result, the order of blocks of rows in the output may not match the order of the same blocks in the input. For example:
SELECT * FROM log_table
┌───────────timestamp─┬─message_type─┬─message────────────────────┐
│ 2019-01-18 14:27:32 │ REGULAR │ The second regular message │
│ 2019-01-18 14:34:53 │ WARNING │ The first warning message │
└─────────────────────┴──────────────┴────────────────────────────┘
┌───────────timestamp─┬─message_type─┬─message───────────────────┐
│ 2019-01-18 14:23:43 │ REGULAR │ The first regular message │
└─────────────────────┴──────────────┴───────────────────────────┘
Sorting the results (ascending order by default):
SELECT * FROM log_table ORDER BY timestamp
┌───────────timestamp─┬─message_type─┬─message────────────────────┐
│ 2019-01-18 14:23:43 │ REGULAR │ The first regular message │
│ 2019-01-18 14:27:32 │ REGULAR │ The second regular message │
│ 2019-01-18 14:34:53 │ WARNING │ The first warning message │
└─────────────────────┴──────────────┴────────────────────────────┘