ClickHouse is better with data! There are multiple ways to add data and most of them are available on the Data Sources page, which can be accessed in the navigation menu.

You can upload data using the following methods:

Set up a ClickPipe to start ingesting data from sources like S3, Postgres, Kafka, and GCS

Use the SQL console

Use the ClickHouse client

Upload a file - accepted formats include JSON, CSV and TSV

Upload data from a file URL

ClickPipes is a managed integration platform that makes ingesting data from a diverse set of sources as simple as clicking a few buttons. Designed for the most demanding workloads, ClickPipes' robust and scalable architecture ensures consistent performance and reliability. ClickPipes can be used for long-term streaming needs or for a one-time data loading job.

Like most database management systems, ClickHouse logically groups tables into databases. Use the CREATE DATABASE command to create a new database in ClickHouse:
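A minimal form of that statement might look like the following. The database name helloworld is taken from the table-creation step in this section, and the IF NOT EXISTS guard is an addition so the statement can be re-run safely:

```sql
-- Create the database used by the rest of this walkthrough.
-- IF NOT EXISTS makes the statement a no-op if the database is already there.
CREATE DATABASE IF NOT EXISTS helloworld;
```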

Run the following command to create a table named my_first_table in the helloworld database:
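A sketch of such a statement, assembled from the four columns and the (user_id, timestamp) key described in this section, could be written as:

```sql
-- Column names and types follow the descriptions in this section.
CREATE TABLE helloworld.my_first_table
(
    user_id   UInt32,
    message   String,
    timestamp DateTime,
    metric    Float32
)
ENGINE = MergeTree()
-- With no ORDER BY given, the primary key also serves as the sorting key.
PRIMARY KEY (user_id, timestamp);
```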

In the example above, my_first_table is a MergeTree table with four columns:

user_id: a 32-bit unsigned integer (UInt32)

message: a String data type, which replaces types like VARCHAR, BLOB, CLOB and others from other database systems

timestamp: a DateTime value, which represents an instant in time

metric: a 32-bit floating point number (Float32)

Table engines

Table engines determine:

How and where data is stored

Which queries are supported

Whether or not the data is replicated

There are many table engines to choose from, but for a simple table on a single-node ClickHouse server, MergeTree is your likely choice.

Before you go any further, it is important to understand how primary keys work in ClickHouse (the implementation of primary keys might seem unexpected!):

Primary keys in ClickHouse are not unique for each row in a table.

The primary key of a ClickHouse table determines how the data is sorted when written to disk. Every 8,192 rows or 10MB of data (referred to as the index granularity) creates an entry in the primary key index file. This granularity concept creates a sparse index that can easily fit in memory, and the granules represent a stripe of the smallest amount of column data that gets processed during SELECT queries.

The primary key can be defined using the PRIMARY KEY parameter. If you define a table without a PRIMARY KEY specified, then the key becomes the tuple specified in the ORDER BY clause. If you specify both a PRIMARY KEY and an ORDER BY, the primary key must be a prefix of the sort order.
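As an illustration of specifying both clauses, the sketch below uses a hypothetical table named key_prefix_example (not part of this walkthrough) in which the primary key is the leading column of the sort order:

```sql
-- Hypothetical table, used only to show PRIMARY KEY and ORDER BY together.
CREATE TABLE helloworld.key_prefix_example
(
    user_id   UInt32,
    timestamp DateTime,
    metric    Float32
)
ENGINE = MergeTree()
ORDER BY (user_id, timestamp)  -- rows are sorted by user_id, then timestamp
PRIMARY KEY (user_id);         -- the sparse index is built on user_id only
```

Keeping the primary key shorter than the sorting key keeps the index smaller while the data on disk is still sorted by the full ORDER BY tuple.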

In this example, the primary key is also the sorting key: the tuple (user_id, timestamp). Therefore, the data stored in each column file is sorted by user_id, then by timestamp.

For a deep dive into core ClickHouse concepts, see "Core Concepts".

You can use the familiar INSERT INTO TABLE command with ClickHouse, but it is important to understand that each insert into a MergeTree table causes a part to be created in storage.

ClickHouse best practice

Insert a large number of rows per batch: tens of thousands or even millions of rows at once. ClickHouse can easily handle that volume, and it will save you money by sending fewer write requests to your service.
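Once a table has received a few inserts, one way to see the parts they created is to query the system.parts table. This is a sketch that assumes the helloworld.my_first_table table from this walkthrough:

```sql
-- List the active data parts of the table and how many rows each holds.
-- Many small parts usually indicate that inserts are too small or too frequent.
SELECT name, rows
FROM system.parts
WHERE database = 'helloworld'
  AND table = 'my_first_table'
  AND active;
```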

Even for a simple example, let's insert more than one row at a time:
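A multi-row insert along these lines would do. The row values are illustrative placeholders; the timestamp expressions use the built-in now(), yesterday(), and today() functions:

```sql
-- One INSERT statement carrying several rows (illustrative values).
INSERT INTO helloworld.my_first_table (user_id, message, timestamp, metric) VALUES
    (101, 'Hello, ClickHouse!',                            now(),       -1.0),
    (102, 'Insert a lot of rows per batch',                yesterday(), 1.41421),
    (102, 'Sort your data based on your common queries',   today(),     2.718),
    (101, 'Granules are the smallest chunks of data read', now() + 5,   3.14159);
```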

Note

Notice the timestamp column is populated using various Date and DateTime functions. ClickHouse has hundreds of useful functions that you can view in the Functions section.

Let's verify it worked:
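A simple check, assuming the table name used throughout this walkthrough, is to read every row back:

```sql
-- Return all rows currently stored in the table.
SELECT * FROM helloworld.my_first_table;
```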

You can also connect to your ClickHouse Cloud service using a command-line tool named clickhouse client. Click Connect on the left menu to access the connection details, then select Native from the drop-down in the dialog:

Install ClickHouse. Run the command, substituting your hostname, username, and password:

If you get the smiley face prompt, you are ready to run queries!

Give it a try by running the following query:
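For instance, assuming the helloworld.my_first_table table created earlier, a query such as this one can be typed at the prompt:

```sql
-- Read the rows back, oldest event first.
SELECT *
FROM helloworld.my_first_table
ORDER BY timestamp;
```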

Notice the response comes back in a nice table format:

Add a FORMAT clause to specify one of the many supported output formats of ClickHouse:
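A sketch of the same kind of query with an explicit output format, here TabSeparated, again assuming the helloworld.my_first_table table:

```sql
-- Same rows, but emitted as tab-separated text instead of a pretty-printed table.
SELECT *
FROM helloworld.my_first_table
ORDER BY timestamp
FORMAT TabSeparated;
```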

In the above query, the output is returned as tab-separated:

To exit the clickhouse client, enter the exit command:

A common task when getting started with a database is to insert some data that you already have in files. We have some sample data online that represents clickstream data: it includes a user ID, a URL that was visited, and the timestamp of the event.

Suppose we have the following text in a CSV file named data.csv:

The following command inserts the data into my_first_table:
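As an alternative to passing the file on the shell command line, the same load can be sketched as a statement run from inside clickhouse client. It assumes data.csv sits in the directory the client was started from and that its columns match the table's column order:

```sql
-- Read data.csv on the client side and insert its rows, parsing them as CSV.
INSERT INTO helloworld.my_first_table
FROM INFILE 'data.csv'
FORMAT CSV;
```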