Skip to main content



We assume the use of the dbt CLI for the following examples. Users may also wish to consider dbt Cloud, which offers a web-based Integrated Development Environment (IDE) allowing users to edit and run projects.

dbt offers a number of options for CLI installation. Follow the instructions described here. At this stage install dbt-core only. We recommend the use of pip.

python install dbt-core

Important: The following is tested under python 3.9.

ClickHouse plugin

Install the dbt ClickHouse plugin:

pip install dbt-clickhouse

Prepare ClickHouse

dbt excels when modeling highly relational data. For the purposes of example, we provide a small IMDB dataset with the following relational schema. This dataset originates from the relational dataset repository. This is trivial relative to common schemas used with dbt but represents a manageable sample:

IMDB table schema

We use a subset of these tables as shown.

Create the following tables:


CREATE TABLE imdb.actors
id UInt32,
first_name String,
last_name String,
gender FixedString(1)
) ENGINE = MergeTree ORDER BY (id, first_name, last_name, gender);

CREATE TABLE imdb.directors
id UInt32,
first_name String,
last_name String
) ENGINE = MergeTree ORDER BY (id, first_name, last_name);

CREATE TABLE imdb.genres
movie_id UInt32,
genre String
) ENGINE = MergeTree ORDER BY (movie_id, genre);

CREATE TABLE imdb.movie_directors
director_id UInt32,
movie_id UInt64
) ENGINE = MergeTree ORDER BY (director_id, movie_id);

CREATE TABLE imdb.movies
id UInt32,
name String,
year UInt32,
rank Float32 DEFAULT 0
) ENGINE = MergeTree ORDER BY (id, name, year);

CREATE TABLE imdb.roles
created_at DateTime DEFAULT now(),
actor_id UInt32,
movie_id UInt32,
role String
) ENGINE = MergeTree ORDER BY (actor_id, movie_id);

The column created_at for the table roles, which defaults to a value of now(). We use this later to identify incremental updates to our models - see Incremental Models.

We use the s3 function to read the source data from public endpoints to insert data. Run the following commands to populate the tables:

INSERT INTO imdb.actors
FROM s3('',

INSERT INTO imdb.directors
FROM s3('',

INSERT INTO imdb.genres
FROM s3('',

INSERT INTO imdb.movie_directors
FROM s3('',

INSERT INTO imdb.movies
FROM s3('',

INSERT INTO imdb.roles
FROM s3('',

The execution of these may vary depending on your bandwidth, but each should only take a few seconds to complete. Execute the following query to compute a summary of each actor, ordered by the most movie appearances, and to confirm the data was loaded successfully:

any(actor_name) as name,
uniqExact(movie_id) as num_movies,
avg(rank) as avg_rank,
uniqExact(genre) as unique_genres,
uniqExact(director_name) as uniq_directors,
max(created_at) as updated_at
SELECT as id,
concat(imdb.actors.first_name, ' ', imdb.actors.last_name) as actor_name, as movie_id,
Imdb.movies.rank as rank,
concat(imdb.directors.first_name, ' ', imdb.directors.last_name) as director_name,
FROM imdb.actors
JOIN imdb.roles ON imdb.roles.actor_id =
LEFT OUTER JOIN imdb.movies ON = imdb.roles.movie_id
LEFT OUTER JOIN imdb.genres ON imdb.genres.movie_id =
LEFT OUTER JOIN imdb.movie_directors ON imdb.movie_directors.movie_id =
LEFT OUTER JOIN imdb.directors ON = imdb.movie_directors.director_id
ORDER BY num_movies DESC

The response should look like:

|id |name |num_movies|avg_rank |unique_genres|uniq_directors|updated_at |
|45332 |Mel Blanc |832 |6.175853582979779 |18 |84 |2022-04-26 14:01:45|
|621468|Bess Flowers|659 |5.57727638854796 |19 |293 |2022-04-26 14:01:46|
|372839|Lee Phelps |527 |5.032976449684617 |18 |261 |2022-04-26 14:01:46|
|283127|Tom London |525 |2.8721716524875673|17 |203 |2022-04-26 14:01:46|
|356804|Bud Osborne |515 |2.0389507108727773|15 |149 |2022-04-26 14:01:46|

In the later guides, we will convert this query into a model - materializing it in ClickHouse as a dbt view and table.