Skip to main content

Creating a Simple View Materialization

When using the view materialization, a model is rebuilt as a view on each run, via a CREATE VIEW AS statement in ClickHouse. This doesn't require any additional storage of data but will be slower to query than table materializations.

  1. From the imdb folder, delete the directory models/example:

    [email protected]:~/imdb$ rm -rf models/example
  2. Create a new file in the actors within the models folder. Here we create files that each represent an actor model:

    [email protected]:~/imdb$ mkdir models/actors
  3. Create the files schema.yml and actor_summary.sql in the models/actors folder. Add the following contents:

    [email protected]:~/imdb$ touch models/actors/actor_summary.sql
    [email protected]:~/imdb$ touch models/actors/schema.yml

    The schema.yml defines our tables. These will subsequently be available for use in macros.

    version: 2

    sources:
    - name: imdb
    tables:
    - name: directors
    - name: actors
    - name: roles
    - name: movies
    - name: genres
    - name: movie_directors

    The actors_summary.sql defines our actual model. Note in the config function we also request the model be materialized as a view in ClickHouse. Our tables are referenced from the schema.yml file via the function source e.g. source('imdb', 'movies') refers to the movies table in the imdb database.

    {{ config(materialized='view') }}

    with actor_summary as (
    SELECT id,
    any(actor_name) as name,
    uniqExact(movie_id) as num_movies,
    avg(rank) as avg_rank,
    uniqExact(genre) as genres,
    uniqExact(director_name) as directors,
    max(created_at) as updated_at
    FROM (
    SELECT {{ source('imdb', 'actors') }}.id as id,
    concat({{ source('imdb', 'actors') }}.first_name, ' ', {{ source('imdb', 'actors') }}.last_name) as actor_name,
    {{ source('imdb', 'movies') }}.id as movie_id,
    {{ source('imdb', 'movies') }}.rank as rank,
    genre,
    concat({{ source('imdb', 'directors') }}.first_name, ' ', {{ source('imdb', 'directors') }}.last_name) as director_name,
    created_at
    FROM {{ source('imdb', 'actors') }}
    JOIN {{ source('imdb', 'roles') }} ON {{ source('imdb', 'roles') }}.actor_id = {{ source('imdb', 'actors') }}.id
    LEFT OUTER JOIN {{ source('imdb', 'movies') }} ON {{ source('imdb', 'movies') }}.id = {{ source('imdb', 'roles') }}.movie_id
    LEFT OUTER JOIN {{ source('imdb', 'genres') }} ON {{ source('imdb', 'genres') }}.movie_id = {{ source('imdb', 'movies') }}.id
    LEFT OUTER JOIN {{ source('imdb', 'movie_directors') }} ON {{ source('imdb', 'movie_directors') }}.movie_id = {{ source('imdb', 'movies') }}.id
    LEFT OUTER JOIN {{ source('imdb', 'directors') }} ON {{ source('imdb', 'directors') }}.id = {{ source('imdb', 'movie_directors') }}.director_id
    )
    GROUP BY id
    )

    select *
    from actor_summary

    Note how we include the column updated_at in our final actor_summary. We use this later for incremental materializations.

  4. From the imdb directory execute the command dbt run.

    [email protected]:~/imdb$ dbt run
    15:05:35 Running with dbt=1.0.4
    15:05:35 Found 1 model, 0 tests, 1 snapshot, 0 analyses, 181 macros, 0 operations, 0 seed files, 6 sources, 0 exposures, 0 metrics
    15:05:35
    15:05:36 Concurrency: 1 threads (target='dev')
    15:05:36
    15:05:36 1 of 1 START view model imdb_dbt.actor_summary.................................. [RUN]
    15:05:37 1 of 1 OK created view model imdb_dbt.actor_summary............................. [OK in 1.00s]
    15:05:37
    15:05:37 Finished running 1 view model in 1.97s.
    15:05:37
    15:05:37 Completed successfully
    15:05:37
    15:05:37 Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
  5. dbt will represent the model as a view in ClickHouse as requested. We can now query this view directly. This view will have been created in the imdb_dbt database - this is determined by the schema parameter in the file ~/.dbt/profiles.yml under the clickhouse_imdb profile.

    SHOW DATABASES;
    +------------------+
    |name |
    +------------------+
    |INFORMATION_SCHEMA|
    |default |
    |imdb |
    |imdb_dbt | <---created by dbt!
    |information_schema|
    |system |
    +------------------+

    Querying this view, we can replicate the results of our earlier query with a simpler syntax:

    SELECT * FROM imdb_dbt.actor_summary ORDER BY num_movies DESC LIMIT 5;
    +------+------------+----------+------------------+------+---------+-------------------+
    |id |name |num_movies|avg_rank |genres|directors|updated_at |
    +------+------------+----------+------------------+------+---------+-------------------+
    |45332 |Mel Blanc |832 |6.175853582979779 |18 |84 |2022-04-26 15:26:55|
    |621468|Bess Flowers|659 |5.57727638854796 |19 |293 |2022-04-26 15:26:57|
    |372839|Lee Phelps |527 |5.032976449684617 |18 |261 |2022-04-26 15:26:56|
    |283127|Tom London |525 |2.8721716524875673|17 |203 |2022-04-26 15:26:56|
    |356804|Bud Osborne |515 |2.0389507108727773|15 |149 |2022-04-26 15:26:56|
    +------+------------+----------+------------------+------+---------+-------------------+