ClickHouse Query Challenge 🚀 🏆
Welcome to the ClickHouse Query Challenge! If you're reading this, you're about to put your SQL skills to the test against a real-time GitHub dataset.
Solve as many questions as possible using just ClickHouse SQL to win the prize (and, of course, bragging rights)!
How It Works
-
🔥 In another tab, you have the SQL Console open, where the challenge is running.
-
⏳ Your task? Solve as many queries as possible in 5 minutes.
-
📝 Each challenge contains a question with hints. Write the SQL query, run it, and we'll tell you if the answer is correct.
-
📊 We'll record your score at the end for the leaderboard.
-
🎲 Tiebreaker: If multiple participants have the same score, we'll use ClickHouse to randomly select a winner.
Get Started
In the next section, you'll find example queries to help you prepare. These will show you how to filter event types, track trends, and analyze GitHub activity using ClickHouse's blazing-fast SQL engine.
Schema
Let's first familiarize you with the schema. The full schema is quite large, so we've limited the columns below to those you'll need to answer all of the questions.
CREATE TABLE github.events
(
`event_type` Enum8('CommitCommentEvent' = 1, 'CreateEvent' = 2, 'DeleteEvent' = 3, 'ForkEvent' = 4, 'GollumEvent' = 5, 'IssueCommentEvent' = 6, 'IssuesEvent' = 7, 'MemberEvent' = 8, 'PublicEvent' = 9, 'PullRequestEvent' = 10, 'PullRequestReviewCommentEvent' = 11, 'PushEvent' = 12, 'ReleaseEvent' = 13, 'SponsorshipEvent' = 14, 'WatchEvent' = 15, 'GistEvent' = 16, 'FollowEvent' = 17, 'DownloadEvent' = 18, 'PullRequestReviewEvent' = 19, 'ForkApplyEvent' = 20, 'Event' = 21, 'TeamAddEvent' = 22), -- the type of the event
`actor_login` LowCardinality(String), -- user to whom the event was associated
`repo_name` LowCardinality(String), -- repository name, can change
`repo_id` LowCardinality(String), -- the repository id, never change
`created_at` DateTime, -- the time the event occurred at
`requested_reviewers` Array(LowCardinality(String)), -- users requested to review
`assignee` LowCardinality(String), -- user assigned to an issue
`body` String, -- body of an issue or PR
`title` String, -- the title of an issue or PR
`labels` Array(LowCardinality(String)), --labels associated with a PR or issue
..
)
ENGINE = MergeTree
ORDER BY (event_type, repo_name, toDate(created_at))
A few important notes:
Repository Names vs. IDs
Repositories on GitHub can change names—for example, a project might move from a personal account to an organization. However, the repository ID never changes. If you want to track a repository over time reliably, use repo_id
instead of repo_name
. We'll let you know if this is needed.
Event Types
The event_type
column records different types of activities on GitHub. Some common event types include:
- WatchEvent - A user stars a repository (GitHub refers to this action as "watching," but it's equivalent to giving a star).
- ForkEvent - A user creates a fork of a repository.
- PullRequestEvent - A pull request is opened, closed, or merged.
- IssuesEvent - An issue is created, closed, reopened, or commented on.
- PushEvent - Code is pushed to a repository.
Example queries
To get you started, here are a few more example queries:
Total stars for a repository
1SELECT count() as stars
2FROM github.events
3WHERE (repo_name = 'ClickHouse/ClickHouse') AND (event_type = 'WatchEvent')
or more accurately...
1SELECT count() as stars
2FROM github.events
3WHERE (repo_id = dictGet('github.repo_name_to_id_dict', 'repo_id', cityHash64('ClickHouse/ClickHouse'))::String) AND (event_type = 'WatchEvent')
Total contributors
An estimate of the number of people who have made a PR to ClickHouse.
1SELECT uniq(actor_login) as contributors
2FROM github.events
3WHERE (repo_name = 'ClickHouse/ClickHouse') AND (event_type = 'PullRequestEvent')
Releases per year
The number of releases per year to the ClickHouse repository.
1SELECT toStartOfYear(created_at) AS year, count() AS releases
2FROM github.events
3WHERE (repo_name = 'ClickHouse/clickhouse-go') AND (event_type = 'ReleaseEvent')
4GROUP BY year ORDER BY year ASC
Issues and repositories mentioning ClickHouse by month
Let's check the title only.
1SELECT
2 toStartOfMonth(created_at) AS month,
3 count() AS issues,
4 uniq(repo_name) AS repositories
5FROM github.events
6WHERE (title ILIKE '%ClickHouse%') AND (event_type = 'IssuesEvent')
7GROUP BY month
8ORDER BY month ASC
When you're ready, switch to your challenge tab and get ready - your time starts now! ⏳
Current leaders
Check our the current leaders by running the query below!
1SELECT 2 name, 3 score 4FROM results 5ORDER BY score DESC LIMIT 5