Querying DateTimes in ClickHouse

In this post we're going to look at some of the most useful ClickHouse functions for querying and filtering by dates and datetimes - things like rounding to the nearest hour or 15-minute window, filtering by time of day, and computing durations between two timestamps.

If you're working with raw date strings or timestamps that need converting first, check out another post that I wrote about parsing dates and datetimes in ClickHouse. This post picks up from there and focuses on what to do once your data is already in a DateTime column.

Importing the New York City taxi dataset

Our dataset of choice is the New York City taxi dataset, so let's get that set up in ClickHouse. First we'll create a database:

1CREATE DATABASE nyc_taxi;

And a table:

1CREATE TABLE nyc_taxi.trips_small (
2    trip_id             UInt32,
3    pickup_datetime     DateTime,
4    dropoff_datetime    DateTime,
5    pickup_longitude    Nullable(Float64),
6    pickup_latitude     Nullable(Float64),
7    dropoff_longitude   Nullable(Float64),
8    dropoff_latitude    Nullable(Float64),
9    passenger_count     UInt8,
10    trip_distance       Float32,
11    fare_amount         Float32,
12    extra               Float32,
13    tip_amount          Float32,
14    tolls_amount        Float32,
15    total_amount        Float32,
16    payment_type        Enum('CSH' = 1, 'CRE' = 2, 'NOC' = 3, 'DIS' = 4, 'UNK' = 5),
17    pickup_ntaname      LowCardinality(String),
18    dropoff_ntaname     LowCardinality(String)
19)
20ENGINE = MergeTree
21PRIMARY KEY (pickup_datetime, dropoff_datetime);

We can then run the following command to import the data:

1INSERT INTO nyc_taxi.trips_small
2SELECT
3    trip_id,
4    pickup_datetime,
5    dropoff_datetime,
6    pickup_longitude,
7    pickup_latitude,
8    dropoff_longitude,
9    dropoff_latitude,
10    passenger_count,
11    trip_distance,
12    fare_amount,
13    extra,
14    tip_amount,
15    tolls_amount,
16    total_amount,
17    payment_type,
18    pickup_ntaname,
19    dropoff_ntaname
20FROM s3(
21    'https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_{0..2}.gz',
22    'TabSeparatedWithNames'
23);

This query imports just over 3 million records, as we can see in the output below:

13000317 rows in set. Elapsed: 32.077 sec. Processed 3.00 million rows, 256.38 MB (93.53 thousand rows/s., 7.99 MB/s.)
2Peak memory usage: 536.51 MiB.

We can adjust the {0..2} in the URI to include more files if we want to import more data.

Now that we're loaded the data, it's time to write some queries. We'll be working with the pickup_datetime and dropoff_datetime columns:

Trips by hour

Let's start by exploring taxi journeys taken on July 1st, 2015. We'll use toStartOfHour to round datetimes down to the nearest hour, toDate to filter to a single day, and toHour to restrict to morning hours:

1SELECT
2  toStartOfHour(pickup_datetime) as hour,
3  count() as trips,
4  round(avg(passenger_count), 1) as avg_passengers
5FROM nyc_taxi.trips_small
6WHERE toDate(pickup_datetime) = '2015-07-01'
7AND toHour(pickup_datetime) < 13
8GROUP BY hour
9ORDER BY hour;

1┌────────────────hour─┬─trips─┬─avg_passengers─┐
2│ 2015-07-01 00:00:00 │   663 │            1.7 │
3│ 2015-07-01 01:00:00 │   381 │            1.6 │
4│ 2015-07-01 02:00:00 │   249 │            1.8 │
5│ 2015-07-01 03:00:00 │   155 │            1.6 │
6│ 2015-07-01 04:00:00 │   159 │            1.5 │
7│ 2015-07-01 05:00:00 │   197 │            1.5 │
8│ 2015-07-01 06:00:00 │   530 │            1.6 │
9│ 2015-07-01 07:00:00 │   849 │            1.6 │
10│ 2015-07-01 08:00:00 │  1034 │            1.6 │
11│ 2015-07-01 09:00:00 │  1033 │            1.7 │
12│ 2015-07-01 10:00:00 │   898 │            1.7 │
13│ 2015-07-01 11:00:00 │   900 │            1.6 │
14│ 2015-07-01 12:00:00 │   961 │            1.7 │
15└─────────────────────┴───────┴────────────────┘

It was quiet overnight, trips start to pick up around 6am, and peak around 8–9am. But was July 1st typical? Let's remove the date filter and look across all days. To do that we cast toStartOfHour down to a Time type using ::Time - that strips out the date and leaves just the time, so all days are grouped together:

1SELECT
2  toStartOfHour(pickup_datetime)::Time as hour,
3  count() as trips,
4  round(avg(passenger_count), 1) as avg_passengers
5FROM nyc_taxi.trips_small
6WHERE toHour(pickup_datetime) < 13
7GROUP BY hour
8ORDER BY hour;

1┌─────hour─┬──trips─┬─avg_passengers─┐
2│ 00:00:00 │ 118268 │            1.7 │
3│ 01:00:00 │  86495 │            1.7 │
4│ 02:00:00 │  65246 │            1.7 │
5│ 03:00:00 │  47377 │            1.7 │
6│ 04:00:00 │  34840 │            1.7 │
7│ 05:00:00 │  32328 │            1.6 │
8│ 06:00:00 │  68644 │            1.6 │
9│ 07:00:00 │ 107494 │            1.6 │
10│ 08:00:00 │ 132596 │            1.6 │
11│ 09:00:00 │ 136228 │            1.6 │
12│ 10:00:00 │ 134286 │            1.7 │
13│ 11:00:00 │ 137561 │            1.7 │
14│ 12:00:00 │ 145282 │            1.7 │
15└──────────┴────────┴────────────────┘

When we look across all days we still see a reduction in journeys over night, but the increase starts a couple of hours earlier, from 6am to 7am. The number of trips then stays reasonably stable for the next four hours.

Rush hour in 15-minute windows

When exactly does the morning rush start? Let's zoom in using toStartOfFifteenMinutes to bucket by 15-minute intervals, and formatDateTime to make the output readable. In the WHERE clause we cast pickup_datetime to Time to filter just by time of day - no date needed:

1SELECT
2  formatDateTime(toStartOfFifteenMinutes(pickup_datetime), '%r') AS timeWindow,
3  count() as trips,
4  round(avg(trip_distance), 2) as avgDistance
5FROM nyc_taxi.trips_small
6WHERE pickup_datetime::Time BETWEEN '06:00:00'::Time AND '09:59:59'::Time
7AND trip_distance > 0
8GROUP BY timeWindow
9ORDER BY timeWindow;

1┌─timeWindow─┬─trips─┬─avgDistance─┐
2│ 06:00 AM   │ 11601 │        4.47 │
3│ 06:15 AM   │ 14645 │        3.97 │
4│ 06:30 AM   │ 19033 │        3.67 │
5│ 06:45 AM   │ 22795 │         3.2 │
6│ 07:00 AM   │ 23179 │        3.27 │
7│ 07:15 AM   │ 25465 │        3.12 │
8│ 07:30 AM   │ 28350 │        3.04 │
9│ 07:45 AM   │ 29914 │        2.89 │
10│ 08:00 AM   │ 30444 │           3 │
11│ 08:15 AM   │ 32063 │        2.91 │
12│ 08:30 AM   │ 34293 │         2.8 │
13│ 08:45 AM   │ 35116 │        2.63 │
14│ 09:00 AM   │ 33776 │        2.74 │
15│ 09:15 AM   │ 33800 │        2.72 │
16│ 09:30 AM   │ 33694 │        2.73 │
17│ 09:45 AM   │ 34235 │        2.63 │
18└────────────┴───────┴─────────────┘

The surge begins at 6:30, accelerates until 7:30, and peaks at 8:45.

What does rush hour feel like?

We know when rush hour starts - but what does it feel like if you're in a taxi? We can use dateDiff to compute journey duration in minutes, which lets us calculate average speed. We also use dateDiff in the WHERE clause to filter out zero-duration trips (bad data):

1WITH buckets AS (
2  SELECT
3    formatDateTime(toStartOfFifteenMinutes(pickup_datetime), '%r') AS timeWindow,
4    count() as trips,
5    round(avg(trip_distance), 2) as avgDist,
6    round(avg(dateDiff('minute', pickup_datetime, dropoff_datetime)), 1) AS avgDuration,
7    round(avg(
8      trip_distance /
9      (dateDiff('minute', pickup_datetime, dropoff_datetime) / 60)
10    ), 1) AS avgSpeed
11  FROM nyc_taxi.trips_small
12  WHERE pickup_datetime::Time BETWEEN '06:00:00'::Time AND '09:59:59'::Time
13  AND trip_distance > 0
14  AND dateDiff('minute', pickup_datetime, dropoff_datetime) > 0
15  GROUP BY timeWindow
16  ORDER BY timeWindow
17)
18SELECT timeWindow, trips, avgDuration, avgDist, avgSpeed,
19       bar(avgSpeed, 0, (SELECT max(avgSpeed) FROM buckets), 20) AS speedBar
20FROM buckets
21ORDER BY timeWindow ASC;

1┌─timeWindow─┬─trips─┬─avgDuration─┬─avgDist─┬─avgSpeed─┬─speedBar─────────────┐
2│ 06:00 AM   │ 11562 │        13.2 │    4.48 │     19.3 │ ████████████████████ │
3│ 06:15 AM   │ 14609 │        13.1 │    3.98 │     18.2 │ ██████████████████▊  │
4│ 06:30 AM   │ 18993 │        12.5 │    3.67 │     17.3 │ █████████████████▉   │
5│ 06:45 AM   │ 22754 │        11.5 │    3.21 │     16.1 │ ████████████████▋    │
6│ 07:00 AM   │ 23139 │        12.3 │    3.27 │     15.4 │ ███████████████▉     │
7│ 07:15 AM   │ 25428 │        12.7 │    3.12 │     14.3 │ ██████████████▊      │
8│ 07:30 AM   │ 28313 │        13.9 │    3.04 │     13.5 │ █████████████▉       │
9│ 07:45 AM   │ 29873 │        13.8 │    2.89 │     12.7 │ █████████████▏       │
10│ 08:00 AM   │ 30411 │        14.2 │       3 │       12 │ ████████████▍        │
11│ 08:15 AM   │ 32017 │        15.2 │    2.91 │     11.5 │ ███████████▉         │
12│ 08:30 AM   │ 34258 │        15.4 │     2.8 │       11 │ ███████████▍         │
13│ 08:45 AM   │ 35071 │        14.9 │    2.64 │     10.8 │ ███████████▏         │
14│ 09:00 AM   │ 33718 │        15.3 │    2.74 │     10.9 │ ███████████▎         │
15│ 09:15 AM   │ 33754 │        15.3 │    2.72 │     10.8 │ ███████████▏         │
16│ 09:30 AM   │ 33657 │        15.4 │    2.73 │     10.8 │ ███████████▏         │
17│ 09:45 AM   │ 34188 │        14.8 │    2.63 │     10.9 │ ███████████▎         │
18└────────────┴───────┴─────────────┴─────────┴──────────┴──────────────────────┘

The bar function draws an ASCII bar chart scaled to the max value - a handy way to visualize relative values inline.

At 6am taxis are moving at just over 19 mph. By 8am that's down to around 12 mph - slow, but pretty standard for a big city during rush hour. It keeps getting slower as the morning wears on.

Weekdays vs weekends

We've confirmed rush hour exists - but does it happen on weekends too? We can split weekday and weekend trips using countIf with toDayOfWeek: values 1–5 are weekdays, 6–7 are weekends. We then use the lag window function to compute the percentage change in trips between each 15-minute window:

1WITH trips AS (
2  SELECT
3    formatDateTime(toStartOfFifteenMinutes(pickup_datetime), '%r') AS timeWindow,
4    countIf(toDayOfWeek(pickup_datetime) <= 5) as wdTrips,
5    countIf(toDayOfWeek(pickup_datetime) > 5) as weTrips
6  FROM nyc_taxi.trips_small
7  WHERE trip_distance > 0
8  AND pickup_datetime::Time BETWEEN '06:00:00'::Time AND '09:59:59'::Time
9  GROUP BY timeWindow
10  ORDER BY timeWindow
11)
12SELECT timeWindow, wdTrips,
13  round((
14    (wdTrips - lag(wdTrips) OVER (ORDER BY timeWindow)) /
15    lag(wdTrips) OVER (ORDER BY timeWindow)) * 100,
16  1) as wdPctChange,
17  weTrips,
18  round(
19    ((weTrips - lag(weTrips) OVER (ORDER BY timeWindow)) /
20    lag(weTrips) OVER (ORDER BY timeWindow)) * 100,
21  1) as wePctChange
22FROM trips
23ORDER BY timeWindow;

1┌─timeWindow─┬─wdTrips─┬─wdPctChange─┬─weTrips─┬─wePctChange─┐
2│ 06:00 AM   │    9398 │         inf │    2203 │         inf │
3│ 06:15 AM   │   12254 │        30.4 │    2391 │         8.5 │
4│ 06:30 AM   │   16106 │        31.4 │    2927 │        22.4 │
5│ 06:45 AM   │   19727 │        22.5 │    3068 │         4.8 │
6│ 07:00 AM   │   20285 │         2.8 │    2894 │        -5.7 │
7│ 07:15 AM   │   22129 │         9.1 │    3336 │        15.3 │
8│ 07:30 AM   │   24494 │        10.7 │    3856 │        15.6 │
9│ 07:45 AM   │   25681 │         4.8 │    4233 │         9.8 │
10│ 08:00 AM   │   26259 │         2.3 │    4185 │        -1.1 │
11│ 08:15 AM   │   27509 │         4.8 │    4554 │         8.8 │
12│ 08:30 AM   │   28891 │           5 │    5402 │        18.6 │
13│ 08:45 AM   │   29154 │         0.9 │    5962 │        10.4 │
14│ 09:00 AM   │   27872 │        -4.4 │    5904 │          -1 │
15│ 09:15 AM   │   27268 │        -2.2 │    6532 │        10.6 │
16│ 09:30 AM   │   26426 │        -3.1 │    7268 │        11.3 │
17│ 09:45 AM   │   26070 │        -1.3 │    8165 │        12.3 │
18└────────────┴─────────┴─────────────┴─────────┴─────────────┘

Weekdays show a sharp spike at 6:15–6:30, a second (smaller) increase around 7:15–7:30, and peak at 8:45 before leveling off. Weekends are different: the increase at 6:30 is gentler, and trips keep growing steadily well into the morning.

Get started today

Interested in seeing how ClickHouse works on your data? Get started with ClickHouse Cloud in minutes and receive $300 in free credits.

Querying DateTimes in ClickHouse

Importing the New York City taxi dataset

Trips by hour

Rush hour in 15-minute windows

What does rush hour feel like?

Weekdays vs weekends

Get started today

Subscribe to our newsletter

Recent posts

ClickHouse achieves AWS Small and Medium Business Competency

Agentic infrastructure with clickhousectl: a distributed ClickStack across three continents

Benchmarking the ClickStack MCP Server with hdx-evals

Migrate Datadog telemetry with the OpenTelemetry Collector

Querying DateTimes in ClickHouse

Importing the New York City taxi dataset #

Trips by hour #

Rush hour in 15-minute windows #

What does rush hour feel like? #

Weekdays vs weekends #

Get started today

Subscribe to our newsletter

Recent posts

ClickHouse achieves AWS Small and Medium Business Competency

Agentic infrastructure with clickhousectl: a distributed ClickStack across three continents

Benchmarking the ClickStack MCP Server with hdx-evals

Migrate Datadog telemetry with the OpenTelemetry Collector

Importing the New York City taxi dataset

Trips by hour

Rush hour in 15-minute windows

What does rush hour feel like?

Weekdays vs weekends