Reading data from S3

Using the ClickHouse s3 table function, users can query data in S3 directly, without first persisting it in ClickHouse. The following example reads 10 rows from the NYC Taxi dataset.

SELECT 
    trip_id, 
    total_amount, 
    pickup_longitude, 
    pickup_latitude, 
    dropoff_longitude, 
    dropoff_latitude, 
    pickup_datetime, 
    dropoff_datetime,
    trip_distance
FROM 
s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_*.gz', 'TabSeparatedWithNames') 
LIMIT 10
SETTINGS input_format_try_infer_datetimes = 0;
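
Before choosing columns, it can help to check what schema ClickHouse infers from the files and which file each row came from. The following is a minimal sketch, assuming the same public bucket is reachable; `_path` and `_file` are virtual columns exposed by the s3 table function, and the column selection is only illustrative.

```sql
-- Show the column names and types inferred from the matched files
DESCRIBE TABLE s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_*.gz', 'TabSeparatedWithNames')
SETTINGS input_format_try_infer_datetimes = 0;

-- Return a few rows together with the object each row was read from
SELECT
    _path,
    _file,
    trip_id,
    total_amount
FROM
s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/nyc-taxi/trips_*.gz', 'TabSeparatedWithNames')
LIMIT 10
SETTINGS input_format_try_infer_datetimes = 0;
```

The glob pattern `trips_*.gz` matches several objects, so the virtual columns are a simple way to confirm which files a query actually touched.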

Inserting Data from S3

To transfer data from S3 to ClickHouse, users can combine the s3 table function with an INSERT statement. Let's create an empty hackernews table:

CREATE TABLE hackernews ORDER BY tuple()
EMPTY AS SELECT * FROM s3('https://datasets-documentation.s3.eu-west-3.amazonaws.com/hackernews/hacknernews.csv.gz', 'CSVWithNames');

This creates an empty table using the schema inferred from the data. We can then insert the first 1 million rows from the remote dataset. The statement below reads the same public file over HTTPS with the url table function:

INSERT INTO hackernews SELECT *
FROM url('https://datasets-documentation.s3.eu-west-3.amazonaws.com/hackernews/hacknernews.csv.gz', 'CSVWithNames')
LIMIT 1000000;
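
To check the result of the load, a short sketch along the following lines can be used. It assumes the hackernews table was created and populated as described above and that no other inserts have run against it.

```sql
-- Count the rows now stored locally; this should match the LIMIT used in the
-- INSERT if the load completed and the table started out empty
SELECT count() FROM hackernews;

-- Inspect the table definition, including the column types that were inferred
SHOW CREATE TABLE hackernews;
```

Because the table was created with an empty sorting key, `ORDER BY tuple()`, rows are stored in insertion order; for production tables a sorting key suited to the query pattern is usually preferable.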

