Reading data from Google Cloud Storage
Using the ClickHouse s3 table function, users can query data in GCS in place, without persisting it in ClickHouse. This works because GCS exposes an S3-compatible API at storage.googleapis.com. The following example reads 10 rows of the NYC Taxi dataset.
SELECT
    trip_id,
    total_amount,
    pickup_longitude,
    pickup_latitude,
    dropoff_longitude,
    dropoff_latitude,
    pickup_datetime,
    dropoff_datetime,
    trip_distance
FROM s3(
    'https://storage.googleapis.com/clickhouse-public-datasets/nyc-taxi/trips_{0..2}.gz',
    'TabSeparatedWithNames'
)
LIMIT 10;
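Before creating a destination table, it can be useful to inspect the schema ClickHouse infers from the remote files. A minimal sketch, using the same public bucket as above (for private buckets, GCS HMAC credentials can be supplied as additional key arguments to the s3 function):

```sql
-- Inspect the column names and types ClickHouse infers
-- from the compressed TSV files in the bucket
DESCRIBE s3(
    'https://storage.googleapis.com/clickhouse-public-datasets/nyc-taxi/trips_{0..2}.gz',
    'TabSeparatedWithNames'
);
```

This is the same inference mechanism the EMPTY AS SELECT pattern below relies on, so reviewing its output first helps catch mistyped columns before any data is loaded.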
Inserting data from Google Cloud Storage
To transfer data from GCS into ClickHouse, users can combine the s3 table function with an INSERT INTO ... SELECT statement. Let's first create an empty trips table:
CREATE TABLE trips
ENGINE = MergeTree
ORDER BY tuple()
EMPTY AS
SELECT * FROM s3(
    'https://storage.googleapis.com/clickhouse-public-datasets/nyc-taxi/trips_{0..2}.gz',
    'TabSeparatedWithNames'
);
This creates an empty table using the schema inferred from the data. We can then insert the first 1 million rows from the remote dataset:
INSERT INTO trips
SELECT * FROM s3(
    'https://storage.googleapis.com/clickhouse-public-datasets/nyc-taxi/trips_{0..2}.gz',
    'TabSeparatedWithNames'
)
LIMIT 1000000;
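After the insert completes, the load can be verified with a quick count. A minimal sketch:

```sql
-- Confirm the number of rows imported into the table
SELECT count() FROM trips;
```

Note that ORDER BY tuple() declares a table with no sorting key, which is convenient for a quick import; for production tables, an ordering key matching common query filters should be chosen instead.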