DataStore Factory Methods

Date: 2016-01-01

DataStore provides over 20 factory methods to create instances from various data sources including local files, databases, cloud storage, and data lakes.

The uri() method is the recommended universal entry point that auto-detects the source type:

    from chdb.datastore import DataStore

    # Local files
    ds = DataStore.uri("data.csv")
    ds = DataStore.uri("/path/to/data.parquet")

    # Cloud storage
    ds = DataStore.uri("s3://bucket/data.parquet?nosign=true")
    ds = DataStore.uri("https://example.com/data.csv")

    # Databases
    ds = DataStore.uri("mysql://user:pass@host:3306/db/table")
    ds = DataStore.uri("postgresql://user:pass@host:5432/db/table")

| Source Type | URI Format | Example |
|---|---|---|
| Local file | path/to/file | data.csv, /abs/path/data.parquet |
| S3 | s3://bucket/path | s3://mybucket/data.parquet?nosign=true |
| GCS | gs://bucket/path | gs://mybucket/data.csv |
| Azure | az://container/path | az://mycontainer/data.parquet |
| HTTP/HTTPS | https://url | https://example.com/data.csv |
| MySQL | mysql://user:pass@host:port/db/table | mysql://root:pass@localhost:3306/mydb/users |
| PostgreSQL | postgresql://user:pass@host:port/db/table | postgresql://postgres:pass@localhost:5432/mydb/users |
| SQLite | sqlite:///path?table=name | sqlite:///data.db?table=users |
| ClickHouse | clickhouse://host:port/db/table | clickhouse://localhost:9000/default/hits |
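The scheme-based detection summarized in the table can be illustrated with a small standalone sketch. It does not reproduce how DataStore.uri() is implemented; `classify_uri` and `SCHEME_TO_SOURCE` are hypothetical names used only to show how the listed URI formats map to source types, using the standard library's urllib.parse.

```python
from urllib.parse import urlsplit

# Hypothetical mapping, copied from the URI formats listed in the table.
SCHEME_TO_SOURCE = {
    "s3": "S3",
    "gs": "GCS",
    "az": "Azure",
    "http": "HTTP/HTTPS",
    "https": "HTTP/HTTPS",
    "mysql": "MySQL",
    "postgresql": "PostgreSQL",
    "sqlite": "SQLite",
    "clickhouse": "ClickHouse",
}


def classify_uri(uri: str) -> str:
    """Return the source type implied by the URI scheme (illustrative only)."""
    scheme = urlsplit(uri).scheme.lower()
    # No scheme means a plain path, which the table treats as a local file.
    if not scheme:
        return "Local file"
    return SCHEME_TO_SOURCE.get(scheme, "Unknown")
```

A lookup table keyed on the scheme keeps the sketch easy to extend; the real method may apply additional rules (for example, inspecting file extensions) that are not shown here.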

Create DataStore from a local or remote file with automatic format detection.

DataStore.from_file(path, format=None, compression=None, **kwargs)

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| path | str | required | File path (local or URL) |
| format | str | None | File format (auto-detected if None) |
| compression | str | None | Compression type (auto-detected if None) |

Supported formats: CSV, TSV, Parquet, JSON, JSONLines, ORC, Avro, Arrow

Examples:

    from chdb.datastore import DataStore

    # Auto-detect format from extension
    ds = DataStore.from_file("data.csv")
    ds = DataStore.from_file("data.parquet")
    ds = DataStore.from_file("data.json")

    # Explicit format
    ds = DataStore.from_file("data.txt", format="CSV")

    # With compression
    ds = DataStore.from_file("data.csv.gz", compression="gzip")
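To make the idea of extension-based detection concrete, here is a minimal sketch of how a format and compression type might be guessed from a file name. The `guess_format` helper and both lookup tables are hypothetical; the extensions are assumptions based on the supported formats listed above, and the actual detection rules in from_file() may differ.

```python
from pathlib import PurePosixPath

# Hypothetical extension tables; the real detection rules may differ.
FORMATS = {
    ".csv": "CSV",
    ".tsv": "TSV",
    ".parquet": "Parquet",
    ".json": "JSON",
    ".jsonl": "JSONLines",
    ".orc": "ORC",
    ".avro": "Avro",
    ".arrow": "Arrow",
}
COMPRESSIONS = {".gz": "gzip"}


def guess_format(path):
    """Return (format, compression) guessed from the file name, None where unknown."""
    suffixes = [s.lower() for s in PurePosixPath(path).suffixes]
    compression = None
    # A trailing compression suffix is peeled off first: data.csv.gz -> .csv + gzip.
    if suffixes and suffixes[-1] in COMPRESSIONS:
        compression = COMPRESSIONS[suffixes.pop()]
    fmt = FORMATS.get(suffixes[-1]) if suffixes else None
    return fmt, compression
```

When the guess comes back as None, as it would for data.txt, passing format explicitly is the fallback shown in the examples above.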

The chdb.datastore module also exposes pandas-style read functions:

    from chdb import datastore as pd

    # CSV files
    ds = pd.read_csv("data.csv")
    ds = pd.read_csv("data.csv", sep=";", header=0, nrows=1000)

    # Parquet files (recommended for large datasets)
    ds = pd.read_parquet("data.parquet")
    ds = pd.read_parquet("data.parquet", columns=['col1', 'col2'])

    # JSON files
    ds = pd.read_json("data.json")
    ds = pd.read_json("data.jsonl", lines=True)

    # Excel files
    ds = pd.read_excel("data.xlsx", sheet_name="Sheet1")

Create DataStore from Amazon S3.

DataStore.from_s3(url, access_key_id=None, secret_access_key=None, format=None, **kwargs)

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| url | str | required | S3 URL (s3://bucket/path) |
| access_key_id | str | None | AWS access key ID |
| secret_access_key | str | None | AWS secret access key |
| format | str | None | File format (auto-detected) |

Examples:

    from chdb.datastore import DataStore

    # Anonymous access (public bucket)
    ds = DataStore.from_s3("s3://bucket/data.parquet")

    # With credentials
    ds = DataStore.from_s3(
        "s3://bucket/data.parquet",
        access_key_id="AKIAIOSFODNN7EXAMPLE",
        secret_access_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
    )

    # Using URI with query parameters
    ds = DataStore.uri("s3://bucket/data.parquet?nosign=true")
    ds = DataStore.uri("s3://bucket/data.parquet?access_key_id=KEY&secret_access_key=SECRET")
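In practice, credentials are usually read from the environment rather than written into source code. The following sketch shows one way to do that; `s3_credentials` is a hypothetical helper, and the environment variable names are the conventional AWS ones, assumed here rather than taken from the DataStore documentation. The returned dictionary uses only the keyword names from the from_s3() signature above.

```python
import os


def s3_credentials(env=os.environ):
    """Return from_s3() keyword arguments read from env, or {} for anonymous access."""
    key = env.get("AWS_ACCESS_KEY_ID")
    secret = env.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return {"access_key_id": key, "secret_access_key": secret}
    # Without both values, fall back to anonymous access (no credential kwargs).
    return {}


# Usage (requires chdb; commented out so the sketch runs without it):
# from chdb.datastore import DataStore
# ds = DataStore.from_s3("s3://bucket/data.parquet", **s3_credentials())
```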

Create DataStore from Google Cloud Storage.

DataStore.from_gcs(url, credentials_path=None, **kwargs)

Examples:

    ds = DataStore.from_gcs("gs://bucket/data.parquet")
    ds = DataStore.from_gcs("gs://bucket/data.parquet", credentials_path="/path/to/creds.json")

Create DataStore from Azure Blob Storage.

DataStore.from_azure(url, account_name=None, account_key=None, **kwargs)

Examples:

    ds = DataStore.from_azure(
        "az://container/data.parquet",
        account_name="myaccount",
        account_key="mykey"
    )

Create DataStore from HDFS.

DataStore.from_hdfs(url, **kwargs)

Examples:

ds = DataStore.from_hdfs("hdfs://namenode:8020/path/data.parquet")

Create DataStore from HTTP/HTTPS URL.

DataStore.from_url(url, format=None, **kwargs)

Examples:

    ds = DataStore.from_url("https://example.com/data.csv")
    ds = DataStore.from_url("https://raw.githubusercontent.com/user/repo/main/data.parquet")

Create DataStore from MySQL database.

DataStore.from_mysql(host, database, table, user, password, port=3306, **kwargs)

Parameters:

| Parameter | Type | Default | Description |
|---|---|---|---|
| host | str | required | MySQL host |
| database | str | required | Database name |
| table | str | required | Table name |
| user | str | required | Username |
| password | str | required | Password |
| port | int | 3306 | Port number |

Examples:

    ds = DataStore.from_mysql(
        host="localhost",
        database="mydb",
        table="users",
        user="root",
        password="password"
    )

    # Using URI
    ds = DataStore.uri("mysql://root:password@localhost:3306/mydb/users")
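The keyword form and the URI form above carry the same information. As an illustration of how the two correspond, this sketch splits a mysql:// URI into the keyword arguments of from_mysql() using the standard library. `mysql_kwargs` is a hypothetical helper, not part of DataStore, and it assumes the path has exactly the db/table shape shown in the URI table.

```python
from urllib.parse import urlsplit


def mysql_kwargs(uri):
    """Split a mysql:// URI into from_mysql() keyword arguments (illustrative only)."""
    parts = urlsplit(uri)
    # The path is expected to look like /db/table.
    database, table = parts.path.strip("/").split("/")
    return {
        "host": parts.hostname,
        "port": parts.port or 3306,  # default port from the signature above
        "user": parts.username,
        "password": parts.password,
        "database": database,
        "table": table,
    }
```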

Create DataStore from PostgreSQL database.

DataStore.from_postgresql(host, database, table, user, password, port=5432, **kwargs)

Examples:

    ds = DataStore.from_postgresql(
        host="localhost",
        database="mydb",
        table="users",
        user="postgres",
        password="password"
    )

    # Using URI
    ds = DataStore.uri("postgresql://postgres:password@localhost:5432/mydb/users")

Create DataStore from ClickHouse server.

DataStore.from_clickhouse(host, database, table, user=None, password=None, port=9000, **kwargs)

Examples:

    ds = DataStore.from_clickhouse(
        host="localhost",
        database="default",
        table="hits",
        user="default",
        password=""
    )

    # Connection-level mode (explore databases)
    ds = DataStore.from_clickhouse(
        host="analytics.company.com",
        user="analyst",
        password="secret"
    )
    ds.databases()            # List databases
    ds.tables("production")   # List tables
    result = ds.sql("SELECT * FROM production.users LIMIT 10")

Create DataStore from MongoDB.

DataStore.from_mongodb(uri, database, collection, **kwargs)

Examples:

    ds = DataStore.from_mongodb(
        uri="mongodb://localhost:27017",
        database="mydb",
        collection="users"
    )

Create DataStore from SQLite database.

DataStore.from_sqlite(database_path, table, **kwargs)

Examples:

    ds = DataStore.from_sqlite("data.db", table="users")

    # Using URI
    ds = DataStore.uri("sqlite:///data.db?table=users")
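To try the SQLite example without an existing database, a small fixture can be built with Python's standard sqlite3 module. This is a sketch under stated assumptions: the table schema and sample rows are invented for illustration, and the final DataStore call is left commented out because it requires chdb.

```python
import os
import sqlite3
import tempfile

# Create a throwaway SQLite file with a "users" table (schema and rows are made up).
db_path = os.path.join(tempfile.mkdtemp(), "data.db")
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("Alice",), ("Bob",)])
conn.commit()

# Confirm the table is readable before handing the file to DataStore.
row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
conn.close()

# With chdb installed, the file could then be opened as in the example above:
# from chdb.datastore import DataStore
# ds = DataStore.from_sqlite(db_path, table="users")
```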

Create DataStore from Apache Iceberg table.

DataStore.from_iceberg(path, **kwargs)

Examples:

    ds = DataStore.from_iceberg("/path/to/iceberg_table")
    ds = DataStore.uri("iceberg://catalog/namespace/table")

Create DataStore from Delta Lake table.

DataStore.from_delta(path, **kwargs)

Examples:

    ds = DataStore.from_delta("/path/to/delta_table")
    ds = DataStore.uri("deltalake:///path/to/delta_table")

Create DataStore from Apache Hudi table.

DataStore.from_hudi(path, **kwargs)

Examples:

    ds = DataStore.from_hudi("/path/to/hudi_table")
    ds = DataStore.uri("hudi:///path/to/hudi_table")

Create DataStore from pandas DataFrame.

DataStore.from_df(df, name=None)
DataStore.from_dataframe(df, name=None)  # alias

Examples:

    import pandas
    from chdb.datastore import DataStore

    pdf = pandas.DataFrame({'a': [1, 2, 3], 'b': ['x', 'y', 'z']})
    ds = DataStore.from_df(pdf)

Create DataStore using pandas-like constructor.

    from chdb import datastore as pd

    # From dictionary
    ds = pd.DataFrame({
        'name': ['Alice', 'Bob'],
        'age': [25, 30]
    })

    # From pandas DataFrame
    import pandas
    pdf = pandas.DataFrame({'a': [1, 2, 3]})
    ds = pd.DataFrame(pdf)

Create DataStore with sequential numbers (useful for testing).

DataStore.from_numbers(count, **kwargs)

Examples:

    ds = DataStore.from_numbers(1000000)  # 1M rows with 'number' column
    result = ds.filter(ds['number'] % 2 == 0).head(10)  # Even numbers

Create DataStore with random data.

DataStore.from_random(rows, columns, **kwargs)

Examples:

ds = DataStore.from_random(rows=1000, columns=5)

Create DataStore from raw SQL query.

DataStore.run_sql(query)

Examples:

    ds = DataStore.run_sql("""
        SELECT number, number * 2 as doubled
        FROM numbers(100)
        WHERE number % 10 = 0
    """)
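For readers unfamiliar with the numbers() table function, the query above can be mirrored in plain Python. This sketch relies on ClickHouse's numbers(100) producing the integers 0 through 99; it shows only the rows the query is expected to select, not how DataStore represents the result.

```python
# Plain-Python equivalent of the query: numbers(100) yields 0..99,
# the WHERE clause keeps multiples of 10, and "doubled" is number * 2.
rows = [(n, n * 2) for n in range(100) if n % 10 == 0]

for number, doubled in rows:
    print(number, doubled)
```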