DataStore Factory Methods
DataStore provides over 20 factory methods to create instances from various data sources including local files, databases, cloud storage, and data lakes.
Universal URI Interface
The uri() method is the recommended universal entry point. It auto-detects the source type from the URI, for example DataStore.uri("s3://bucket/data.parquet").
URI Syntax Reference
| Source Type | URI Format | Example |
|---|---|---|
| Local file | path/to/file | data.csv, /abs/path/data.parquet |
| S3 | s3://bucket/path | s3://mybucket/data.parquet?nosign=true |
| GCS | gs://bucket/path | gs://mybucket/data.csv |
| Azure | az://container/path | az://mycontainer/data.parquet |
| HTTP/HTTPS | https://url | https://example.com/data.csv |
| MySQL | mysql://user:pass@host:port/db/table | mysql://root:pass@localhost:3306/mydb/users |
| PostgreSQL | postgresql://user:pass@host:port/db/table | postgresql://postgres:pass@localhost:5432/mydb/users |
| SQLite | sqlite:///path?table=name | sqlite:///data.db?table=users |
| ClickHouse | clickhouse://host:port/db/table | clickhouse://localhost:9000/default/hits |
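To make the scheme-based detection concrete, here is a minimal sketch of how a URI could be mapped to a source type using the table above. It is not DataStore's actual implementation; classify_uri and SCHEME_TO_SOURCE are hypothetical names introduced only for illustration.

```python
# Scheme -> source type, copied from the URI Syntax Reference table.
SCHEME_TO_SOURCE = {
    "s3": "S3",
    "gs": "GCS",
    "az": "Azure",
    "http": "HTTP/HTTPS",
    "https": "HTTP/HTTPS",
    "mysql": "MySQL",
    "postgresql": "PostgreSQL",
    "sqlite": "SQLite",
    "clickhouse": "ClickHouse",
}


def classify_uri(uri: str) -> str:
    """Return the source type implied by a URI (hypothetical helper)."""
    # Anything without a "scheme://" prefix is treated as a local path.
    if "://" not in uri:
        return "Local file"
    scheme = uri.split("://", 1)[0].lower()
    if scheme not in SCHEME_TO_SOURCE:
        raise ValueError(f"unsupported URI scheme: {scheme!r}")
    return SCHEME_TO_SOURCE[scheme]


for example in ("data.csv", "s3://mybucket/data.parquet?nosign=true",
                "sqlite:///data.db?table=users"):
    print(example, "->", classify_uri(example))
```

Checking for "://" rather than a bare colon keeps paths that contain a colon (such as Windows drive letters) from being mistaken for a scheme.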
File Sources
from_file
Create DataStore from a local or remote file with automatic format detection.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| path | str | required | File path (local or URL) |
| format | str | None | File format (auto-detected if None) |
| compression | str | None | Compression type (auto-detected if None) |
Supported formats: CSV, TSV, Parquet, JSON, JSONLines, ORC, Avro, Arrow
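As an illustration of what "auto-detected if None" could mean in practice, the sketch below infers format and compression from file extensions and lets explicit arguments take precedence. The extension maps and the detect_format helper are assumptions made for this example; DataStore's real detection logic may differ.

```python
from pathlib import PurePosixPath

# Extension maps are assumptions for illustration; the format names come
# from the "Supported formats" list above.
FORMAT_BY_EXTENSION = {
    ".csv": "CSV",
    ".tsv": "TSV",
    ".parquet": "Parquet",
    ".json": "JSON",
    ".jsonl": "JSONLines",
    ".ndjson": "JSONLines",
    ".orc": "ORC",
    ".avro": "Avro",
    ".arrow": "Arrow",
}
COMPRESSION_BY_EXTENSION = {".gz": "gzip", ".bz2": "bz2", ".xz": "xz", ".zst": "zstd"}


def detect_format(path, format=None, compression=None):
    """Return (format, compression); explicit arguments win over detection."""
    suffixes = [s.lower() for s in PurePosixPath(path).suffixes]
    detected_compression = None
    # A trailing compression suffix (e.g. ".gz") is peeled off first.
    if suffixes and suffixes[-1] in COMPRESSION_BY_EXTENSION:
        detected_compression = COMPRESSION_BY_EXTENSION[suffixes.pop()]
    detected_format = FORMAT_BY_EXTENSION.get(suffixes[-1]) if suffixes else None
    return (format or detected_format, compression or detected_compression)


print(detect_format("data.csv.gz"))
print(detect_format("events.log", format="JSONLines"))
```

The parameter names path, format, and compression mirror the table above; a URL with a query string would need the query stripped before this kind of suffix check.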
Example: DataStore.from_file("data.csv")
Pandas-Compatible Read Functions
Cloud Storage
from_s3
Create DataStore from Amazon S3.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | str | required | S3 URL (s3://bucket/path) |
| access_key_id | str | None | AWS access key ID |
| secret_access_key | str | None | AWS secret access key |
| format | str | None | File format (auto-detected) |
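The s3://bucket/path form, optionally followed by a query string such as ?nosign=true from the URI table, can be taken apart with the standard library. The sketch below is only an illustration of the URL shape; split_s3_url is a hypothetical helper and not part of DataStore.

```python
from urllib.parse import parse_qs, urlsplit


def split_s3_url(url):
    """Split an s3:// URL into (bucket, key, options). Hypothetical helper."""
    parts = urlsplit(url)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an S3 URL: {url!r}")
    # parse_qs returns lists of values; keep the last value for each option.
    options = {name: values[-1] for name, values in parse_qs(parts.query).items()}
    return parts.netloc, parts.path.lstrip("/"), options


bucket, key, options = split_s3_url("s3://mybucket/data.parquet?nosign=true")
print(bucket, key, options)
```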
Example: DataStore.from_s3("s3://bucket/path")
from_gcs
Create DataStore from Google Cloud Storage.
Example: DataStore.from_gcs("gs://bucket/path")
from_azure
Create DataStore from Azure Blob Storage.
Example: DataStore.from_azure("az://container/path")
from_hdfs
Create DataStore from HDFS.
Example: DataStore.from_hdfs("hdfs://host/path")
from_url
Create DataStore from HTTP/HTTPS URL.
Example: DataStore.from_url("https://example.com/data.csv")
Databases
from_mysql
Create DataStore from MySQL database.
Parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| host | str | required | MySQL host |
| database | str | required | Database name |
| table | str | required | Table name |
| user | str | required | Username |
| password | str | required | Password |
| port | int | 3306 | Port number |
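These parameters correspond to the pieces of the mysql://user:pass@host:port/db/table URI format listed in the URI Syntax Reference. The sketch below shows one way to convert between the two forms. build_mysql_uri and parse_mysql_uri are hypothetical helpers, and percent-encoding the credentials is an assumption; whether DataStore expects encoded credentials is not stated here.

```python
from urllib.parse import quote, unquote, urlsplit


def build_mysql_uri(host, database, table, user, password, port=3306):
    """Assemble a mysql:// URI from the from_mysql parameters (hypothetical)."""
    # Percent-encode credentials so characters like "@" or ":" stay unambiguous.
    credentials = f"{quote(user, safe='')}:{quote(password, safe='')}"
    return f"mysql://{credentials}@{host}:{port}/{database}/{table}"


def parse_mysql_uri(uri):
    """Split a mysql:// URI back into from_mysql-style keyword arguments."""
    parts = urlsplit(uri)
    if parts.scheme != "mysql":
        raise ValueError(f"not a MySQL URI: {uri!r}")
    database, table = parts.path.strip("/").split("/", 1)
    return {
        "host": parts.hostname,
        "port": parts.port or 3306,  # default from the parameter table
        "database": database,
        "table": table,
        "user": unquote(parts.username or ""),
        "password": unquote(parts.password or ""),
    }


uri = build_mysql_uri("localhost", "mydb", "users", "root", "pass")
print(uri)
print(parse_mysql_uri(uri))
```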
Example (arguments are placeholders): DataStore.from_mysql(host, db, table, user, pass)
from_postgresql
Create DataStore from PostgreSQL database.
Example (arguments are placeholders): DataStore.from_postgresql(host, db, table, user, pass)
from_clickhouse
Create DataStore from ClickHouse server.
Example (arguments are placeholders): DataStore.from_clickhouse(host, db, table)
from_mongodb
Create DataStore from MongoDB.
Example (arguments are placeholders): DataStore.from_mongodb(uri, db, collection)
from_sqlite
Create DataStore from SQLite database.
Example (table is a placeholder for the table name): DataStore.from_sqlite("data.db", table)
Data Lakes
from_iceberg
Create DataStore from Apache Iceberg table.
Example: DataStore.from_iceberg("/path/to/table")
from_delta
Create DataStore from Delta Lake table.
Example: DataStore.from_delta("/path/to/table")
from_hudi
Create DataStore from Apache Hudi table.
Example: DataStore.from_hudi("/path/to/table")
In-Memory Sources
from_df / from_dataframe
Create DataStore from pandas DataFrame.
Example: DataStore.from_df(pandas_df), where pandas_df is an existing pandas DataFrame.
DataFrame Constructor
Create DataStore using pandas-like constructor.
Special Sources
from_numbers
Create DataStore with sequential numbers (useful for testing).
Example: DataStore.from_numbers(1000000)
from_random
Create DataStore with random data.
Example: DataStore.from_random(rows=1000, columns=5)
run_sql
Create DataStore from raw SQL query.
Example: DataStore.run_sql("SELECT * FROM ...")
Summary Table
| Method | Source Type | Example |
|---|---|---|
| uri() | Universal | DataStore.uri("s3://bucket/data.parquet") |
| from_file() | Local/Remote files | DataStore.from_file("data.csv") |
| read_csv() | CSV files | pd.read_csv("data.csv") |
| read_parquet() | Parquet files | pd.read_parquet("data.parquet") |
| from_s3() | Amazon S3 | DataStore.from_s3("s3://bucket/path") |
| from_gcs() | Google Cloud Storage | DataStore.from_gcs("gs://bucket/path") |
| from_azure() | Azure Blob | DataStore.from_azure("az://container/path") |
| from_hdfs() | HDFS | DataStore.from_hdfs("hdfs://host/path") |
| from_url() | HTTP/HTTPS | DataStore.from_url("https://example.com/data.csv") |
| from_mysql() | MySQL | DataStore.from_mysql(host, db, table, user, pass) |
| from_postgresql() | PostgreSQL | DataStore.from_postgresql(host, db, table, user, pass) |
| from_clickhouse() | ClickHouse | DataStore.from_clickhouse(host, db, table) |
| from_mongodb() | MongoDB | DataStore.from_mongodb(uri, db, collection) |
| from_sqlite() | SQLite | DataStore.from_sqlite("data.db", table) |
| from_iceberg() | Apache Iceberg | DataStore.from_iceberg("/path/to/table") |
| from_delta() | Delta Lake | DataStore.from_delta("/path/to/table") |
| from_hudi() | Apache Hudi | DataStore.from_hudi("/path/to/table") |
| from_df() | pandas DataFrame | DataStore.from_df(pandas_df) |
| DataFrame() | Dictionary/DataFrame | pd.DataFrame({'a': [1, 2, 3]}) |
| from_numbers() | Sequential numbers | DataStore.from_numbers(1000000) |
| from_random() | Random data | DataStore.from_random(rows=1000, columns=5) |
| run_sql() | Raw SQL | DataStore.run_sql("SELECT * FROM ...") |