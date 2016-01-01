DataStore Class Reference
This reference documents the core classes in the DataStore API.
DataStore
The main DataFrame-like class for data manipulation.
Constructor
Parameters:
| Parameter | Type | Description |
|---|---|---|
| `data` | dict/list/DataFrame/DataStore | Input data |
| `columns` | list | Column names |
| `index` | Index | Row index |
| `dtype` | dict | Column data types |
| `copy` | bool | Copy the input data |
Examples:
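Assume `data` follows the pandas conventions. Then a dict maps column names to lists of values, and a list holds one record (dict) per row. The plain-Python sketch below does not call DataStore. Its helper `to_columns` is hypothetical and only shows that both shapes describe the same table and how `columns` narrows the result.

```python
from typing import Any, Dict, List, Optional, Union

Columnar = Dict[str, List[Any]]   # {"col": [v1, v2, ...]}
Records = List[Dict[str, Any]]    # [{"col": v1}, {"col": v2}, ...]


def to_columns(data: Union[Columnar, Records],
               columns: Optional[List[str]] = None) -> Columnar:
    """Hypothetical helper: normalize either input shape to {column: values}."""
    if isinstance(data, dict):
        table = {name: list(values) for name, values in data.items()}
    else:
        # Column names in first-seen order across all records.
        names: List[str] = []
        for record in data:
            for key in record:
                if key not in names:
                    names.append(key)
        # A key missing from a record becomes None (NaN/NULL in a real table).
        table = {name: [record.get(name) for record in data] for name in names}
    if columns is not None:
        # Keep only the requested columns, in the requested order
        # (an unknown name raises KeyError in this sketch).
        table = {name: table[name] for name in columns}
    return table


by_column = to_columns({"name": ["Alice", "Bob"], "age": [30, 25]})
by_row = to_columns([{"name": "Alice", "age": 30}, {"name": "Bob", "age": 25}])
print(by_column == by_row)                  # True
print(to_columns(by_row, columns=["age"]))  # {'age': [30, 25]}
```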
Properties
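| Property | Type | Description |
|---|---|---|
| `columns` | Index | Column names |
| `dtypes` | Series | Column data types |
| `shape` | tuple | `(rows, columns)` |
| `size` | int | Total number of elements (rows × columns) |
| `ndim` | int | Number of dimensions (always 2) |
| `empty` | bool | Whether the DataStore is empty |
| `values` | ndarray | Underlying data as a NumPy array |
| `index` | Index | Row index |
| `T` | DataStore | Transposed DataStore |
| `axes` | list | List of axes (row index and column index) |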
Factory Methods
| Method | Description |
|---|---|
| `uri(uri)` | Universal factory that creates a DataStore from a URI |
| `from_file(path, ...)` | Create from a file |
| `from_df(df)` | Create from a pandas DataFrame |
| `from_s3(url, ...)` | Create from Amazon S3 |
| `from_gcs(url, ...)` | Create from Google Cloud Storage |
| `from_azure(url, ...)` | Create from Azure Blob Storage |
| `from_mysql(...)` | Create from MySQL |
| `from_postgresql(...)` | Create from PostgreSQL |
| `from_clickhouse(...)` | Create from ClickHouse |
| `from_mongodb(...)` | Create from MongoDB |
| `from_sqlite(...)` | Create from SQLite |
| `from_iceberg(path)` | Create from an Iceberg table |
| `from_delta(path)` | Create from a Delta Lake table |
| `from_numbers(n)` | Create with `n` sequential numbers |
| `from_random(rows, cols)` | Create with random data |
| `run_sql(query)` | Create from an SQL query |
See Factory Methods for details.
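`uri()` chooses a data source from the URI itself. This reference does not list the exact scheme-to-factory mapping. The sketch below is therefore a hypothetical dispatcher that only illustrates the pattern with `urllib.parse.urlparse`. The `SCHEME_TO_FACTORY` table is invented for illustration and may not match DataStore's real routing.

```python
from urllib.parse import urlparse

# Hypothetical mapping from URI scheme to factory method name; the real
# DataStore.uri() routing may differ and may support other schemes.
SCHEME_TO_FACTORY = {
    "s3": "from_s3",
    "gs": "from_gcs",
    "mysql": "from_mysql",
    "postgresql": "from_postgresql",
    "clickhouse": "from_clickhouse",
    "mongodb": "from_mongodb",
    "sqlite": "from_sqlite",
}


def pick_factory(uri: str) -> str:
    """Return the name of the factory method a URI would be routed to."""
    scheme = urlparse(uri).scheme.lower()
    if scheme in ("", "file"):
        # Bare paths and file:// URIs are treated as local files.
        return "from_file"
    try:
        return SCHEME_TO_FACTORY[scheme]
    except KeyError:
        raise ValueError(f"unsupported URI scheme: {scheme!r}") from None


print(pick_factory("s3://bucket/data.parquet"))  # from_s3
print(pick_factory("data/sales.csv"))            # from_file
```

Keeping the mapping in a table rather than an `if` chain makes it easy to see which schemes are covered and to add new ones.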
Query Methods
| Method | Returns | Description |
|---|---|---|
| `select(*cols)` | DataStore | Select columns |
| `filter(condition)` | DataStore | Filter rows |
| `where(condition)` | DataStore | Alias for `filter` |
| `sort(*cols, ascending=True)` | DataStore | Sort rows |
| `orderby(*cols)` | DataStore | Alias for `sort` |
| `limit(n)` | DataStore | Limit the number of rows |
| `offset(n)` | DataStore | Skip the first `n` rows |
| `distinct(subset=None)` | DataStore | Remove duplicate rows |
| `groupby(*cols)` | LazyGroupBy | Group rows |
| `having(condition)` | DataStore | Filter groups |
| `join(right, ...)` | DataStore | Join DataStores |
| `union(other, all=False)` | DataStore | Combine DataStores |
| `when(cond, val)` | CaseWhen | Start a CASE WHEN expression |
See Query Building for details.
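Each query method returns a new DataStore, so calls can be chained while evaluation stays lazy. The sketch below models that pattern with a hypothetical, immutable `QuerySketch` class. The class only records each step and renders the result as SQL. It is not DataStore's implementation, and it takes conditions as plain SQL strings for simplicity.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class QuerySketch:
    """Hypothetical immutable query builder; each method returns a new object."""
    table: str
    columns: Tuple[str, ...] = ()
    conditions: Tuple[str, ...] = ()
    order: Tuple[str, ...] = ()
    limit_n: Optional[int] = None
    offset_n: Optional[int] = None

    def select(self, *cols: str) -> "QuerySketch":
        return replace(self, columns=cols)

    def filter(self, condition: str) -> "QuerySketch":
        # Successive filters are combined with AND.
        return replace(self, conditions=self.conditions + (condition,))

    where = filter  # alias, mirroring the table above

    def sort(self, *cols: str, ascending: bool = True) -> "QuerySketch":
        # A new sort() replaces any previous ordering.
        direction = "ASC" if ascending else "DESC"
        return replace(self, order=tuple(f"{c} {direction}" for c in cols))

    def limit(self, n: int) -> "QuerySketch":
        return replace(self, limit_n=n)

    def offset(self, n: int) -> "QuerySketch":
        return replace(self, offset_n=n)

    def to_sql(self) -> str:
        sql = f"SELECT {', '.join(self.columns) or '*'} FROM {self.table}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(f"({c})" for c in self.conditions)
        if self.order:
            sql += " ORDER BY " + ", ".join(self.order)
        if self.limit_n is not None:
            sql += f" LIMIT {self.limit_n}"
        if self.offset_n is not None:
            sql += f" OFFSET {self.offset_n}"
        return sql


q = (QuerySketch("sales")
     .select("region", "amount")
     .filter("amount > 100")
     .sort("amount", ascending=False)
     .limit(10))
print(q.to_sql())
# SELECT region, amount FROM sales WHERE (amount > 100) ORDER BY amount DESC LIMIT 10
```

Because every step returns a fresh object, a partially built query can be reused as the base for several different queries without side effects.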
Pandas-Compatible Methods
See Pandas Compatibility for the complete list of 209 methods.
Indexing: head(), tail(), sample(), loc, iloc, at, iat, query(), isin(), where(), mask(), get(), xs(), pop()
Aggregation: sum(), mean(), std(), var(), min(), max(), median(), count(), nunique(), quantile(), describe(), corr(), cov(), skew(), kurt()
Manipulation: drop(), drop_duplicates(), dropna(), fillna(), replace(), rename(), assign(), astype(), copy()
Sorting: sort_values(), sort_index(), nlargest(), nsmallest(), rank()
Reshaping: pivot(), pivot_table(), melt(), stack(), unstack(), transpose(), explode(), squeeze()
Combining: merge(), join(), concat(), append(), combine(), update(), compare()
Apply/Transform: apply(), applymap(), map(), agg(), transform(), pipe(), groupby()
Time Series: rolling(), expanding(), ewm(), shift(), diff(), pct_change(), resample()
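These methods are listed as pandas-compatible, so they are expected to follow pandas semantics. With the default `periods=1`, `shift()`, `diff()` and `pct_change()` compare each row with the previous one and leave the first row missing. The plain-Python sketch below spells out that arithmetic on a list. It uses `None` where pandas would produce `NaN` and is an illustration, not DataStore code.

```python
from typing import List, Optional

Num = Optional[float]  # None plays the role of pandas' NaN here


def shift(values: List[float], periods: int = 1) -> List[Num]:
    """Move values down by `periods` rows (periods >= 0), padding the top with None."""
    periods = min(periods, len(values))
    return [None] * periods + values[: len(values) - periods]


def diff(values: List[float]) -> List[Num]:
    """x[i] - x[i-1]; undefined for the first row."""
    prev = shift(values)
    return [None if p is None else v - p for v, p in zip(values, prev)]


def pct_change(values: List[float]) -> List[Num]:
    """x[i] / x[i-1] - 1; undefined for the first row.

    A zero previous value raises ZeroDivisionError here (pandas gives inf or NaN).
    """
    prev = shift(values)
    return [None if p is None else v / p - 1 for v, p in zip(values, prev)]


prices = [100.0, 110.0, 99.0]
print(shift(prices))  # [None, 100.0, 110.0]
print(diff(prices))   # [None, 10.0, -11.0]
print(pct_change(prices))
```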
I/O Methods
| Method | Description |
|---|---|
| `to_csv(path, ...)` | Export to CSV |
| `to_parquet(path, ...)` | Export to Parquet |
| `to_json(path, ...)` | Export to JSON |
| `to_excel(path, ...)` | Export to Excel |
| `to_df()` | Convert to a pandas DataFrame |
| `to_pandas()` | Alias for `to_df` |
| `to_arrow()` | Convert to an Arrow Table |
| `to_dict(orient)` | Convert to a dictionary |
| `to_records()` | Convert to records |
| `to_numpy()` | Convert to a NumPy array |
| `to_sql()` | Return the generated SQL as a string (pandas' `to_sql` instead writes to a database) |
| `to_string()` | String representation |
| `to_markdown()` | Render as a Markdown table |
| `to_html()` | Render as an HTML table |
See I/O Operations for details.
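The `orient` argument of `to_dict()` controls the shape of the result. Assuming it follows pandas, `orient='list'` yields one list per column and `orient='records'` yields one dict per row. The sketch below reproduces only those two layouts from a `{column: values}` table in plain Python. `to_dict_sketch` is a hypothetical name, and other pandas orients (such as `'dict'`, `'split'`, `'index'`) are omitted.

```python
from typing import Any, Dict, List


def to_dict_sketch(table: Dict[str, List[Any]], orient: str):
    """Illustrate two pandas-style orients for a {column: values} table."""
    if orient == "list":
        return {name: list(values) for name, values in table.items()}
    if orient == "records":
        names = list(table)
        # zip(*columns) walks the table row by row.
        return [dict(zip(names, row)) for row in zip(*table.values())]
    raise ValueError(f"orient {orient!r} is not covered by this sketch")


table = {"name": ["Alice", "Bob"], "age": [30, 25]}
print(to_dict_sketch(table, "records"))
# [{'name': 'Alice', 'age': 30}, {'name': 'Bob', 'age': 25}]
```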
Debugging Methods
| Method | Description |
|---|---|
| `explain(verbose=False)` | Show the execution plan |
| `clear_cache()` | Clear cached results |
See Debugging for details.
Magic Methods
| Method | Usage |
|---|---|
| `__getitem__(key)` | `ds['col']`, `ds[['a', 'b']]`, `ds[condition]` |
| `__setitem__(key, value)` | `ds['col'] = value` |
| `__delitem__(key)` | `del ds['col']` |
| `__len__()` | `len(ds)` |
| `__iter__()` | `for col in ds` |
| `__contains__(key)` | `'col' in ds` |
| `__repr__()` | `repr(ds)` |
| `__str__()` | `str(ds)` |
| `__eq__(other)` | `ds == other` |
| `__ne__(other)` | `ds != other` |
| `__lt__(other)` | `ds < other` |
| `__le__(other)` | `ds <= other` |
| `__gt__(other)` | `ds > other` |
| `__ge__(other)` | `ds >= other` |
| `__add__(other)` | `ds + other` |
| `__sub__(other)` | `ds - other` |
| `__mul__(other)` | `ds * other` |
| `__truediv__(other)` | `ds / other` |
| `__floordiv__(other)` | `ds // other` |
| `__mod__(other)` | `ds % other` |
| `__pow__(other)` | `ds ** other` |
| `__and__(other)` | `ds & other` |
| `__or__(other)` | `ds \| other` |
| `__invert__()` | `~ds` |
| `__neg__()` | `-ds` |
| `__pos__()` | `+ds` |
| `__abs__()` | `abs(ds)` |
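`__getitem__` accepts three kinds of keys. The hypothetical `MiniTable` class below sketches that dispatch on a tiny in-memory table. A string returns one column, and a list of strings returns a narrower table. A list of booleans stands in for a condition and keeps the matching rows. DataStore itself works on lazy expressions rather than Python lists, so this only mirrors the calling convention.

```python
from typing import Any, Dict, List, Union


class MiniTable:
    """Hypothetical in-memory table used only to illustrate ds[...] dispatch."""

    def __init__(self, data: Dict[str, List[Any]]):
        self.data = data

    def __getitem__(self, key: Union[str, List[Any]]) -> Any:
        if isinstance(key, str):
            # ds['col'] -> a single column
            return self.data[key]
        if isinstance(key, list) and all(isinstance(k, str) for k in key):
            # ds[['a', 'b']] -> a subset of columns, in the requested order
            return MiniTable({name: self.data[name] for name in key})
        if isinstance(key, list) and all(isinstance(k, bool) for k in key):
            # ds[condition] -> rows where the mask is True
            return MiniTable({name: [v for v, keep in zip(values, key) if keep]
                              for name, values in self.data.items()})
        raise TypeError(f"unsupported key: {key!r}")

    def __len__(self) -> int:
        # Number of rows, as with len(ds)
        return len(next(iter(self.data.values()), []))


t = MiniTable({"name": ["Alice", "Bob", "Cara"], "age": [30, 17, 42]})
print(t["age"])  # [30, 17, 42]
adults = t[[age >= 18 for age in t["age"]]]
print(adults.data)  # {'name': ['Alice', 'Cara'], 'age': [30, 42]}
print(len(t[["name"]]))  # 3
```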
ColumnExpr
Represents a column expression for lazy evaluation. Returned when accessing a column.
Properties
| Property | Type | Description |
|---|---|---|
| `name` | str | Column name |
| `dtype` | dtype | Data type |
Accessors
| Accessor | Description | Methods |
|---|---|---|
| `.str` | String operations | 56 methods |
| `.dt` | Date/time operations | 42+ methods |
| `.arr` | Array operations | 37 methods |
| `.json` | JSON parsing | 13 methods |
| `.url` | URL parsing | 15 methods |
| `.ip` | IP address operations | 9 methods |
| `.geo` | Geo/distance operations | 14 methods |
See Accessors for complete documentation.
Arithmetic Operations
Comparison Operations
Logical Operations
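Because ColumnExpr is evaluated lazily, its arithmetic, comparison and logical operators build expressions instead of computing values. The hypothetical `Expr` class below sketches the idea: each operator returns a new node that renders to SQL. It also shows a Python rule that applies to any such API. `&`, `|` and `~` bind more tightly than comparisons, so every comparison must be parenthesized, and the keywords `and`/`or`/`not` cannot be overloaded at all.

```python
class Expr:
    """Hypothetical expression node; operators build SQL text instead of values."""

    def __init__(self, sql: str):
        self.sql = sql

    def _binary(self, op: str, other) -> "Expr":
        # Literals are rendered with repr() for brevity (not SQL-injection safe).
        rhs = other.sql if isinstance(other, Expr) else repr(other)
        return Expr(f"({self.sql} {op} {rhs})")

    # Arithmetic
    def __add__(self, other):
        return self._binary("+", other)

    def __mul__(self, other):
        return self._binary("*", other)

    # Comparison: returns an Expr, not a bool
    def __gt__(self, other):
        return self._binary(">", other)

    def __lt__(self, other):
        return self._binary("<", other)

    def __eq__(self, other):
        return self._binary("=", other)

    # Logical: &, | and ~ map to AND, OR and NOT
    def __and__(self, other):
        return self._binary("AND", other)

    def __or__(self, other):
        return self._binary("OR", other)

    def __invert__(self):
        return Expr(f"(NOT {self.sql})")


def col(name: str) -> Expr:
    """Hypothetical column reference."""
    return Expr(name)


expr = (col("age") > 18) & ~(col("country") == "US")
print(expr.sql)  # ((age > 18) AND (NOT (country = 'US')))
print((col("price") * 2 + 1).sql)  # ((price * 2) + 1)
```

Without the parentheses, `col("age") > 18 & col("x") < 5` parses as a chained comparison around `18 & col("x")`, which raises `TypeError` in this sketch.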
Methods
| Method | Description |
|---|---|
| `as_(alias)` | Set an alias name |
| `cast(dtype)` | Cast to a type |
| `astype(dtype)` | Alias for `cast` |
| `isnull()` | Is NULL |
| `notnull()` | Is not NULL |
| `isna()` | Alias for `isnull` |
| `notna()` | Alias for `notnull` |
| `isin(values)` | Is in a list of values |
| `between(low, high)` | Is between two values |
| `fillna(value)` | Fill NULLs with `value` |
| `replace(to_replace, value)` | Replace values |
| `clip(lower, upper)` | Clip values to a range |
| `abs()` | Absolute value |
| `round(decimals)` | Round values |
| `floor()` | Floor |
| `ceil()` | Ceiling |
| `apply(func)` | Apply a function |
| `map(mapper)` | Map values |
Aggregation Methods
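| Method | Description |
|---|---|
| `sum()` | Sum |
| `mean()` | Mean |
| `avg()` | Alias for `mean` |
| `min()` | Minimum |
| `max()` | Maximum |
| `count()` | Count of non-null values |
| `nunique()` | Number of unique values |
| `std()` | Standard deviation |
| `var()` | Variance |
| `median()` | Median |
| `quantile(q)` | Quantile |
| `first()` | First value |
| `last()` | Last value |
| `any()` | True if any value is true |
| `all()` | True if all values are true |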
LazyGroupBy
Represents a grouped DataStore for aggregation operations.
Methods
| Method | Returns | Description |
|---|---|---|
| `agg(spec)` | DataStore | Aggregate using a specification |
| `aggregate(spec)` | DataStore | Alias for `agg` |
| `sum()` | DataStore | Sum per group |
| `mean()` | DataStore | Mean per group |
| `count()` | DataStore | Count per group |
| `min()` | DataStore | Minimum per group |
| `max()` | DataStore | Maximum per group |
| `std()` | DataStore | Standard deviation per group |
| `var()` | DataStore | Variance per group |
| `median()` | DataStore | Median per group |
| `nunique()` | DataStore | Number of unique values per group |
| `first()` | DataStore | First value per group |
| `last()` | DataStore | Last value per group |
| `nth(n)` | DataStore | Nth value per group |
| `head(n)` | DataStore | First `n` rows per group |
| `tail(n)` | DataStore | Last `n` rows per group |
| `apply(func)` | DataStore | Apply a function to each group |
| `transform(func)` | DataStore | Transform each group |
| `filter(func)` | DataStore | Filter groups |
Column Selection
Aggregation Specifications
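Assuming `agg(spec)` accepts pandas-style specifications, a spec maps each column to one aggregation name or a list of names. The stdlib-only sketch below shows what such a spec computes. Rows are grouped by key, and each requested function is then applied to each group's values. `group_agg` and its `column_function` result names are hypothetical simplifications (pandas uses MultiIndex columns for lists), and the sketch assumes no missing values.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Any, Dict, List, Union

# Aggregation names supported by this sketch.
AGGS = {"sum": sum, "mean": mean, "min": min, "max": max,
        "count": len, "median": median}


def group_agg(rows: List[Dict[str, Any]], by: str,
              spec: Dict[str, Union[str, List[str]]]) -> Dict[Any, Dict[str, Any]]:
    """Group records by `by` and apply the aggregations named in `spec`."""
    groups: Dict[Any, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)

    result = {}
    for key, members in groups.items():
        out = {}
        for column, funcs in spec.items():
            # A single name and a list of names are both accepted.
            names = [funcs] if isinstance(funcs, str) else funcs
            values = [m[column] for m in members]
            for name in names:
                out[f"{column}_{name}"] = AGGS[name](values)
        result[key] = out
    return result


sales = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 30},
]
print(group_agg(sales, "region", {"amount": ["sum", "mean"]}))
```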
LazySeries
Represents a lazy Series (single column).
Properties
| Property | Type | Description |
|---|---|---|
| `name` | str | Series name |
| `dtype` | dtype | Data type |
Methods
Inherits most methods from ColumnExpr. Key methods:
| Method | Description |
|---|---|
| `value_counts()` | Value frequencies |
| `unique()` | Unique values |
| `nunique()` | Number of unique values |
| `mode()` | Most frequent value(s) |
| `to_list()` | Convert to a list |
| `to_numpy()` | Convert to a NumPy array |
| `to_frame()` | Convert to a DataStore |
Related Classes
F (Functions)
Namespace for ClickHouse functions.
See Aggregation for details.
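One common way to build a function namespace like `F` is dynamic attribute lookup, where any attribute name becomes a function-call expression. The sketch below illustrates that technique only. The `FunctionNamespace` class and its rendering rules (strings become column names, other values become literals) are hypothetical. `toStartOfMonth` and `round` are real ClickHouse function names, used here as sample inputs.

```python
class FunctionNamespace:
    """Hypothetical F-style namespace: F.anyName(args) renders anyName(args)."""

    def __getattr__(self, name: str):
        if name.startswith("_"):
            # Keep Python internals (e.g. __deepcopy__) from being treated as SQL.
            raise AttributeError(name)

        def call(*args) -> str:
            # Strings are treated as column names; other values as literals.
            rendered = [a if isinstance(a, str) else repr(a) for a in args]
            return f"{name}({', '.join(rendered)})"

        return call


F = FunctionNamespace()
print(F.toStartOfMonth("order_date"))  # toStartOfMonth(order_date)
print(F.round("price", 2))             # round(price, 2)
```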
Field
Reference to a column by name.
CaseWhen
Builder for CASE WHEN expressions.
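A CASE WHEN builder typically accumulates (condition, value) pairs in order and then an optional fallback. The hypothetical `CaseWhenSketch` below shows that flow and the SQL it corresponds to. The method name `otherwise` and the string-based conditions are assumptions made for the sketch, not confirmed DataStore API.

```python
from typing import Any, List, Optional, Tuple


class CaseWhenSketch:
    """Hypothetical CASE WHEN builder; conditions are SQL strings here."""

    def __init__(self, condition: str, value: Any):
        self.branches: List[Tuple[str, Any]] = [(condition, value)]
        self.default: Optional[Any] = None

    def when(self, condition: str, value: Any) -> "CaseWhenSketch":
        # Branches are checked in the order they were added.
        self.branches.append((condition, value))
        return self

    def otherwise(self, value: Any) -> "CaseWhenSketch":
        self.default = value
        return self

    def to_sql(self) -> str:
        parts = ["CASE"]
        parts += [f"WHEN {cond} THEN {val!r}" for cond, val in self.branches]
        if self.default is not None:
            parts.append(f"ELSE {self.default!r}")
        # Without ELSE, rows matching no branch evaluate to NULL.
        parts.append("END")
        return " ".join(parts)


grade = CaseWhenSketch("score >= 90", "A").when("score >= 80", "B").otherwise("C")
print(grade.to_sql())
# CASE WHEN score >= 90 THEN 'A' WHEN score >= 80 THEN 'B' ELSE 'C' END
```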
Window
Window specification for window functions.
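A window specification bundles the PARTITION BY, ORDER BY and frame clauses that follow OVER in SQL. The sketch below renders such a specification with a hypothetical `WindowSpec` dataclass. It is not DataStore's `Window` API; the example builds a running total per customer.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class WindowSpec:
    """Hypothetical window specification rendered as an SQL OVER clause."""
    partition_by: Tuple[str, ...] = ()
    order_by: Tuple[str, ...] = ()
    frame: Optional[str] = None  # e.g. "ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW"

    def over(self) -> str:
        clauses = []
        if self.partition_by:
            clauses.append("PARTITION BY " + ", ".join(self.partition_by))
        if self.order_by:
            clauses.append("ORDER BY " + ", ".join(self.order_by))
        if self.frame:
            clauses.append(self.frame)
        return "OVER (" + " ".join(clauses) + ")"


running = WindowSpec(partition_by=("customer_id",), order_by=("order_date",),
                     frame="ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW")
print(f"sum(amount) {running.over()}")
# sum(amount) OVER (PARTITION BY customer_id ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
```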