ParquetMetadata
Description
Special format for reading Parquet file metadata (https://parquet.apache.org/docs/file-format/metadata/). It always outputs one row with the following structure:
- `num_columns` - the number of columns
- `num_rows` - the total number of rows
- `num_row_groups` - the total number of row groups
- `format_version` - Parquet format version, always 1.0 or 2.6
- `total_uncompressed_size` - total uncompressed size of the data in bytes, calculated as the sum of total_byte_size from all row groups
- `total_compressed_size` - total compressed size of the data in bytes, calculated as the sum of total_compressed_size from all row groups
- `columns` - the list of column metadata entries, each with the following structure:
    - `name` - column name
    - `path` - column path (differs from name for nested columns)
    - `max_definition_level` - maximum definition level
    - `max_repetition_level` - maximum repetition level
    - `physical_type` - column physical type
    - `logical_type` - column logical type
    - `compression` - compression used for this column
    - `total_uncompressed_size` - total uncompressed size of the column in bytes, calculated as the sum of total_uncompressed_size of the column from all row groups
    - `total_compressed_size` - total compressed size of the column in bytes, calculated as the sum of total_compressed_size of the column from all row groups
    - `space_saved` - percentage of space saved by compression, calculated as (1 - total_compressed_size/total_uncompressed_size)
    - `encodings` - the list of encodings used for this column
- `row_groups` - the list of row group metadata entries, each with the following structure:
    - `num_columns` - the number of columns in the row group
    - `num_rows` - the number of rows in the row group
    - `total_uncompressed_size` - total uncompressed size of the row group in bytes
    - `total_compressed_size` - total compressed size of the row group in bytes
    - `columns` - the list of column chunk metadata entries, each with the following structure:
        - `name` - column name
        - `path` - column path
        - `total_compressed_size` - total compressed size of the column chunk in bytes
        - `total_uncompressed_size` - total uncompressed size of the column chunk in bytes
        - `have_statistics` - boolean flag that indicates whether the column chunk metadata contains column statistics
        - `statistics` - column chunk statistics (all fields are NULL if have_statistics = false) with the following structure:
            - `num_values` - the number of non-null values in the column chunk
            - `null_count` - the number of NULL values in the column chunk
            - `distinct_count` - the number of distinct values in the column chunk
            - `min` - the minimum value of the column chunk
            - `max` - the maximum value of the column chunk
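The aggregated size fields described above are plain sums over row groups, and space_saved follows from the two column totals. The following Python sketch is illustrative only: the `row_groups` data, the column name `id`, and the helper names `file_totals` and `column_totals` are hypothetical and are not part of the format. It assumes the formula is applied as a fraction and then rendered as a percentage.

```python
# Hypothetical row-group metadata, shaped after the fields listed above.
row_groups = [
    {
        "total_byte_size": 1000,
        "total_compressed_size": 400,
        "columns": [
            {"name": "id", "total_uncompressed_size": 1000, "total_compressed_size": 400},
        ],
    },
    {
        "total_byte_size": 3000,
        "total_compressed_size": 1600,
        "columns": [
            {"name": "id", "total_uncompressed_size": 3000, "total_compressed_size": 1600},
        ],
    },
]


def file_totals(groups):
    """Sum the per-row-group sizes into file-level totals."""
    return {
        "total_uncompressed_size": sum(g["total_byte_size"] for g in groups),
        "total_compressed_size": sum(g["total_compressed_size"] for g in groups),
    }


def column_totals(groups, name):
    """Sum one column's chunk sizes across all row groups and derive space_saved."""
    chunks = [c for g in groups for c in g["columns"] if c["name"] == name]
    uncompressed = sum(c["total_uncompressed_size"] for c in chunks)
    compressed = sum(c["total_compressed_size"] for c in chunks)
    # Assumption: an empty column is reported as 0 to avoid dividing by zero.
    saved = 1 - compressed / uncompressed if uncompressed else 0.0
    return {
        "total_uncompressed_size": uncompressed,
        "total_compressed_size": compressed,
        "space_saved": f"{saved:.2%}",
    }


print(file_totals(row_groups))
print(column_totals(row_groups, "id"))
```

With this sample data, both totals come to 4000 uncompressed and 2000 compressed bytes, so the sketch reports a space_saved of 50.00% for the column.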
Example Usage
Example: