# parquet_dataset

Parquet datasets, to be used with raw data stored in the `.parquet` format.
## BaseParquetDataset

Base Parquet Dataset class.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`split_name` | `str` | A split name, for example `"train"`; only used in the XpViz visualization platform. | required |
`identifier_name` | `str` | A key to group each dataset; only used in the XpViz visualization platform. | required |
`path` | `str` | The directory artifact path. | required |
`storage_options` | | Optional storage options to stream data from a cloud storage instance. | required |
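Putting the parameters above together, a hedged construction sketch (the import path follows the "Source code" note below; it assumes `ParquetDataset` accepts the base-class parameters, and the bucket path and fsspec-style credentials dict are hypothetical placeholders, not values from this documentation):

```python
from xpdeep.dataset.parquet_dataset import ParquetDataset

train_dataset = ParquetDataset(
    split_name="train",                    # shown in XpViz
    identifier_name="my_experiment",       # groups datasets in XpViz (hypothetical name)
    path="s3://my-bucket/datasets/train",  # directory artifact path (hypothetical)
    storage_options={"key": "...", "secret": "..."},  # assumed fsspec-style options
)
```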
## ParquetDataset

Parquet Dataset class.
### analyze(*forced_type: Feature, target_names: list[str] | None = None) -> AnalyzedParquetDataset

Analyze the dataset and create an Analyzed Schema.
Parameters:

Name | Type | Description | Default |
---|---|---|---|
`forced_type` | `Feature` | Feature objects to force a custom feature type for specific column names in the Arrow Table. | `()` |
`target_names` | `list[str] \| None` | Optional list of column names indicating which columns should be considered targets. | `None` |
Returns:

Type | Description |
---|---|
`AnalyzedParquetDataset` | The analyzed dataset: a parquet dataset with an analyzed schema attached. |
Source code in src/xpdeep/dataset/parquet_dataset.py
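As a usage sketch (not runnable as-is: `dataset` stands for an existing `ParquetDataset`, and `CategoricalFeature` is a hypothetical `Feature` subclass standing in for whatever concrete feature types the SDK provides):

```python
# `dataset` is an existing ParquetDataset; `CategoricalFeature(name=...)`
# is an assumed Feature constructor, not confirmed by this page.
analyzed_dataset = dataset.analyze(
    CategoricalFeature(name="feature_b"),  # force this column's feature type
    target_names=["target"],               # mark "target" as a target column
)
# analyzed_dataset is an AnalyzedParquetDataset with an analyzed schema attached.
```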
## AnalyzedParquetDataset(parquet_dataset: ParquetDataset, analyzed_schema: AnalyzedSchema)

Analyzed Parquet Dataset class.
Source code in src/xpdeep/dataset/parquet_dataset.py
### analyzed_schema

The analyzed schema attached to this dataset.
### fit() -> FittedParquetDataset

Create a Fitted Parquet Dataset object.
Source code in src/xpdeep/dataset/parquet_dataset.py
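A one-line sketch of the fitting step (`analyzed_dataset` is an illustrative name for an existing `AnalyzedParquetDataset` instance):

```python
# fit() takes no arguments per the signature above.
fitted_dataset = analyzed_dataset.fit()  # returns a FittedParquetDataset
```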
## FittedParquetDataset

Fitted Parquet Dataset class.

Parameters:

Name | Type | Description | Default |
---|---|---|---|
`fitted_schema` | `FittedSchema` | | required |