Dataset interface
This module defines a PyTorch-like dataset interface for loading, combining, and slicing datasets.
- class dipm.data.chemical_datasets.dataset.Dataset
Abstract base class for PyTorch-style datasets.
- abstractmethod __getitem__(index: int) → ChemicalSystem
- abstractmethod __getitem__(index: list | ndarray | slice) → list[ChemicalSystem]
- abstractmethod __len__() → int
- release()
Release resources.
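The indexing contract above (an integer returns one item; a list, array, or slice returns a list of items) can be illustrated with a small stand-in. This is only a sketch under stated assumptions: `SketchDataset` and `InMemoryDataset` are hypothetical names that mirror the documented method signatures, they are not the dipm classes, and plain strings stand in for `ChemicalSystem` objects.

```python
from abc import ABC, abstractmethod


class SketchDataset(ABC):
    """Hypothetical stand-in mirroring the documented interface (not the dipm class)."""

    @abstractmethod
    def __getitem__(self, index):
        ...

    @abstractmethod
    def __len__(self):
        ...

    def release(self):
        """Release resources; the default has nothing to release."""


class InMemoryDataset(SketchDataset):
    """Hypothetical subclass that keeps every item in a Python list."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        if isinstance(index, int):
            return self._items[index]           # int -> a single item
        if isinstance(index, slice):
            return self._items[index]           # slice -> list of items
        return [self._items[i] for i in index]  # list/array of indices -> list of items

    def __len__(self):
        return len(self._items)

    def release(self):
        self._items = []                        # drop the held data


# Strings are placeholders for ChemicalSystem objects.
ds = InMemoryDataset(["H2O", "CO2", "CH4", "NH3"])
single = ds[1]
sliced = ds[1:3]
picked = ds[[0, 3]]
```

A real subclass would typically read from a file in `__getitem__` and close the file handle in `release()`.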
- class dipm.data.chemical_datasets.dataset.ConcatDataset(datasets: Sequence[Dataset], shuffle: bool = False, parallel: bool = False)
A dataset formed by concatenating multiple datasets, analogous to PyTorch's ConcatDataset.
- __init__(datasets: Sequence[Dataset], shuffle: bool = False, parallel: bool = False)
Create a concatenated dataset.
- Parameters:
datasets (sequence) – List of datasets to concatenate.
shuffle (bool) – Whether to shuffle all datasets together.
parallel (bool) – Whether to load data in parallel. If True, each dataset is loaded in a separate process when __getitem__ is called.
- __getitem__(index)
- __len__() → int
- release()
Release parallel loading resources and dataset file handles.
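One common way to serve `__getitem__` on a concatenation is to map a global index to a (sub-dataset, local index) pair using cumulative lengths; PyTorch's ConcatDataset does this with `bisect_right`. The sketch below shows that mapping only. It is not taken from the dipm source, `locate` is a hypothetical helper, and it ignores the `shuffle` and `parallel` options.

```python
from bisect import bisect_right
from itertools import accumulate


def locate(lengths, index):
    """Map a global index to (sub-dataset number, local index)."""
    total = sum(lengths)
    if index < 0:
        index += total                    # support negative indices
    if not 0 <= index < total:
        raise IndexError("index out of range")
    ends = list(accumulate(lengths))      # cumulative end offset of each sub-dataset
    which = bisect_right(ends, index)     # first sub-dataset whose end exceeds index
    start = ends[which - 1] if which else 0
    return which, index - start


lengths = [3, 2, 4]                       # sizes of three hypothetical sub-datasets
first = locate(lengths, 0)
boundary = locate(lengths, 3)
last = locate(lengths, -1)
```

Here `boundary` lands on the first item of the second sub-dataset, because global index 3 is one past the end of the first.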
- class dipm.data.chemical_datasets.dataset.Subset(dataset: ConcatDataset, start: int, length: int)
- class dipm.data.chemical_datasets.dataset.Subset(dataset: Dataset, start: int, length: int)
A contiguous slice of a dataset, defined by start and length.
If the dataset is a ConcatDataset, Subset acts on each sub-dataset and returns a new ConcatDataset whose sub-datasets are sliced accordingly. This keeps parallel loading through ConcatDataset available, and the result is equivalent to taking a Subset of the concatenated dataset directly.
- __getitem__(index)
- __len__()
- release()
Release dataset resources.
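The equivalence described above, slicing each sub-dataset versus slicing the concatenation directly, can be checked with plain lists. This is a minimal sketch, not the dipm implementation: `split_slice` is a hypothetical helper, lists of letters stand in for datasets, and shuffling is assumed to be off.

```python
def split_slice(lengths, start, length):
    """Return one (local_start, local_length) pair per sub-dataset."""
    pieces = []
    offset = 0                        # global index where this sub-dataset begins
    stop = start + length
    for n in lengths:
        lo = max(start, offset)       # overlap of [start, stop) with
        hi = min(stop, offset + n)    # [offset, offset + n)
        if hi > lo:
            pieces.append((lo - offset, hi - lo))
        else:
            pieces.append((0, 0))     # this sub-dataset contributes nothing
        offset += n
    return pieces


parts = [list("abc"), list("de"), list("fghi")]   # three stand-in datasets
flat = [x for p in parts for x in p]              # their concatenation

pieces = split_slice([len(p) for p in parts], start=2, length=4)
per_part = [p[s:s + n] for p, (s, n) in zip(parts, pieces)]
rejoined = [x for p in per_part for x in p]       # concatenation of the slices
```

Because every sub-dataset keeps its own slice, each one can still be loaded independently, which is what makes the parallel path possible.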