Dataset interface

This module defines a PyTorch-like dataset interface for loading, combining, and slicing datasets.

class dipm.data.chemical_datasets.dataset.Dataset

Base class for PyTorch-style datasets.

abstractmethod __getitem__(index: int) → ChemicalSystem
abstractmethod __getitem__(index: list | ndarray | slice) → list[ChemicalSystem]

abstractmethod __len__() → int

release()

Release resources.
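The indexing contract above (an integer returns one item; a list, array, or slice returns a list of items) can be illustrated with a minimal sketch. It deliberately does not import dipm: `DatasetSketch` and `InMemoryDataset` are hypothetical stand-ins, plain strings take the place of ChemicalSystem objects, and the no-op default for `release()` is an assumption rather than documented behavior.

```python
from abc import ABC, abstractmethod


class DatasetSketch(ABC):
    """Local stand-in mirroring the documented methods (not the dipm class)."""

    @abstractmethod
    def __getitem__(self, index):
        ...

    @abstractmethod
    def __len__(self):
        ...

    def release(self):
        """Release resources. Assumed to be a no-op by default in this sketch."""


class InMemoryDataset(DatasetSketch):
    """Hypothetical dataset that keeps all of its items in a Python list."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        # A single integer yields a single item.
        if isinstance(index, int):
            return self._items[index]
        # A slice yields a list of items.
        if isinstance(index, slice):
            return self._items[index]
        # Any other iterable of integers (list, array) yields a list of items.
        return [self._items[i] for i in index]

    def __len__(self):
        return len(self._items)


ds = InMemoryDataset(["H2O", "CO2", "CH4", "NH3"])
single = ds[1]
by_slice = ds[1:3]
by_list = ds[[0, 3]]
```

A real subclass would return ChemicalSystem objects and would typically use `release()` to close whatever files it opened.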

class dipm.data.chemical_datasets.dataset.ConcatDataset(datasets: Sequence[Dataset], shuffle: bool = False, parallel: bool = False)

A dataset formed by concatenating multiple datasets, analogous to PyTorch's ConcatDataset.

__init__(datasets: Sequence[Dataset], shuffle: bool = False, parallel: bool = False)

Create a concatenated dataset.

Parameters:

datasets (sequence) – Sequence of datasets to concatenate.
shuffle (bool) – Whether to shuffle all datasets together. Defaults to False.
parallel (bool) – Whether to load data in parallel. If True, each dataset is loaded in a separate process when __getitem__ is called. Defaults to False.

__getitem__(index)

__len__() → int

release()

Release parallel loading resources and dataset file handles.
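To make the idea of concatenation concrete, the sketch below shows one common way to map a global index onto a sub-dataset and a local index, using cumulative lengths and binary search. It is not the dipm implementation: `ConcatSketch` is a hypothetical name, shuffling and parallel loading are left out, and plain lists stand in for the sub-datasets.

```python
from bisect import bisect_right
from itertools import accumulate


class ConcatSketch:
    """Hypothetical, unshuffled, single-process concatenation of datasets."""

    def __init__(self, datasets):
        self._datasets = list(datasets)
        # Cumulative end offsets, e.g. lengths [2, 3] give [2, 5].
        self._ends = list(accumulate(len(d) for d in self._datasets))

    def __len__(self):
        return self._ends[-1] if self._ends else 0

    def _locate(self, i):
        """Return (sub-dataset position, local index) for global index i."""
        if i < 0:
            i += len(self)
        if not 0 <= i < len(self):
            raise IndexError(i)
        k = bisect_right(self._ends, i)
        start = self._ends[k - 1] if k else 0
        return k, i - start

    def __getitem__(self, index):
        if isinstance(index, int):
            k, j = self._locate(index)
            return self._datasets[k][j]
        if isinstance(index, slice):
            index = range(*index.indices(len(self)))
        return [self[i] for i in index]

    def release(self):
        # Forward release() to sub-datasets that provide it.
        for d in self._datasets:
            release = getattr(d, "release", None)
            if release is not None:
                release()


concat = ConcatSketch([["a", "b"], ["c", "d", "e"]])
first_of_second = concat[2]
last = concat[-1]
window = concat[1:4]
```

Binary search keeps each lookup logarithmic in the number of sub-datasets, which matters little for a handful of files but is cheap to get right.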

class dipm.data.chemical_datasets.dataset.Subset(dataset: ConcatDataset, start: int, length: int)

class dipm.data.chemical_datasets.dataset.Subset(dataset: Dataset, start: int, length: int)

A contiguous subset of a dataset, defined by a start index and a length.

If the dataset is a ConcatDataset, Subset acts on each of its sub-datasets and returns a new ConcatDataset whose sub-datasets are sliced accordingly. This preserves parallel loading through ConcatDataset, and the result is equivalent to applying Subset directly to the whole dataset.
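The equivalence described above can be checked with a small sketch for the unshuffled case. The helper `split_window` is hypothetical and is not part of dipm; it only shows how a global window given by `start` and `length` breaks down into one `(start, length)` pair per sub-dataset. Plain lists stand in for the sub-datasets.

```python
def split_window(lengths, start, length):
    """Map the global window [start, start + length) onto per-dataset
    (local_start, local_length) pairs. Hypothetical helper for illustration."""
    stop = start + length
    pieces = []
    offset = 0  # global index of the current sub-dataset's first item
    for n in lengths:
        lo = max(start, offset) - offset
        hi = min(stop, offset + n) - offset
        # Sub-datasets outside the window contribute an empty slice.
        pieces.append((lo, hi - lo) if hi > lo else (0, 0))
        offset += n
    return pieces


parts = [["a", "b"], ["c", "d", "e"], ["f"]]
start, length = 1, 4

# Direct subset of the flat concatenation.
flat = [x for p in parts for x in p]
direct = flat[start:start + length]

# Per-part subsets, concatenated again.
windows = split_window([len(p) for p in parts], start, length)
per_part = [x for p, (s, n) in zip(parts, windows) for x in p[s:s + n]]
```

Both routes select the same items in the same order, while the per-part form keeps each sub-dataset separate so that it can still be loaded independently.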

__init__(dataset: Dataset, start: int, length: int)

__getitem__(index)

__len__()

release()

Release dataset resources.