Dataset interface

This module defines a PyTorch-like dataset interface for loading, combining and slicing datasets.

class dipm.data.chemical_datasets.dataset.Dataset

Base class for PyTorch-like datasets.

abstractmethod __getitem__(index: int) → ChemicalSystem
abstractmethod __getitem__(index: list | ndarray | slice) → list[ChemicalSystem]
abstractmethod __len__() → int
release()

Release resources.
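The abstract interface above can be illustrated with a self-contained sketch. It does not import dipm: Dataset here is a stand-in base class that mirrors the documented methods, ChemicalSystem is a hypothetical placeholder dataclass, and InMemoryDataset is an invented list-backed subclass showing one way __getitem__ can dispatch between a single integer and a slice, list or ndarray of indices.

```python
import operator
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class ChemicalSystem:
    # Hypothetical placeholder for dipm's ChemicalSystem.
    name: str


class Dataset(ABC):
    """Stand-in mirroring the documented interface; not dipm's actual class."""

    @abstractmethod
    def __getitem__(self, index):
        """int -> ChemicalSystem; list, ndarray or slice -> list[ChemicalSystem]."""

    @abstractmethod
    def __len__(self) -> int:
        """Number of systems in the dataset."""

    def release(self) -> None:
        """Release resources; nothing to do by default."""


class InMemoryDataset(Dataset):
    """Hypothetical list-backed dataset used only for illustration."""

    def __init__(self, systems):
        self._systems = list(systems)

    def __len__(self) -> int:
        return len(self._systems)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # A slice returns a list of systems.
            return self._systems[index]
        try:
            # Python ints and NumPy integer scalars both support operator.index.
            i = operator.index(index)
        except TypeError:
            # A list or 1-D ndarray of indices returns a list of systems.
            return [self._systems[operator.index(j)] for j in index]
        return self._systems[i]
```

For example, indexing `InMemoryDataset([ChemicalSystem("a"), ChemicalSystem("b")])` with `[1, 0]` returns both systems in reversed order, while indexing with `0` returns a single system.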

class dipm.data.chemical_datasets.dataset.ConcatDataset(datasets: Sequence[Dataset], shuffle: bool = False, parallel: bool = False)

A dataset formed by concatenating multiple datasets, analogous to PyTorch's ConcatDataset.

__init__(datasets: Sequence[Dataset], shuffle: bool = False, parallel: bool = False)

Create a concatenated dataset.

Parameters:
  • datasets (sequence) – Sequence of datasets to concatenate.

  • shuffle (bool) – Whether to shuffle all datasets together.

  • parallel (bool) – Whether to load data in parallel. If True, every dataset will be loaded in a separate process when __getitem__ is called.
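A ConcatDataset has to map each global index to a sub-dataset and a local index within it. The sketch below assumes the same cumulative-size lookup that PyTorch's ConcatDataset uses and covers only the unshuffled case; whether dipm resolves indices this way, and how shuffle=True permutes them, is not stated in this documentation. The helpers cumulative_sizes and locate are hypothetical.

```python
import bisect
from itertools import accumulate


def cumulative_sizes(lengths):
    """Running totals of sub-dataset lengths, e.g. [3, 2, 4] -> [3, 5, 9]."""
    return list(accumulate(lengths))


def locate(index, cum_sizes):
    """Map a global index to (sub-dataset position, local index)."""
    total = cum_sizes[-1] if cum_sizes else 0
    if index < 0:
        # Negative indices count from the end, as for Python lists.
        index += total
    if not 0 <= index < total:
        raise IndexError("index out of range")
    # First sub-dataset whose cumulative size exceeds the index;
    # bisect_right also skips over empty sub-datasets.
    pos = bisect.bisect_right(cum_sizes, index)
    offset = cum_sizes[pos - 1] if pos > 0 else 0
    return pos, index - offset
```

With sub-dataset lengths [3, 2, 4], global index 4 maps to (1, 1): the second sample of the second sub-dataset.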

__getitem__(index)
__len__() → int
release()

Release parallel loading resources and dataset file handles.

class dipm.data.chemical_datasets.dataset.Subset(dataset: ConcatDataset, start: int, length: int)
class dipm.data.chemical_datasets.dataset.Subset(dataset: Dataset, start: int, length: int)

Contiguous subset of a dataset, containing length samples starting at index start.

If dataset is a ConcatDataset, Subset slices each of its sub-datasets and returns a new ConcatDataset built from the sliced sub-datasets. This preserves parallel loading through ConcatDataset, and the resulting samples are exactly the same as those of a direct Subset.
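Slicing each sub-dataset amounts to intersecting the requested global range with each sub-dataset's range. The following sketch computes one (local start, local length) pair per sub-dataset; concatenating the resulting slices yields exactly the samples a direct Subset would return. The function split_range is hypothetical, and whether dipm keeps or drops sub-datasets that end up empty is not specified here.

```python
def split_range(start, length, lengths):
    """Intersect the global range [start, start + length) with each sub-dataset.

    Returns one (local_start, local_length) pair per sub-dataset; sub-datasets
    that do not overlap the range get (0, 0).
    """
    stop = start + length
    pieces = []
    offset = 0  # global index of the current sub-dataset's first sample
    for n in lengths:
        lo = max(start, offset)
        hi = min(stop, offset + n)
        if hi > lo:
            pieces.append((lo - offset, hi - lo))
        else:
            pieces.append((0, 0))
        offset += n
    return pieces
```

For sub-dataset lengths [3, 2, 4], the global range start=2, length=5 splits into (2, 1), (0, 2) and (0, 2), which together cover global indices 2 through 6.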

__init__(dataset: Dataset, start: int, length: int)
__getitem__(index)
__len__()
release()

Release dataset resources.
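The Subset methods listed above can be sketched as a thin view that offsets every index by start. SimpleSubset and ListDataset are hypothetical stand-ins written for illustration, not dipm classes; in particular, forwarding release() to the parent dataset is an assumption about what "Release dataset resources" means here.

```python
class ListDataset:
    """Hypothetical backing dataset used only for this sketch."""

    def __init__(self, items):
        self.items = list(items)
        self.released = False

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        return self.items[index]

    def release(self):
        self.released = True


class SimpleSubset:
    """View over dataset[start : start + length]; a sketch, not dipm's Subset."""

    def __init__(self, dataset, start, length):
        if start < 0 or length < 0 or start + length > len(dataset):
            raise ValueError("subset range exceeds dataset bounds")
        self.dataset = dataset
        self.start = start
        self.length = length

    def __len__(self):
        return self.length

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Resolve the slice against the subset, then shift into the parent.
            return [self.dataset[self.start + i]
                    for i in range(*index.indices(self.length))]
        if index < 0:
            index += self.length
        if not 0 <= index < self.length:
            raise IndexError("subset index out of range")
        return self.dataset[self.start + index]

    def release(self):
        # Assumed behavior: forward to the underlying dataset.
        self.dataset.release()
```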