Dataset Info¶
- class dipm.data.dataset_info.DatasetInfo(*, cutoff_distance_angstrom: float, max_neighbors_per_atom: int | None = None, task_list: list[str] | None = None, atomic_energies_map: dict[int, float | list[float]], avg_num_neighbors: float = 1.0, avg_num_nodes: float = 1.0, avg_r_min_angstrom: float | None = None, scaling_mean: float = 0.0, scaling_stdev: float = 1.0, median_num_neighbors: int = 1, max_total_edges: int = 1, median_num_nodes: int = 1, max_num_nodes: int = 1)¶
Pydantic dataclass holding information computed from the dataset that is (potentially) required by the models. There are three types of fields:
User-specified fields: These fields are set by the user and cannot be changed when fine-tuning.
Model-related computed fields: These fields are computed from the dataset but are bound to the model, so they cannot be changed when fine-tuning.
Dataset-related computed fields: These fields are computed from the dataset. They can be changed when fine-tuning, and changing them is recommended.
- cutoff_distance_angstrom¶
The graph cutoff distance that was used in the dataset in Angstrom.
- Type:
float
- max_neighbors_per_atom¶
The maximum number of neighbors to consider for each atom. This should typically NOT be used, as it breaks the smoothness.
- Type:
int | None
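The smoothness concern can be illustrated with a toy model. The sketch below is not dipm code: `smooth_cutoff`, `toy_energy`, and the per-neighbor weights are hypothetical stand-ins, assuming an energy built as a sum of smoothly decaying neighbor contributions. It compares how much the energy changes under a tiny displacement, with and without a cap on the number of neighbors.

```python
import math


def smooth_cutoff(r: float, cutoff: float) -> float:
    """Cosine envelope that decays smoothly to zero at the cutoff."""
    if r >= cutoff:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / cutoff) + 1.0)


def toy_energy(neighbors, cutoff, max_neighbors=None):
    """Sum weight * envelope(distance) over the (optionally capped) neighbors.

    `neighbors` is a list of (distance, weight) pairs. The weight is a
    hypothetical stand-in for any species-dependent contribution.
    """
    within = sorted(n for n in neighbors if n[0] < cutoff)
    if max_neighbors is not None:
        within = within[:max_neighbors]  # keep only the nearest neighbors
    return sum(w * smooth_cutoff(r, cutoff) for r, w in within)


cutoff = 5.0
# Neighbor A stays at 2.0; neighbor B moves from just behind A to just in front.
before = [(2.0, 1.0), (2.001, 3.0)]
after = [(2.0, 1.0), (1.999, 3.0)]

# Without a cap, a tiny displacement gives a tiny energy change.
jump_uncapped = abs(toy_energy(after, cutoff) - toy_energy(before, cutoff))
# With a cap of one neighbor, the kept neighbor switches from A to B,
# so the energy changes abruptly.
jump_capped = abs(toy_energy(after, cutoff, 1) - toy_energy(before, cutoff, 1))

print(f"uncapped change: {jump_uncapped:.6f}")
print(f"capped change:   {jump_capped:.6f}")
```

In this toy setting the capped energy jumps when the identity of the nearest neighbor changes, while the uncapped energy varies continuously.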
- task_list¶
List of different tasks/datasets used in training.
None (default) means no task embedding is used / only one task. If provided, the values of atomic_energies_map must be lists of floats, one for each task.
- Type:
list[str] | None
- atomic_energies_map¶
A dictionary mapping the atomic numbers to the computed average atomic energies for that element.
- Type:
dict[int, float | list[float]]
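As an illustration of the two shapes atomic_energies_map can take, here is a minimal sketch. The helper `check_atomic_energies`, the task names, and all energy values are hypothetical placeholders, and the validation dipm actually performs may differ. It only checks the stated rule that, when task_list is provided, each value is a list with one entry per task.

```python
def check_atomic_energies(task_list, atomic_energies_map):
    """Raise ValueError if the map does not match a provided task_list.

    Hypothetical helper for illustration only, not part of dipm.
    """
    if task_list is None:
        return  # single-task case: nothing to cross-check in this sketch
    for atomic_number, energies in atomic_energies_map.items():
        if not isinstance(energies, list) or len(energies) != len(task_list):
            raise ValueError(
                f"Element {atomic_number}: expected a list of "
                f"{len(task_list)} energies, got {energies!r}"
            )


# Single task: one float per atomic number (placeholder values).
single_task_map = {1: -0.5, 8: -75.0}
check_atomic_energies(None, single_task_map)

# Two tasks: one list entry per task, in the order of task_list.
task_list = ["task_a", "task_b"]  # hypothetical task names
multi_task_map = {1: [-0.5, -0.6], 8: [-75.0, -75.1]}
check_atomic_energies(task_list, multi_task_map)
```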
- avg_num_neighbors¶
The mean number of neighbors an atom has in the dataset.
- Type:
float
- avg_num_nodes¶
The mean number of nodes per graph in the dataset.
- Type:
float
- avg_r_min_angstrom¶
The mean minimum edge distance for a structure in the dataset.
- Type:
float | None
- scaling_mean¶
The mean used for the rescaling of the dataset values, the default being 0.0.
- Type:
float
- scaling_stdev¶
The standard deviation used for the rescaling of the dataset values, the default being 1.0.
- Type:
float
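The sketch below shows one common rescaling convention, standardization, to make the role of the two scaling fields concrete. This is an assumption: the page does not state which quantity dipm rescales or in which direction the transformation is applied, and the functions `rescale` and `undo_rescale` are hypothetical.

```python
import statistics


def rescale(value, scaling_mean=0.0, scaling_stdev=1.0):
    """Standardize a value: subtract the mean, divide by the standard deviation."""
    return (value - scaling_mean) / scaling_stdev


def undo_rescale(scaled, scaling_mean=0.0, scaling_stdev=1.0):
    """Invert `rescale`."""
    return scaled * scaling_stdev + scaling_mean


# With the defaults (0.0 and 1.0), rescaling is the identity.
identity_value = rescale(3.0)

# Placeholder dataset values, used only to compute example statistics.
values = [1.0, 2.0, 3.0, 4.0]
mean = statistics.fmean(values)
stdev = statistics.pstdev(values)  # population standard deviation

scaled = [rescale(v, mean, stdev) for v in values]
restored = [undo_rescale(s, mean, stdev) for s in scaled]
```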
- median_num_neighbors¶
The median number of neighbors an atom has in the dataset.
- Type:
int
- max_total_edges¶
The maximum number of edges in the dataset.
- Type:
int
- median_num_nodes¶
The median number of nodes per graph in the dataset.
- Type:
int
- max_num_nodes¶
The maximum number of nodes per graph in the dataset.
- Type:
int
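To make the dataset statistics above concrete, the following sketch computes them for two toy structures. It is not how dipm computes them: `neighbor_distances` and `summarize` are hypothetical helpers, edges are counted as directed (each atom pair within the cutoff contributes two), max_total_edges is taken as the largest edge count of any single graph, and medians use the upper of the two middle values so that the result stays an integer. All of these are assumptions.

```python
import math
import statistics


def neighbor_distances(positions, cutoff):
    """For each atom, list the distances to all other atoms within the cutoff."""
    return [
        [
            math.dist(p, q)
            for j, q in enumerate(positions)
            if j != i and math.dist(p, q) < cutoff
        ]
        for i, p in enumerate(positions)
    ]


def summarize(structures, cutoff):
    """Compute toy versions of the statistics stored in the dataset info."""
    per_atom_counts, nodes, edges, r_mins = [], [], [], []
    for positions in structures:
        dists = neighbor_distances(positions, cutoff)
        counts = [len(d) for d in dists]
        per_atom_counts.extend(counts)
        nodes.append(len(positions))
        edges.append(sum(counts))  # directed edge count for this graph
        flat = [r for d in dists for r in d]
        if flat:
            r_mins.append(min(flat))
    return {
        "avg_num_neighbors": statistics.fmean(per_atom_counts),
        "median_num_neighbors": statistics.median_high(per_atom_counts),
        "avg_num_nodes": statistics.fmean(nodes),
        "median_num_nodes": statistics.median_high(nodes),
        "max_num_nodes": max(nodes),
        "max_total_edges": max(edges),
        "avg_r_min_angstrom": statistics.fmean(r_mins) if r_mins else None,
    }


# Two toy structures: a three-atom chain and a two-atom pair (in Angstrom).
chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
pair = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0)]
stats = summarize([chain, pair], cutoff=1.5)

for name, value in stats.items():
    print(f"{name}: {value}")
```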
- __init__(**data: Any) None¶
Create a new model by parsing and validating input data from keyword arguments.
Raises pydantic_core.ValidationError if the input data cannot be validated to form a valid model.
self is explicitly positional-only to allow self as a field name.