EquiformerV2

class dipm.models.equiformer_v2.models.EquiformerV2(*args: Any, **kwargs: Any)

The EquiformerV2 model, implemented as a Flax module derived from the ForceModel class.

References

Yi-Lun Liao, Brandon Wood, Abhishek Das and Tess Smidt. EquiformerV2: Improved Equivariant Transformer for Scaling to Higher-Degree Representations. International Conference on Learning Representations (ICLR), January 2024. URL: https://openreview.net/forum?id=mCOBKZmrzD.

config

Hyperparameters / configuration for the EquiformerV2 model, see EquiformerV2Config.

Type:: dipm.models.equiformer_v2.config.EquiformerV2Config

dataset_info

Hyperparameters dictated by the dataset (e.g., cutoff radius or average number of neighbors).

__call__(edge_vectors: Array, node_species: Array, senders: Array, receivers: Array, n_node: Array, rngs: Rngs | None = None) → Array

Compute node-wise energy summands. This is EquiformerV2's implementation of the ForceModel method that every ForceModel subclass must override.
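The signature does not spell out how its arrays describe a batch of structures. The NumPy sketch below builds the inputs for two small structures, assuming the common packed-graph convention (atoms of all structures concatenated, with `n_node` atoms per structure), that `edge_vectors` point from sender to receiver, that the output has one summand per atom, and that summing the summands per structure gives its energy. These are assumptions for illustration; the model itself is not called, and the coordinates and energies are placeholders.

```python
import numpy as np

# Two small structures packed into one batch (placeholder coordinates): 3 atoms, then 2 atoms.
positions = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [5.0, 5.0, 5.0],
    [5.0, 5.0, 6.0],
])
n_node = np.array([3, 2])  # number of atoms in each structure

# Directed edges inside each structure (normally produced by a cutoff-based neighbor search).
senders = np.array([0, 1, 0, 2, 3, 4])
receivers = np.array([1, 0, 2, 0, 4, 3])

# ASSUMPTION: edge vectors point from sender to receiver.
edge_vectors = positions[receivers] - positions[senders]

# Structure index of every atom: [0, 0, 0, 1, 1].
node_graph = np.repeat(np.arange(len(n_node)), n_node)

# Placeholder for the per-atom array assumed to be returned by __call__.
node_energies = np.array([-1.0, -2.0, -2.0, -1.5, -1.5])

# ASSUMPTION: summing the summands per structure yields one energy per structure.
graph_energies = np.zeros(len(n_node))
np.add.at(graph_energies, node_graph, node_energies)
print(graph_energies)  # [-5. -3.]
```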

class dipm.models.equiformer_v2.config.EquiformerV2Config(*, force_head: bool = False, param_dtype: DtypeEnum = DtypeEnum.F32, num_layers: Annotated[int, Gt(gt=0)] = 12, lmax: Annotated[int, Gt(gt=0)] = 6, mmax: Annotated[int, Ge(ge=0)] = 2, sphere_channels: Annotated[int, Gt(gt=0)] = 128, num_edge_channels: Annotated[int, Gt(gt=0)] = 128, atom_edge_embedding: str = 'isolated', num_rbf: Annotated[int, Gt(gt=0)] = 600, attn_hidden_channels: Annotated[int, Gt(gt=0)] = 64, num_heads: Annotated[int, Gt(gt=0)] = 8, attn_alpha_channels: Annotated[int, Gt(gt=0)] = 64, attn_value_channels: Annotated[int, Gt(gt=0)] = 16, ffn_hidden_channels: Annotated[int, Gt(gt=0)] = 128, norm_type: LayerNormType = LayerNormType.LAYER_NORM_SH, grid_resolution: Annotated[int, Gt(gt=0)] = None, use_m_share_rad: bool = False, use_attn_renorm: bool = True, use_gate_act: bool = False, use_grid_mlp: bool = True, use_sep_s2_act: bool = True, alpha_drop: float = 0.1, drop_path_rate: float = 0.05, avg_num_neighbors: float | None = 23.395238876342773, avg_num_nodes: float | None = 77.81317, atomic_energies: str | dict[int, float] | None = None)

The configuration / hyperparameters of the EquiformerV2 model.

num_layers

Number of EquiformerV2 layers. Default is 12.

Type:: int

lmax

Maximum degree of the spherical harmonics (1 to 10). Default is 6.

Type:: int

mmax

Maximum order of the spherical harmonics (0 to lmax). Default is 2.

Type:: int
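To make the effect of mmax concrete, the sketch below counts how many spherical-harmonic coefficients per channel remain when orders are truncated to |m| ≤ mmax. This assumes the eSCN-style SO(2) convolutions that EquiformerV2 builds on, where the truncation applies to the coefficients entering the SO(2) operations. The helper `num_coefficients` is hypothetical, and its range checks only mirror the documented bounds, not dipm's own validation.

```python
def num_coefficients(lmax: int, mmax: int) -> int:
    """Count spherical-harmonic coefficients kept when orders are truncated to |m| <= mmax.

    Hypothetical helper: the bounds below mirror the documented ranges of lmax and mmax.
    """
    if not 1 <= lmax <= 10:
        raise ValueError("lmax must be between 1 and 10")
    if not 0 <= mmax <= lmax:
        raise ValueError("mmax must be between 0 and lmax")
    # Degree l has 2l + 1 orders m = -l..l; only those with |m| <= mmax are kept.
    return sum(2 * min(l, mmax) + 1 for l in range(lmax + 1))


print(num_coefficients(6, 6))  # 49 == (6 + 1) ** 2, no truncation
print(num_coefficients(6, 2))  # 29 with the defaults lmax=6, mmax=2
```

With the defaults, truncation keeps 29 of the 49 coefficients, which is the main reason a small mmax makes high lmax affordable.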

sphere_channels

Number of spherical channels. Default is 128.

Type:: int
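As an illustration of how sphere_channels and lmax together determine the size of the node features, the sketch below allocates a feature array, assuming the layout `(num_nodes, (lmax + 1) ** 2, sphere_channels)` used by the reference EquiformerV2 implementation, with coefficients of degree l stored contiguously. dipm's internal layout is an assumption here and may differ.

```python
import numpy as np

num_nodes = 5
lmax = 6
sphere_channels = 128

# ASSUMED layout: one row per (l, m) pair for l = 0..lmax, each carrying sphere_channels features.
num_coefficients = (lmax + 1) ** 2
node_features = np.zeros((num_nodes, num_coefficients, sphere_channels), dtype=np.float32)
print(node_features.shape)  # (5, 49, 128)

# Coefficients of degree l occupy rows l**2 .. (l + 1)**2 - 1 along axis 1.
l = 2
degree_2 = node_features[:, l**2:(l + 1) ** 2, :]
print(degree_2.shape)  # (5, 5, 128)
```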

num_edge_channels

Number of channels for the edge invariant features. Default is 128.

Type:: int

atom_edge_embedding

Whether to use atomic embeddings together with the relative distance in the edge features and, if so, whether to share them. Options are “none” (not used), “isolated” (default) and “shared”.

Type:: str

num_rbf

Number of radial basis functions used to expand interatomic distances in the embedding block. Default is 600.

Type:: int
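The entry does not say which radial basis is used. A common choice, and similar in spirit to the distance expansion of the original EquiformerV2, is Gaussian smearing: num_rbf Gaussians with evenly spaced centers between 0 and the cutoff. In the minimal sketch below, the cutoff of 5.0 and the width rule (one center spacing) are assumptions; in dipm the cutoff comes from dataset_info.

```python
import numpy as np


def gaussian_rbf(distances: np.ndarray, num_rbf: int = 600, cutoff: float = 5.0) -> np.ndarray:
    """Expand distances into num_rbf Gaussians with evenly spaced centers in [0, cutoff].

    ASSUMPTIONS: Gaussian basis, placeholder cutoff, width equal to the center spacing.
    """
    centers = np.linspace(0.0, cutoff, num_rbf)
    width = centers[1] - centers[0]  # spacing between neighboring centers
    coeff = -0.5 / width**2
    diff = distances[:, None] - centers[None, :]  # (num_edges, num_rbf)
    return np.exp(coeff * diff**2)


edge_vectors = np.array([[1.0, 0.0, 0.0], [0.0, 3.0, 4.0]])
edge_lengths = np.linalg.norm(edge_vectors, axis=-1)  # [1., 5.]
features = gaussian_rbf(edge_lengths)
print(features.shape)  # (2, 600)
```

Each distance activates mostly the few Gaussians whose centers lie nearby, giving a smooth, high-dimensional encoding that later layers can mix.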

attn_hidden_channels

Number of hidden channels used during SO(2) graph attention. Default is 64; 64 and 96 are typical choices, but other values are allowed.

Type:: int

num_heads

Number of heads in the attention block. Default is 8.

Type:: int

attn_alpha_channels

Number of channels for the alpha vector in each attention head. Default is 64.

Type:: int

attn_value_channels

Number of channels for the value vector in each attention head. Default is 16.

Type:: int
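The sketch below shows how num_heads and attn_value_channels shape graph attention: each head yields one logit per edge, the logits are normalized with a softmax over the edges arriving at the same atom, and the weighted values are summed per receiver. It is a simplified, scalar-only illustration that ignores the equivariant (type-L) structure of the real values. The names `segment_softmax`, `alpha_logits` and `values` are hypothetical, and random numbers stand in for the SO(2) attention outputs.

```python
import numpy as np


def segment_softmax(logits: np.ndarray, segment_ids: np.ndarray, num_segments: int) -> np.ndarray:
    """Softmax over edges that share a receiver, computed independently for each head."""
    # Subtract the per-receiver maximum for numerical stability.
    seg_max = np.full((num_segments,) + logits.shape[1:], -np.inf)
    np.maximum.at(seg_max, segment_ids, logits)
    exp = np.exp(logits - seg_max[segment_ids])
    seg_sum = np.zeros((num_segments,) + logits.shape[1:])
    np.add.at(seg_sum, segment_ids, exp)
    return exp / seg_sum[segment_ids]


num_nodes, num_heads, attn_value_channels = 3, 8, 16
receivers = np.array([0, 0, 1, 2, 2, 2])  # receiving atom of each edge
num_edges = len(receivers)

rng = np.random.default_rng(0)
# Placeholders: in the model these come from the SO(2) attention block
# (attn_alpha_channels reduced to one logit per head, and per-head value vectors).
alpha_logits = rng.normal(size=(num_edges, num_heads))
values = rng.normal(size=(num_edges, num_heads, attn_value_channels))

alpha = segment_softmax(alpha_logits, receivers, num_nodes)  # (num_edges, num_heads)
messages = alpha[:, :, None] * values
aggregated = np.zeros((num_nodes, num_heads, attn_value_channels))
np.add.at(aggregated, receivers, messages)
print(aggregated.reshape(num_nodes, -1).shape)  # (3, 128)
```

Concatenating the heads gives num_heads × attn_value_channels features per atom (8 × 16 = 128 with the defaults).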

ffn_hidden_channels

Number of hidden channels used in the feedforward network. Default is 128.

Type:: int

norm_type

Type of normalization layer. Options are “layer_norm”, “layer_norm_sh” (default) and “rms_norm_sh”.

Type:: dipm.layers.escn.layernorm.LayerNormType

grid_resolution

Resolution of the SO3Grid used in the activation. Example values are 18, 16 and 14; if None (default), the resolution is chosen automatically.

Type:: int

use_m_share_rad

Whether all m components within a type-L vector of one channel share radial function weights. Default is False.

Type:: bool

use_attn_renorm

Whether to re-normalize the attention weights. Default is True.

Type:: bool

use_gate_act

If True, use the gate activation; otherwise, use the S2 activation. Default is False.

Type:: bool
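A gate activation applies an ordinary nonlinearity only to the invariant (degree 0) features and scales each higher-degree block by an invariant sigmoid gate, which keeps the layer rotation-equivariant. The minimal NumPy sketch below illustrates the idea, assuming features laid out as `(num_nodes, (lmax + 1) ** 2, channels)` and gate logits passed in separately (in a real model they would be computed from the degree 0 channels). `gate_activation` and its arguments are hypothetical; the S2 activation is not sketched.

```python
import numpy as np


def silu(x: np.ndarray) -> np.ndarray:
    return x / (1.0 + np.exp(-x))


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def gate_activation(x: np.ndarray, gate_logits: np.ndarray) -> np.ndarray:
    """Gated nonlinearity for features of shape (num_nodes, (lmax + 1) ** 2, channels).

    gate_logits: (num_nodes, lmax, channels), one set of invariant gates per degree l >= 1.
    ASSUMED layout: degree l occupies rows l**2 .. (l + 1)**2 - 1 along axis 1.
    """
    lmax = gate_logits.shape[1]
    if x.shape[1] != (lmax + 1) ** 2:
        raise ValueError("x and gate_logits disagree on lmax")
    out = np.empty_like(x)
    # Degree 0 (invariant scalars): a pointwise nonlinearity is safe.
    out[:, 0, :] = silu(x[:, 0, :])
    for l in range(1, lmax + 1):
        gate = sigmoid(gate_logits[:, l - 1, :])[:, None, :]  # broadcast over the 2l + 1 orders
        # Scaling all m components of degree l by one invariant factor preserves equivariance.
        out[:, l**2:(l + 1) ** 2, :] = x[:, l**2:(l + 1) ** 2, :] * gate
    return out
```

For example, `gate_activation(np.ones((2, 9, 4)), np.zeros((2, 2, 4)))` leaves every degree 1 and 2 coefficient at 0.5, since a zero logit gives a gate of 0.5.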

use_grid_mlp

If True, the feedforward networks project the features onto a grid and apply MLPs there. Default is True.

Type:: bool

use_sep_s2_act

If True, use the separable S2 activation when use_gate_act is False. Default is True.

Type:: bool

alpha_drop

Dropout rate for the attention weights. Typical values are 0.0 or 0.1 (default).

Type:: float

drop_path_rate

Graph drop path (stochastic depth) rate. Typical values are 0.0 or 0.05 (default).

Type:: float
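Drop path (stochastic depth) randomly skips a layer's residual branch during training. In the graph variant, the decision is made once per structure, so all atoms of a structure are dropped together, and surviving outputs are rescaled by 1 / (1 − rate) to keep their expected value unchanged. A minimal NumPy sketch, assuming per-atom branch outputs and structures packed by `n_node`; `graph_drop_path` is a hypothetical helper, not the dipm implementation.

```python
import numpy as np


def graph_drop_path(x: np.ndarray, n_node: np.ndarray, rate: float,
                    rng: np.random.Generator, training: bool = True) -> np.ndarray:
    """Zero the residual-branch output of randomly chosen structures and rescale the rest."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return x
    keep_prob = 1.0 - rate
    keep = rng.random(len(n_node)) < keep_prob  # one Bernoulli draw per structure
    node_keep = np.repeat(keep, n_node)         # broadcast each decision to its atoms
    mask = node_keep.reshape((-1,) + (1,) * (x.ndim - 1))
    return x * mask / keep_prob


rng = np.random.default_rng(0)
branch = np.ones((5, 4))   # residual-branch output for 5 atoms
n_node = np.array([3, 2])  # two structures
out = graph_drop_path(branch, n_node, rate=0.05, rng=rng)
```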

avg_num_nodes

The mean number of atoms per graph. If None, the value from the dataset info is used. Default is the value from IS2RE (100k), 77.81317.

Type:: float | None
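In the reference EquiformerV2 code for OC20, the per-atom outputs are summed per structure and then divided by this fixed average atom count, which keeps the raw network outputs near unit scale. The sketch below mirrors that idea with placeholder numbers. Where exactly dipm applies the factor is an assumption, although dividing each summand or the per-structure sum gives the same result.

```python
import numpy as np

avg_num_nodes = 77.81317  # documented default, taken from IS2RE (100k)

node_outputs = np.array([-1.0, -2.0, -2.0, -1.5, -1.5])  # placeholder per-atom outputs
n_node = np.array([3, 2])                                # atoms per structure
node_graph = np.repeat(np.arange(len(n_node)), n_node)

# ASSUMPTION: sum per structure, then divide by the dataset-wide average atom count.
energies = np.bincount(node_graph, weights=node_outputs, minlength=len(n_node)) / avg_num_nodes
```

Dividing by a constant, rather than by each structure's own atom count, keeps the predicted energy extensive: a structure with twice as many similar atoms still gets roughly twice the energy.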

avg_num_neighbors

The mean number of neighbors per atom, used to rescale the aggregated messages. If None, the value from the dataset info is used. Default is the value from IS2RE (100k), 23.395238876342773.

Type:: float | None
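The rescaling can be pictured as follows: incoming messages are summed per atom and the sum is divided by the dataset-wide average degree. A minimal NumPy sketch with placeholder messages; where exactly dipm applies the factor is an assumption.

```python
import numpy as np

avg_num_neighbors = 23.395238876342773  # documented default, taken from IS2RE (100k)

num_nodes, channels = 4, 8
receivers = np.array([0, 0, 1, 1, 1, 2, 3])   # receiving atom of each edge
messages = np.ones((len(receivers), channels))  # placeholder per-edge messages

# Sum the incoming messages per atom, then divide by the average degree.
aggregated = np.zeros((num_nodes, channels))
np.add.at(aggregated, receivers, messages)
aggregated /= avg_num_neighbors
```

Dividing by a fixed average, rather than each atom's own degree, keeps the aggregation a scaled sum. The result still reflects how many neighbors an atom has while staying near unit magnitude for typical atoms.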

atomic_energies

How to treat the atomic energies. If set to None (default) or the string "average", the average atomic energies stored in the dataset info are used. If set to the string "zero", the model uses no atomic energies. Alternatively, a dictionary of atomic energies can be passed, which is then used instead of the one in the dataset info.

Type:: str | dict[int, float] | None
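The three documented options can be summarized by the hypothetical helper below, a plain-Python sketch rather than dipm's code. It keys energies by integer, as the type `dict[int, float]` suggests; whether those integers are atomic numbers or species indices is not specified here. The final step, adding each atom's reference energy to its model summand, is a common convention and an assumption. All numbers are placeholders.

```python
from typing import Optional, Union


def resolve_atomic_energies(
    option: Optional[Union[str, dict[int, float]]],
    dataset_energies: dict[int, float],
) -> dict[int, float]:
    """Hypothetical helper mirroring the documented rules for the atomic_energies option."""
    if option is None or option == "average":
        return dict(dataset_energies)                       # averages from the dataset info
    if option == "zero":
        return {key: 0.0 for key in dataset_energies}       # no atomic energies
    if isinstance(option, dict):
        return dict(option)                                 # user-supplied override
    raise ValueError(f"Unknown atomic_energies option: {option!r}")


dataset_energies = {0: -1.0, 1: -2.0}  # placeholder values, not real reference energies
species = [0, 1, 1]
node_summands = [0.1, -0.2, 0.05]      # placeholder model outputs per atom

# ASSUMPTION: each atom's reference energy is added to its model summand.
energies = resolve_atomic_energies("average", dataset_energies)
total = sum(e + energies[s] for e, s in zip(node_summands, species))
```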