This study investigates the internal representations used in machine learning interatomic potentials and deep learning models, focusing on the high-dimensional feature vectors, known as descriptors, that encode atomic structures. These descriptors span an inner space that appears to possess an intrinsic structure, which we aim to exploit to characterize complex energy landscapes. This structured space provides a powerful framework for studying the interactions and transformations within networks of crystal defects, which give rise to a remarkably diverse range of defect morphologies [1].
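As a concrete illustration of how such a structured descriptor space can be interrogated, the sketch below assigns a statistical outlier score to atomic environments by comparing their descriptor vectors against a defect-free reference distribution. This is a minimal, hypothetical example: the descriptor function, its dimension, and the Mahalanobis-type score are placeholders and do not reproduce the MiLaDy workflow of Refs. [1,2].

```python
# Minimal sketch (not the MiLaDy implementation): flag atoms whose descriptor
# vectors lie far from a reference "bulk" distribution in descriptor space.
# compute_descriptors is a hypothetical placeholder standing in for any
# per-atom descriptor (SOAP, bispectrum, ...); here it returns random data.
import jax
import jax.numpy as jnp

def compute_descriptors(key, n_atoms, dim=30):
    # Placeholder: in practice, per-atom descriptors come from an ML-potential code.
    return jax.random.normal(key, (n_atoms, dim))

D_bulk = compute_descriptors(jax.random.PRNGKey(0), 2000)  # reference (defect-free) environments
D_test = compute_descriptors(jax.random.PRNGKey(1), 500)   # environments to classify

# Statistical distance to the bulk distribution (Mahalanobis-like score).
mu = D_bulk.mean(axis=0)
cov = jnp.cov(D_bulk, rowvar=False) + 1e-6 * jnp.eye(D_bulk.shape[1])
prec = jnp.linalg.inv(cov)

def outlier_score(d):
    r = d - mu
    return jnp.sqrt(r @ prec @ r)

scores = jax.vmap(outlier_score)(D_test)
threshold = jnp.quantile(jax.vmap(outlier_score)(D_bulk), 0.99)
defective = scores > threshold
print(int(defective.sum()), "atoms flagged as defective environments")
```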
Rather than spending effort on developing yet another machine learning potential, an already mature and widely explored topic, we instead concentrate on inspecting and exploiting the descriptor space itself. By analyzing this space, we can: (i) identify and statistically characterize complex defect networks [1,2]; (ii) combine these insights with accelerated molecular dynamics methods, such as the Bayesian Adaptive Biasing Force approach [3], to efficiently sample intricate defect energy landscapes; (iii) construct surrogate models that bypass traditional methodologies to predict demanding properties, such as vibrational entropies, at significantly reduced computational cost [4]; and (iv) demonstrate that descriptors can encode an agnostic entropy, thereby establishing a link between Gibbs and Shannon entropy and enabling a natural generalization of anharmonic free energies across a wide range of relevant physical systems [5]. For the broad class of generalized linear models, we show that free energies can be cast as the Legendre transform of a high-dimensional descriptor entropy, which can be accurately estimated via score matching (sketched schematically below).
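To make the last statement concrete, consider a schematic, hedged derivation; the notation is illustrative and not necessarily that of Ref. [5]. For an energy that is linear in a global descriptor vector $D(x)$, $E_\theta(x) = \theta \cdot D(x)$, the canonical free energy can be rewritten as an integral over descriptor space,
$$
F(\theta) \;=\; -k_B T \ln \int e^{-\beta\,\theta\cdot D(x)}\,\mathrm{d}x
\;=\; -k_B T \ln \int \mathrm{d}\mu\; e^{\,S(\mu)/k_B \,-\, \beta\,\theta\cdot\mu},
\qquad
S(\mu) \;\equiv\; k_B \ln \int \delta\big(D(x)-\mu\big)\,\mathrm{d}x ,
$$
so that, in the thermodynamic (saddle-point) limit,
$$
F(\theta) \;\simeq\; \min_{\mu}\,\big[\,\theta\cdot\mu \;-\; T\,S(\mu)\,\big],
$$
i.e. the free energy is the Legendre transform of the descriptor entropy $S(\mu)$. Score matching estimates the gradient $\nabla_\mu S(\mu)$ without requiring a normalized density, from which $S$, and hence $F$, can be reconstructed.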
This last concept provides the first example in the literature in which the free energy itself can be backpropagated. We present a model-agnostic estimator that returns meV/atom-accurate, end-to-end differentiable free energies over a diverse, multi-element range of parameters.
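The end-to-end differentiability can be sketched as follows, assuming a smooth surrogate for the descriptor entropy is available (here a hypothetical quadratic fit standing in for a score-matching estimate; this is not the estimator of Ref. [5]). Differentiating the Legendre transform with JAX recovers the equilibrium descriptor average via the envelope theorem:

```python
# Hedged sketch (not the estimator of Ref. [5]): given a surrogate for the
# reduced descriptor entropy s(mu) = S(mu)/k_B (here a hypothetical quadratic
# fit), the free energy is a Legendre transform that JAX can differentiate
# end-to-end.
import jax
import jax.numpy as jnp

dim = 8
mu0 = jax.random.normal(jax.random.PRNGKey(0), (dim,))        # assumed entropy maximum
R = jax.random.normal(jax.random.PRNGKey(1), (dim, dim))
A = R @ R.T + jnp.eye(dim)                                    # assumed positive-definite curvature

def entropy(mu):
    # Quadratic surrogate for s(mu); illustrative only.
    return -0.5 * (mu - mu0) @ A @ (mu - mu0)

def beta_free_energy(beta_theta):
    # beta*F(theta) = min_mu [ beta*theta . mu - s(mu) ]  (Legendre transform).
    # For the quadratic surrogate the minimizer is available in closed form.
    mu_star = mu0 - jnp.linalg.solve(A, beta_theta)
    return beta_theta @ mu_star - entropy(mu_star)

beta_theta = 0.1 * jax.random.normal(jax.random.PRNGKey(2), (dim,))
F = beta_free_energy(beta_theta)
# Backpropagating through F gives d(beta*F)/d(beta*theta) = mu_star,
# which plays the role of the equilibrium descriptor average <D>.
dF = jax.grad(beta_free_energy)(beta_theta)
mu_star = mu0 - jnp.linalg.solve(A, beta_theta)
print(jnp.allclose(dF, mu_star, atol=1e-5))                   # expected: True
```

The quadratic surrogate admits a closed-form minimizer; a more general entropy surrogate would require an inner optimization, through which the gradient can still be propagated.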
[1] A. M. Goryaeva et al., Nature Commun. 14, 3003 (2023); A. M. Goryaeva et al., Nature Commun. 11, 4691 (2020); P. Lafourcade et al., Comput. Mater. Sci. 230, 112534 (2025).
[2] M.-C. Marinica, A. M. Goryaeva, T. D. Swinburne et al., MiLaDy - Machine Learning Dynamics, CEA Saclay, 2015-2025, https://ai-atoms.github.io/milady/.
[3] A. Zhong et al., Phys. Rev. Mater. 7, 023802 (2023); A. Zhong et al., PRX Energy 4, 013008 (2025); C. Lapointe et al., Phys. Rev. Mater. 9, 093801 (2025); A. Allera et al., Nature Commun. 16, 8367 (2025).
[4] C. Lapointe et al., Phys. Rev. Mater. 6, 113803 (2022).
[5] T. D. Swinburne et al., arXiv:2502.18191 (2025).