What is it about?
Active learning is a widely used technique for building data sets, but there is no optimal strategy for finding highly uncertain points. We use enhanced sampling to drive the molecular system towards uncharted regions of configuration space. The "reaction coordinate" is the data-based uncertainty itself, which removes the need for chemical intuition or prior knowledge about the system.
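As a rough illustration of the idea, the toy sketch below treats the disagreement within an ensemble of models as the uncertainty and subtracts it from the predicted energy, so that a sampler following the biased energy is pulled towards poorly covered regions. It is not the method from the paper: the 1D double-well potential, the bootstrap polynomial ensemble, and the names `fit_ensemble`, `uncertainty`, `biased_energy`, and `kappa` are all assumptions made for this example.

```python
import numpy as np

def fit_ensemble(x_train, y_train, n_models=8, degree=3, seed=0):
    """Fit polynomial models on bootstrap resamples of the training data."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(x_train), size=len(x_train))
        models.append(np.polyfit(x_train[idx], y_train[idx], degree))
    return models

def uncertainty(models, x):
    """Ensemble disagreement: standard deviation of the predictions at x."""
    preds = np.array([np.polyval(c, x) for c in models])
    return preds.std(axis=0)

def biased_energy(models, x, kappa=1.0):
    """Mean predicted energy minus a reward proportional to the uncertainty.

    kappa is a hypothetical bias strength; kappa=0 recovers the plain mean.
    """
    preds = np.array([np.polyval(c, x) for c in models])
    return preds.mean(axis=0) - kappa * preds.std(axis=0)

# Toy data (assumption): a double-well potential sampled only near the left well.
x_train = np.linspace(-1.5, -0.5, 20)
y_train = (x_train**2 - 1.0) ** 2
models = fit_ensemble(x_train, y_train)

# Compare the uncertainty inside the training range with an unseen region.
sigma_inside = uncertainty(models, np.array([-1.0]))[0]
sigma_outside = uncertainty(models, np.array([1.0]))[0]
print(f"uncertainty inside training range:  {sigma_inside:.3e}")
print(f"uncertainty outside training range: {sigma_outside:.3e}")
```

In this sketch the uncertainty is much larger in the unsampled right well than in the sampled left well, so lowering the energy by `kappa` times the uncertainty favours the unexplored region. Configurations collected there would then be labelled and added to the training set before the ensemble is refit.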
Why is it important?
ML models are usually good at interpolation but poor at extrapolation. Training sets therefore need to cover all relevant parts of configuration space. To achieve this, our method drives the system towards highly uncertain regions during active learning and collects data there, which yields robust data sets.
Read the Original
This page is a summary of: Enhanced sampling of robust molecular datasets with uncertainty-based collective variables, The Journal of Chemical Physics, January 2025, American Institute of Physics, DOI: 10.1063/5.0246178.