Relevant, hidden, and frustrated information in high-dimensional analyses of complex dynamical systems with internal noise

Chiara Lionello, Matteo Becchi, Simone Martino, Giovanni M. Pavan
2024-12-13
Abstract: Extracting meaningful insights from trajectory data to understand complex systems is challenging. High-dimensional analyses are often assumed essential to avoid losing information, but the necessity of such high dimensionality is unclear. Here, we address this fundamental issue using an atomistic molecular dynamics example of liquid water and ice coexisting at the solid/liquid transition temperature. We analyze molecular trajectories with the high-dimensional Smooth Overlap of Atomic Positions (SOAP) descriptor, generating 2.56e6 SOAP vectors, each 576-dimensional. Surprisingly, our results show that a single SOAP dimension, accounting for <0.001% of the total variance, is more descriptive than the full dataset. Including additional dimensions degrades analysis quality due to "noise-frustrated information" effects, where noise outweighs relevant information. These effects are shown to be general across systems, scales, and dimensionality reduction methods. Our findings challenge the notion that high-dimensional analyses are inherently superior, emphasizing the importance of prioritizing information quality over quantity in complex datasets.
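The central claim of the abstract, that a dimension carrying a negligible share of the variance can still hold the relevant information, can be illustrated with a toy sketch. The example below is not the authors' analysis and uses no SOAP data: the dataset is synthetic, and the sample count, noise levels, and the `separation` score are all assumptions chosen only for illustration. One low-variance dimension encodes two classes, while 49 high-variance dimensions contain pure noise.

```python
import numpy as np

# Synthetic stand-in for a high-dimensional dataset (all values are assumptions).
rng = np.random.default_rng(0)
n_samples, n_noise_dims = 2000, 49

# Two classes, e.g. "solid-like" and "liquid-like" environments.
labels = np.repeat([0, 1], n_samples // 2)

# Dimension 0: tiny amplitude, but it encodes the class.
signal = np.where(labels == 0, -0.01, 0.01) + rng.normal(0.0, 0.002, n_samples)

# Remaining dimensions: large-amplitude noise unrelated to the class.
noise = rng.normal(0.0, 1.0, (n_samples, n_noise_dims))
X = np.column_stack([signal, noise])

# Share of the total variance carried by the informative dimension.
variances = X.var(axis=0)
variance_fraction = variances[0] / variances.sum()


def separation(data, labels):
    """Distance between class centroids divided by the RMS within-class spread."""
    data = data.reshape(len(data), -1)
    a, b = data[labels == 0], data[labels == 1]
    gap = np.linalg.norm(a.mean(axis=0) - b.mean(axis=0))
    within = np.concatenate([
        np.sum((a - a.mean(axis=0)) ** 2, axis=1),
        np.sum((b - b.mean(axis=0)) ** 2, axis=1),
    ])
    return gap / np.sqrt(within.mean())


sep_single = separation(X[:, 0], labels)  # informative dimension alone
sep_full = separation(X, labels)          # all 50 dimensions together

print(f"variance fraction of dim 0: {variance_fraction:.2e}")
print(f"class separation, dim 0 only: {sep_single:.2f}")
print(f"class separation, all dims:   {sep_full:.3f}")
```

In this toy setting, a variance-based reduction would discard dimension 0 first, even though it is the only dimension that separates the classes. Adding the noisy dimensions swamps the class gap with within-class spread, which is the kind of effect the abstract calls "noise-frustrated information".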
Chemical Physics