A survey on Big Data and Machine Learning for Chemistry

Jose F Rodrigues Jr, Larisa Florea, Maria C F de Oliveira, Dermot Diamond, Osvaldo N Oliveira Jr
DOI: https://doi.org/10.48550/arXiv.1904.10370
2019-04-23
Abstract: Herein we review aspects of leading-edge research and innovation in chemistry which exploits big data and machine learning (ML), two computer science fields that combine to yield machine intelligence. ML can accelerate the solution of intricate chemical problems and even solve problems that otherwise would not be tractable. But the potential benefits of ML come at the cost of big data production; that is, the algorithms, in order to learn, demand large volumes of data of various natures and from different sources, from materials properties to sensor data. In the survey, we propose a roadmap for future developments, with emphasis on materials discovery and chemical sensing, and within the context of the Internet of Things (IoT), both prominent research fields for ML in the context of big data. In addition to providing an overview of recent advances, we elaborate upon the conceptual and practical limitations of big data and ML applied to chemistry, outlining processes, discussing pitfalls, and reviewing cases of success and failure.
Chemical Physics, Machine Learning
What problem does this paper attempt to address?
This paper addresses how big data and machine learning can be applied to chemistry, and the challenges that arise in doing so. It focuses on four aspects:

1. **Emerging trends in big data and machine learning for chemistry**: The paper explores how big data and machine learning can accelerate chemical research, especially in materials discovery and chemical sensing. These techniques process large volumes of data and extract useful information from them, which in turn supports scientific discovery.
2. **Materials discovery**: The paper discusses applications of big data and machine learning to materials discovery, including the construction of large-scale databases, the use of genetic algorithms to identify compounds, machine-learning prediction of synthesis processes, quantum-chemical calculations, and computer-aided drug design. These methods help to screen for and discover new materials with specific properties more quickly.
3. **Chemical sensing**: The paper also explores applications of big data and machine learning to chemical sensors and biosensors, in particular how these techniques can improve sensor performance in support of the Internet of Things (IoT). Such sensors can detect and analyze chemicals in the environment in real time, supplying data for further analysis.
4. **Current limitations and future prospects**: Beyond summarizing recent progress, the paper discusses the conceptual and practical limitations of these techniques, such as data quality and the interpretability of algorithms, and proposes directions for future research and development.
In summary, the paper reviews and evaluates the current state of big data and machine learning applications in chemistry, identifies the challenges that remain, and proposes a roadmap for future development.