A Quantitative Analysis on the Use of Supervised Machine Learning in Earth Science

Hassan Muhammad, K. Virts, Ashish Acharya, G. Priftis, Kumar Ankur, Muthukumaran Ramasubramanian, Ashlyn Shirey, R. Ramachandran
DOI: https://doi.org/10.1109/IGARSS39084.2020.9323770
2020-09-26
Abstract: Recent review papers have discussed the opportunities and challenges of applying machine learning (ML) techniques to Earth science data. A common challenge cited in these papers is the lack of labeled training data. A literature review of Earth science papers from the last 10 years shows that, although ML is being adopted rapidly, particularly in biogeoscience and land surface research, the training datasets typically contain only hundreds of samples. This lack of training data limits the use of deep learning algorithms, which require larger volumes of labeled data. In situ data are the most frequently used source of training data in almost all domains, followed by model output and satellite data. The atmosphere and solid Earth domains use the largest training datasets, an order of magnitude larger than those in biogeoscience papers. Random forest is the most commonly applied ML algorithm in all domains except atmospheric science and biogeoscience, which more frequently use fully connected neural networks.
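Findings such as "the most commonly applied algorithm per domain" and "an order of magnitude larger" come from tabulating coded survey records. The sketch below shows one plausible way to compute such summaries, assuming each reviewed paper is coded as a (domain, algorithm, training-set size) record and that dataset sizes are compared via per-domain medians. The record values, domain names, and the functions `summarize` and `orders_of_magnitude` are hypothetical and invented for illustration; they are not the paper's data or its actual procedure.

```python
from collections import Counter, defaultdict
from statistics import median
import math

# Hypothetical coded survey records: (domain, algorithm, training-set size).
# Invented for illustration only; these are NOT the paper's data.
RECORDS = [
    ("domain_x", "random forest", 300),
    ("domain_x", "random forest", 500),
    ("domain_x", "fully connected NN", 200),
    ("domain_y", "fully connected NN", 4000),
    ("domain_y", "fully connected NN", 6000),
    ("domain_y", "random forest", 5000),
]


def summarize(records):
    """Return {domain: (most common algorithm, median training-set size)}."""
    algo_counts = defaultdict(Counter)  # per-domain algorithm frequencies
    sizes = defaultdict(list)           # per-domain training-set sizes
    for domain, algo, n in records:
        algo_counts[domain][algo] += 1
        sizes[domain].append(n)
    return {
        d: (algo_counts[d].most_common(1)[0][0], median(sizes[d]))
        for d in algo_counts
    }


def orders_of_magnitude(a, b):
    """Difference between two sizes in orders of magnitude: log10(a / b)."""
    return math.log10(a / b)


if __name__ == "__main__":
    summary = summarize(RECORDS)
    for domain, (algo, med) in summary.items():
        print(domain, algo, med)
    # A ratio between 1 and 2 orders of magnitude reads as "about 10x larger".
    print(round(orders_of_magnitude(summary["domain_y"][1], summary["domain_x"][1]), 2))
```

Using the median rather than the mean keeps a few very large datasets from dominating a domain's typical size; whether the original analysis used medians, means, or distributions is not stated here.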
Computer Science, Geology, Environmental Science