A Machine Learning Tutorial for Operational Meteorology, Part I: Traditional Machine Learning

Randy J. Chase, David R. Harrison, Amanda Burke, Gary M. Lackmann, Amy McGovern
DOI: https://doi.org/10.1175/WAF-D-22-0070.1
2022-06-07
Abstract: Recently, the use of machine learning in meteorology has increased greatly. While many machine learning methods are not new, university classes on machine learning are largely unavailable to meteorology students and are not required to become a meteorologist. The lack of formal instruction has contributed to the perception that machine learning methods are 'black boxes', and thus end users are hesitant to apply machine learning methods in their everyday workflow. To reduce the opaqueness of machine learning methods and lower hesitancy towards machine learning in meteorology, this paper provides a survey of some of the most common machine learning methods. A familiar meteorological example is used to contextualize the machine learning methods while also discussing machine learning topics using plain language. The following machine learning methods are demonstrated: linear regression, logistic regression, decision trees, random forest, gradient boosted decision trees, naive Bayes, and support vector machines. Beyond discussing the different methods, the paper also contains discussions on the general machine learning process as well as best practices to enable readers to apply machine learning to their own datasets. Furthermore, all code (in the form of Jupyter notebooks and Google Colaboratory notebooks) used to make the examples in the paper is provided in an effort to catalyse the use of machine learning in meteorology.
Atmospheric and Oceanic Physics, Machine Learning
What problem does this paper attempt to address?
The main problem this paper addresses is the perceived opacity of machine-learning methods in meteorology and the resulting hesitation among meteorologists to use them. Specifically, the paper has four goals:

1. **Provide an overview of machine-learning methods**: The paper introduces some of the most commonly used methods (linear regression, logistic regression, decision trees, random forests, gradient-boosted decision trees, naive Bayes, and support vector machines) to help meteorologists understand and apply them.
2. **Enhance trust**: The paper discusses the methods in plain language and through a familiar meteorological example, emphasizing that non-technical explanations help build users' trust in machine learning.
3. **Provide practical guidelines**: Beyond the individual methods, the paper covers the general machine-learning process and best practices, enabling readers to apply the methods to their own datasets. It also provides all example code (as Jupyter notebooks and Google Colaboratory notebooks) to promote the use of machine learning in meteorology.
4. **Address the "black-box" problem**: Because machine-learning models are often regarded as "black boxes", the paper aims to make the methods more transparent through detailed explanations and examples, thereby lowering end users' hesitancy to apply them in everyday workflows.

Through these goals, the paper aims to give meteorologists a comprehensive reference that lets them incorporate machine-learning techniques into their daily work with more confidence.