Abstract: Background: Although computational models are advancing air quality prediction, achieving the desired prediction accuracy remains a gap, which hinders the implementation of machine learning (ML) air quality prediction models. Several models have been employed, and some hybridized, to enhance air quality and air quality index predictions. The objective of this paper is to systematically review machine learning and deep learning techniques for spatiotemporal air quality prediction challenges.

Methods: This review used a methodological framework based on the PRISMA flow, in which initial search terms were defined to guide the literature search in online data sources (Scopus and Google Scholar). The inclusion criteria were articles published in English, document type (articles and conference papers), and source type (journals and conference proceedings). The exclusion criteria were book series and books. The authors' search strategy was complemented with ChatGPT-generated keywords to reduce the risk of bias. Report synthesis was achieved by grouping keywords in Microsoft Excel and sorting them in ascending order so that similar and dissimilar keywords could be identified easily. Three independent researchers took part in data collection and synthesis to avoid bias. Articles were retrieved on 27 July 2024.

Results: Of 374 articles, 80 were selected because they fell within the scope of the study. The review identified the combination of machine learning and deep learning techniques as a way to address data limitations and to process the nonlinear characteristics of air pollutants. ML models such as random forest and decision tree classifiers were among the commonly used models for air quality index and air quality predictions, with promising performance results. Deep learning models are promising because of their hyper-parameter components, which include activation functions suited to nonlinear spatiotemporal data. The emergence of low-cost devices to address data limitations is highlighted, along with the use of transfer learning and federated learning models. The review also highlights that military activities and fires affect the O3 concentration, and the best-performing models identified here could help in developing predictive models for air quality in areas with heavy military activity.

Limitation: This review acknowledges methodological challenges in its data collection sources, since equally relevant materials exist in other online data sources. In addition, the choice of keywords for the initial search and the creation of subsequent filter keywords limited the collection of other relevant research articles.
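
The keyword synthesis step described in the Methods can be illustrated in code. The following is a minimal Python sketch, not the authors' workflow (they report using Microsoft Excel): it normalizes a list of keywords, sorts them in ascending order, and groups identical entries so that similar keywords sit together. The sample keywords and the function name `group_keywords` are assumptions made for illustration.

```python
from collections import Counter


def group_keywords(keywords):
    """Normalize, count, and sort keywords in ascending (alphabetical) order."""
    # Lowercase and collapse whitespace so variants of one keyword group together.
    normalized = [" ".join(k.lower().split()) for k in keywords if k.strip()]
    counts = Counter(normalized)
    # Return (keyword, count) pairs sorted ascending by keyword.
    return sorted(counts.items())


# Hypothetical sample keywords, not taken from the reviewed articles.
sample = ["Random Forest", "air quality index", "random  forest", "Deep learning", "LSTM"]
grouped = group_keywords(sample)
for keyword, count in grouped:
    print(f"{keyword}: {count}")
```

Counting after normalization makes repeated keywords visible at a glance, which mirrors the stated goal of identifying similar and dissimilar keywords.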

A review of machine learning for modeling air quality: Overlooked but important issues

Machine Learning for Urban Air Quality Analytics: A Survey

Systematic Review of Machine Learning and Deep Learning Techniques for Spatiotemporal Air Quality Prediction

The Development and Application of Machine Learning in Atmospheric Environment Studies

Data-Driven Machine Learning in Environmental Pollution: Gains and Problems

Air quality and urban sustainable development: the application of machine learning tools

Spatial prediction of soil contamination based on machine learning: a review

Improving machine-learned surface NO2 concentration mapping models with domain knowledge from data science perspective

Explainable Machine Learning Reveals Capabilities, Redundancy, and Limitations of a Geospatial Air Quality Benchmark Dataset

Review of Recent Advances in Remote Sensing and Machine Learning Methods for Lake Water Quality Management

Intelligent modeling strategies for forecasting air quality time series: A review

Supervised Machine Learning Approaches for Predicting Key Pollutants and for the Sustainable Enhancement of Urban Air Quality: A Systematic Review

A review of statistical methods used for developing large-scale and long-term PM2.5 models from satellite data

Machine Learning in Environmental Research: Common Pitfalls and Best Practices

Data imbalance causes underestimation of high ozone pollution in machine learning models: a weighted support vector regression solution

Extracting Regional and Temporal Features to Improve Machine Learning for Hourly Air Pollutants in Urban India

Machine learning and remote sensing integration for leveraging urban sustainability: A review and framework

Long-term Evaluation of Machine Learning Based Methods for Air Emission Monitoring

Air Quality Forecasting Using Machine Learning: A Global Perspective with Relevance to Low-Resource Settings

Ozone response modeling to NOx and VOC emissions: Examining machine learning models