Flight Delay Prediction using Hybrid Machine Learning Approach: A Case Study of Major Airlines in the United States

Rajesh Kumar Jha, Shashi Bhushan Jha, Vijay Pandey, Radu F. Babiceanu
2024-09-01
Abstract: The aviation industry has experienced constant growth in air traffic since the deregulation of the U.S. airline industry in 1978. As a result, flight delays have become a major concern for airlines and passengers, leading to significant research on the factors affecting departure, arrival, and total delays. Flight delays increase the consumption of limited resources such as fuel, labor, and capital, and are expected to increase in the coming decades. To address the flight delay problem, this research proposes a hybrid approach that combines the features of deep learning and classic machine learning techniques. In addition, several machine learning algorithms are applied to flight data to validate the results of the proposed model. To measure the performance of the model, accuracy, precision, recall, and F1-score are calculated, and ROC curves with their AUC values are generated. The study also includes an extensive analysis of the flight data and of each model to obtain insightful results for U.S. airlines.
Machine Learning
What problem does this paper attempt to address?
The paper addresses the problem of predicting flight delays for U.S. airlines. Specifically, the researchers focus on departure delays, arrival delays, and total delays, and propose a hybrid approach that combines deep learning and classical machine learning techniques to predict them. Using a large volume of flight data, the paper validates the proposed model and compares its performance against several other models.

### Research Background

Since the deregulation of the U.S. aviation industry in 1978, air traffic has continued to grow, making flight delays a major issue for airlines and passengers. Flight delays increase the consumption of limited resources such as fuel, labor, and capital, and they are expected to worsen in the coming decades. Studying the causes of flight delays and methods for predicting them is therefore of practical importance.

### Research Objectives

1. **Classify Flight Delay Problems**: Divide the flight delay problem into three sub-problems: departure delay, arrival delay, and total delay.
2. **Develop a Hybrid Approach**: Propose a new hybrid approach that combines deep learning and classical machine learning techniques to predict flight delays.
3. **Validate Model Performance**: Use various machine learning algorithms to validate the proposed method and evaluate model performance through metrics such as accuracy, precision, recall, and F1 score.
4. **Data Analysis**: Conduct a detailed analysis of flight data from U.S. airlines to extract key features and insights.

### Main Contributions

1. **Literature Review**: Reviewed existing literature and summarized the current state of flight delay research.
2. **Data Collection**: Collected 27 months of flight data from U.S. airlines, covering multiple factors.
3. **Data Analysis**: Conducted a detailed analysis of the flight data and generated various charts to showcase key insights.
4. **Hybrid Approach**: Developed a new method that combines deep learning and traditional machine learning techniques to predict flight delays for U.S. airlines.

### Methods

1. **Fully Connected Neural Network (FCNN)**: Used to extract high-dimensional feature representations from the data.
2. **Random Forest**: Used for classification tasks to improve the model's generalization ability.
3. **XGBoost**: A gradient-boosted tree method that optimizes a training loss together with a regularization term.
4. **Hybrid Approach**: Uses the output of the FCNN as feature input to Random Forest and XGBoost, combining the advantages of both.

### Experimental Results

1. **Departure Delay**: XGBoost performed best in terms of accuracy, F1 score, and precision, while FCNN + Random Forest achieved better recall.
2. **Arrival Delay**: XGBoost performed best on most metrics, while FCNN + Random Forest achieved slightly better recall.
3. **Total Delay**: FCNN + Random Forest performed best on all metrics, reaching an AUC of 0.97.

### Conclusion

The proposed method performed well in predicting flight delays, especially on the total delay task. By combining deep learning and classical machine learning techniques, the model can better capture complex patterns in the data, improving prediction accuracy. These results have practical implications for airlines seeking to optimize operations, reduce delays, and improve customer satisfaction.
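
The hybrid approach listed under Methods (FCNN output used as feature input to a tree ensemble) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes scikit-learn, uses a synthetic dataset from `make_classification` in place of the flight data, and assumes that "the output of FCNN" means the activations of the last hidden layer. The layer sizes, tree count, and the helper names `hidden_features` and `run_hybrid` are all hypothetical. The same learned features could instead be passed to a gradient-boosted model such as `xgboost.XGBClassifier` for the FCNN + XGBoost variant.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler


def hidden_features(mlp, X):
    """Return last-hidden-layer activations of a fitted ReLU MLPClassifier."""
    a = X
    # The final (coefs_, intercepts_) pair is the output layer, so skip it.
    for W, b in zip(mlp.coefs_[:-1], mlp.intercepts_[:-1]):
        a = np.maximum(a @ W + b, 0.0)  # ReLU, matching activation="relu"
    return a


def run_hybrid(seed=0):
    # Synthetic binary "delayed / not delayed" data (assumption: NOT the paper's data).
    X, y = make_classification(n_samples=2000, n_features=20,
                               n_informative=8, random_state=seed)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)

    # Scale inputs using statistics from the training split only.
    scaler = StandardScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

    # Stage 1: train the fully connected network (hypothetical layer sizes).
    fcnn = MLPClassifier(hidden_layer_sizes=(64, 32), activation="relu",
                         max_iter=300, random_state=seed)
    fcnn.fit(X_train, y_train)

    # Stage 2: reuse the network's learned representation as features.
    Z_train = hidden_features(fcnn, X_train)
    Z_test = hidden_features(fcnn, X_test)

    # Stage 3: fit the tree ensemble on the learned features.
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    forest.fit(Z_train, y_train)

    pred = forest.predict(Z_test)
    prob = forest.predict_proba(Z_test)[:, 1]  # probability of the positive class
    metrics = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred),
        "recall": recall_score(y_test, pred),
        "f1": f1_score(y_test, pred),
        "roc_auc": roc_auc_score(y_test, prob),
    }
    return Z_test, metrics


if __name__ == "__main__":
    _, scores = run_hybrid()
    for name, value in scores.items():
        print(f"{name}: {value:.3f}")
```

The scores printed by this sketch describe only the synthetic data and say nothing about the results reported in the paper. The five metrics computed are the ones the summary names: accuracy, precision, recall, F1 score, and AUC.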