Abstract: With recent advances in artificial intelligence, machine learning (ML) approaches have become an attractive tool in petroleum engineering, particularly for reservoir characterization. A key reservoir property is the hydrocarbon recovery factor (RF), whose accurate estimation would provide decisive insights into drilling and production strategies. Therefore, this study aims to estimate the hydrocarbon RF for exploration from various reservoir characteristics, such as porosity, permeability, pressure, and water saturation, via ML. We applied three regression-based models, namely extreme gradient boosting (XGBoost), support vector machine (SVM), and stepwise multiple linear regression (MLR), together with various combinations of three databases, to construct ML models and estimate the oil and/or gas RF. Using two databases and the cross-validation method, we evaluated the performance of the ML models. In each iteration, 90% and 10% of the data were used to train and test the models, respectively. The third, independent database was then used to further assess the constructed models. For both oil and gas RFs, we found that the XGBoost model estimated the RF for the train and test datasets more accurately than the SVM and MLR models. However, the performance of all the models was unsatisfactory for the independent databases. Results demonstrated that the ML algorithms were highly dependent on, and sensitive to, the databases on which they were trained. Statistical tests revealed that these unsatisfactory performances arose because the distributions of input features and target variables in the train datasets were significantly different from those in the independent databases (p-value < 0.05).
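The 90%/10% train/test split per iteration described in the abstract is consistent with 10-fold cross-validation. The following is a minimal, standard-library sketch of that scheme, not the authors' code. The function names `k_fold_indices` and `cross_validate`, the fold count of 10, the RMSE metric, and the mean-predictor baseline are all assumptions made for illustration; in practice the `fit` and `predict` callables would wrap a regressor such as XGBoost, SVM, or MLR.

```python
import math
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    With k=10, each iteration holds out roughly 10% of the samples
    for testing and trains on the remaining 90%.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Deal shuffled indices into k folds of near-equal size.
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        yield train, test


def cross_validate(X, y, fit, predict, k=10, seed=0):
    """Return the per-fold RMSE of a model given as fit/predict callables."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = predict(model, [X[i] for i in test])
        mse = sum((p - y[i]) ** 2 for p, i in zip(preds, test)) / len(test)
        scores.append(math.sqrt(mse))
    return scores


# Hypothetical baseline: always predict the mean RF of the training fold.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, X_test):
    return [model] * len(X_test)


# Synthetic stand-in data (not from the paper): one feature, RF in [0, 1].
rng = random.Random(1)
X_demo = [[rng.random()] for _ in range(100)]
y_demo = [0.2 + 0.5 * row[0] for row in X_demo]
rmse_per_fold = cross_validate(X_demo, y_demo, fit_mean, predict_mean)
```

Comparing the mean of `rmse_per_fold` across models gives a like-for-like ranking on the same folds, because the fixed `seed` makes the splits reproducible.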

What problem does this paper attempt to address?

The problem that this paper attempts to solve is to accurately estimate the Recovery Factor (RF) of oil and gas reservoirs through machine-learning methods and evaluate the impact of different databases on the accuracy of the model. Specifically, the research aims to:

1. **Apply the eXtreme Gradient Boosting (XGBoost) algorithm**: Develop a machine-learning-based model to estimate the Recovery Factor of oil and gas reservoirs.
2. **Compare the performance of different machine-learning algorithms**: Compare XGBoost with Multiple Linear Regression (MLR) and Support Vector Machine (SVM) to evaluate their performance in estimating oil and gas RF.
3. **Explore the database-dependence issue**: Analyze the impact of different database combinations on the accuracy of machine-learning models and evaluate the reliability and uncertainty of the models on independent databases.

### Research Background

Traditional methods for estimating the Recovery Factor of oil and gas reservoirs, such as history matching and volume reserve estimation, have relatively large uncertainties and are time-consuming. With the development of artificial intelligence and data analysis technologies, machine-learning methods provide a new approach for estimating the Recovery Factor of oil and gas reservoirs and can more efficiently use data in the early stage for prediction.

### Main Objectives

- **Develop an XGBoost model**: For estimating the Recovery Factor of oil and gas reservoirs.
- **Performance comparison**: Train models with multiple database combinations and compare the performance of XGBoost, MLR, and SVM.
- **Evaluate database-dependence**: Use independent databases to further evaluate the accuracy and reliability of the models and reveal the impact of different databases on model performance.

### Key Findings

- The XGBoost model performs better than the SVM and MLR models on both the training set and the test set.
- The performance of the model is highly dependent on the database used for training, and the differences in feature distributions among different databases significantly affect the generalization ability of the model.
- Statistical tests show that there are significant differences in the distribution of input features and target variables between the training data and the independent database (p-value < 0.05), resulting in poor performance of the model on the independent database.

Through these studies, the author hopes to provide a more efficient and accurate method for estimating the Recovery Factor of oil and gas reservoirs and reveal the important impact of database selection on model performance.
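The summary does not name the statistical test behind the reported p-value < 0.05. As one plausible illustration, the sketch below uses a two-sample Kolmogorov-Smirnov test to check whether a feature is distributed differently in the training data and in an independent database. The choice of test, the function names `ks_statistic` and `ks_p_value`, and the synthetic porosity values are all assumptions; the p-value uses the standard asymptotic Kolmogorov series with a small-sample correction, so it is approximate.

```python
import math
import random


def ks_statistic(a, b):
    """Maximum absolute difference between the empirical CDFs of a and b."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance past all values equal to x in both samples (handles ties).
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_p_value(d, n, m):
    """Approximate two-sided p-value from the asymptotic Kolmogorov series."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        # The series converges slowly here and the p-value is effectively 1.
        return 1.0
    total = sum(
        (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
        for j in range(1, 101)
    )
    return min(1.0, max(0.0, 2.0 * total))


# Synthetic porosity samples (hypothetical, not from the paper's databases).
rng = random.Random(0)
train_porosity = [rng.gauss(0.15, 0.03) for _ in range(200)]
independent_porosity = [rng.gauss(0.25, 0.03) for _ in range(200)]

d_stat = ks_statistic(train_porosity, independent_porosity)
p_val = ks_p_value(d_stat, len(train_porosity), len(independent_porosity))
print(f"KS statistic = {d_stat:.3f}, p-value = {p_val:.3g}")
```

Running such a check for each input feature and for the target RF before deployment would flag the kind of train-versus-independent mismatch that the key findings attribute the poor generalization to.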

Estimating oil and gas recovery factors via machine learning: Database-dependent accuracy and reliability

Estimating hydrocarbon recovery factor at reservoir scale via machine learning: Database-dependent accuracy and reliability

Estimating oil recovery factor using machine learning: Applications of XGBoost classification

Machine learning for recovery factor estimation of an oil reservoir: A tool for de-risking at a hydrocarbon asset evaluation

Estimating Oil Recovery Efficiency of Carbonated Water Injection with Supervised Machine Learning Paradigms and Implications for Uncertainty Analysis

Machine learning approaches for estimating interfacial tension between oil/gas and oil/water systems: a performance analysis

Machine Learning-Based Research for Predicting Shale Gas Well Production

A systematic machine learning method for reservoir identification and production prediction

Development of Oil Fields Using Science Artificial Intelligence and Machine Learning

Analysis of Machine Learning Models for Prediction of Petrophysical Data

Prediction of Single-Well Production Rate after Hydraulic Fracturing in Unconventional Gas Reservoirs Based on Ensemble Learning Model

Prediction of Formation Permeability While Drilling: Machine Learning Applications

Support Vector Regression Based on the Particle Swarm Optimization Algorithm for Tight Oil Recovery Prediction

Machine Learning to Improve Natural Gas Reservoir Simulations

A Novel Hybrid ANN-GB-LR Model for Predicting Oil and Gas Production Rate

A Comparison of Machine Learning Approaches for Prediction of Permeability using Well Log Data in the Hydrocarbon Reservoirs

Permeability prediction of petroleum reservoirs using stochastic gradient boosting regression

Machine Learning in Oil and Gas Exploration: A Review

Analysis of Factors Influencing Recovery of Low Permeability and Strong Heterogeneous Gas Reservoirs and Establishment of Prediction Model

Advanced machine learning approaches for predicting permeability in reservoir pay zones based on core analyses

Performance evaluation of ferro-fluids flooding in enhanced oil recovery operations based on machine learning