Data-Driven Classification and Logging Prediction of Mudrock Lithofacies Using Machine Learning: Shale Oil Reservoirs in the Eocene Shahejie Formation, Bonan Sag, Bohai Bay Basin, Eastern China

Qiuhong Chang, Zhuang Ruan, Bingsong Yu, Chenyang Bai, Yanli Fu, Gaofeng Hou
DOI: https://doi.org/10.3390/min14040370
IF: 2.5
2024-04-01
Minerals
Abstract: As the world's energy demand continues to expand, shale oil has a substantial influence on the global energy reserves. The third submember of the Mbr 3 of the Shahejie Fm, characterized by complicated mudrock lithofacies, is one of the significant shale oil enrichment intervals of the Bohai Bay Basin. The classification and identification of lithofacies are key to shale oil exploration and development. However, the efficiency and reliability of lithofacies identification results can be compromised by qualitative classification resulting from an incomplete workflow. To address this issue, a comprehensive technical workflow for mudrock lithofacies classification and logging prediction was designed based on machine learning. Principal component analysis (PCA) and hierarchical cluster analysis (HCA) were conducted to classify lithofacies automatically, grouping samples according to the internal relationships of the data without the disturbance of human factors and providing an accurate lithofacies result in a much shorter time. The PCA and HCA results showed that the third submember can be split into five lithofacies: massive argillaceous limestone lithofacies (MAL), laminated calcareous claystone lithofacies (LCC), intermittent lamellar argillaceous limestone lithofacies (ILAL), continuous lamellar argillaceous limestone lithofacies (CLAL), and laminated mixed shale lithofacies (LMS). Then, a random forest (RF) model was built to identify each of the lithofacies and was optimized by grid search (GS) and K-fold cross validation (KCV) so that it could be used to predict the lithofacies of the non-cored section; the three validation methods showed that the accuracies of the GS–KCV–RF model were all above 93%. The performance of the models could be enhanced further by resampling, incorporating domain knowledge, and using attention mechanisms.
Our method addresses the subjective and time-consuming manual interpretation involved in lithofacies classification and the insufficient generalization ability of machine-learning methods in previous lithofacies prediction research, and it also greatly improves the accuracy of mudrock lithofacies prediction. The lithofacies machine-learning workflow introduced in this study has the potential to be applied in the Bohai Bay Basin and comparable reservoirs to enhance exploration efficiency and reduce economic costs.
geochemistry & geophysics, mineralogy, mining & mineral processing
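The workflow described in the abstract (PCA, then HCA, then a random forest tuned by grid search with K-fold cross validation) can be sketched roughly as follows. This is a minimal illustration on synthetic data using scikit-learn, not the authors' code: the data, the number of retained components, the Ward linkage, the parameter grid, and the simulated "log" responses are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in for core measurements (assumption): 5 hidden groups,
# 10 variables per sample (e.g. clay, carbonate, felsic, ..., TOC).
n_groups, n_per_group, n_core_vars = 5, 60, 10
centers = rng.normal(scale=4.0, size=(n_groups, n_core_vars))
core = np.vstack([c + rng.normal(size=(n_per_group, n_core_vars)) for c in centers])

# Step 1: standardize the variables, then reduce them with PCA.
# Keeping 4 components is an arbitrary choice for this sketch.
scores = PCA(n_components=4).fit_transform(StandardScaler().fit_transform(core))

# Step 2: hierarchical clustering on the component scores, cut into five
# clusters. Ward linkage is an assumption, not taken from the paper.
facies = AgglomerativeClustering(n_clusters=5, linkage="ward").fit_predict(scores)

# Synthetic stand-in for well-log responses at the same depths (assumption):
# a noisy linear mix of the core variables.
mixing = rng.normal(size=(n_core_vars, 6))
logs = core @ mixing + rng.normal(scale=0.5, size=(core.shape[0], 6))

# Step 3: random forest tuned by grid search with stratified K-fold CV,
# using the cluster labels from step 2 as the training targets.
X_train, X_test, y_train, y_test = train_test_split(
    logs, facies, test_size=0.25, stratify=facies, random_state=0
)
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 5]},
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Accuracy on held-out samples, standing in for a non-cored interval.
test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

The held-out split plays the role of the non-cored section: labels come from the clustering of core data, while the classifier only sees log-like features. On real data the cluster labels would need geological validation before being used as training targets.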
What problem does this paper attempt to address?
The paper aims to improve the efficiency and accuracy of mudrock lithofacies classification and logging prediction in the shale oil reservoirs of the third submember of Member 3 of the Shahejie Formation, Bonan Sag, Bohai Bay Basin, eastern China. Specifically, it automates lithofacies classification and logging prediction with machine-learning methods: principal component analysis (PCA), hierarchical cluster analysis (HCA), and an optimized random forest (RF) algorithm. Solving these problems supports shale oil exploration and development by reducing the subjectivity and time cost of manual interpretation while improving the generalization ability and prediction accuracy of the model.

### Main research contents:

1. **Background introduction**:
   - Shale oil is increasingly important to global energy reserves.
   - The mudrock lithofacies of the third submember are complex, and the interval is an important shale oil enrichment interval in the Bohai Bay Basin.
   - Lithofacies classification and identification are crucial for shale oil exploration and development, but traditional manual classification is inefficient and unreliable.
2. **Research methods**:
   - **Data collection**: Geological parameters are obtained through core observation, thin-section preparation, and X-ray diffraction (XRD) analysis.
   - **Principal component analysis (PCA)**: Used for dimension reduction, extracting the principal components that reflect the characteristics of the mudrock lithofacies.
   - **Hierarchical cluster analysis (HCA)**: Classifies samples automatically from the principal components extracted by PCA, generates a hierarchical tree diagram, and determines the lithofacies types.
   - **Random forest (RF) model**: Establishes a lithofacies identification model and optimizes its parameters through grid search (GS) and K-fold cross-validation (KCV) to improve prediction accuracy.
3. **Results**:
   - **Lithofacies classification**: Through PCA and HCA, the third submember is divided into five lithofacies: massive argillaceous limestone (MAL), laminated calcareous claystone (LCC), intermittent lamellar argillaceous limestone (ILAL), continuous lamellar argillaceous limestone (CLAL), and laminated mixed shale (LMS).
   - **Model performance**: The optimized RF model predicts the lithofacies of non-cored sections with accuracy above 93%.

### Significance of the paper:

- **Improve efficiency**: The automated method significantly reduces the time and cost of manual classification.
- **Improve accuracy**: Machine-learning methods improve the accuracy of lithofacies classification and logging prediction.
- **Application prospects**: The method can be applied to the Bohai Bay Basin and similar reservoirs to improve exploration efficiency and reduce economic costs.

### Formula display:

- **Principal component analysis (PCA)**:
  - Calculation of eigenvalue and variance contribution rate:

\[ \text{PC1} = 0.194\times\text{clay} - 0.208\times\text{carb} + 0.195\times\text{felsic} + 0.113\times\text{chlorite} + 0.089\times\text{porosity} + 0.097\times\text{permeability} + 0.007\times\text{So} - 0.196\times\text{density} - 0.001\times\text{structure} + 0.183\times\text{TOC} \]

\[ \text{PC2} = - 0.027\times\text{clay} + 0.035\times\text{carb} + 0.023\times\text{felsic} + 0.335\times\text{chlorite} - 0.390\times\text{porosity} + 0.066\times\text{permeability} + 0.412\times\text{So} + 0.069\times\text{de}