Data-driven interpretable machine learning for prediction of porosity and permeability of tight sandstone reservoir
Abstract
Porosity and permeability are crucial indicators for identifying high-quality reservoirs and favorable “sweet spot” zones, and they are key parameters for predicting and evaluating the development potential of fossil fuels such as oil and gas. However, collecting enough core samples, both vertically and laterally, is impractical because of the time and cost involved. Machine learning algorithms have shown remarkable capability in predicting petrophysical properties by capturing non-linear relationships among logging data. In this study, to quantify the selection of logging curves and reduce redundant logging input, a novel and interpretable Permutation Importance-Set algorithm is proposed on the basis of logging data from the Upper Triassic Xujiahe Formation in the Sichuan Basin. The results indicate that, because of compaction, burial depth is the primary feature affecting the physical properties of tight sandstone reservoirs. Acoustic and spontaneous potential logs are critical for porosity, while density and spontaneous potential logs are pivotal for permeability, reflecting the complex diagenesis caused by widespread sand-mud interbedding. Basin-level prediction models for porosity and permeability were developed using ten machine learning algorithms, and ablation studies confirmed that the feature selection is effective and that it reduces model complexity and over-fitting. This study offers a concise, interpretable, and accurate prediction model for tight sandstone reservoirs.
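The abstract does not spell out how the Permutation Importance-Set algorithm works, so the sketch below illustrates only plain permutation importance, the general idea the name builds on: shuffle one input column at a time and measure how much the prediction error grows. Everything in it is an assumption made for illustration, including the synthetic data, the feature names `depth`, `acoustic` and `noise`, the stand-in `predict` function, and the choice of mean squared error. It is not the authors' method or data.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving the other columns intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mse(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Synthetic, standardized stand-in data (hypothetical, not the paper's logs).
data_rng = random.Random(42)
feature_names = ["depth", "acoustic", "noise"]
X = [[data_rng.gauss(0, 1) for _ in feature_names] for _ in range(200)]


def predict(row):
    """Stand-in for a fitted model; it ignores the third feature entirely."""
    return -2.0 * row[0] + 0.5 * row[1]


y = [predict(row) + data_rng.gauss(0, 0.1) for row in X]

scores = permutation_importance(predict, X, y)
ranking = sorted(zip(feature_names, scores), key=lambda kv: kv[1], reverse=True)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

A feature the model never uses, such as `noise` here, gets an importance of zero, which is the property that makes such scores usable for pruning redundant inputs.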
Document Type: Original article
Cited as: Cao, L., Jiang, F., Chen, Z., Gao, Y., Huo, L., Chen, D. Data-driven interpretable machine learning for prediction of porosity and permeability of tight sandstone reservoir. Advances in Geo-Energy Research, 2025, 16(1): 21-35. https://doi.org/10.46690/ager.2025.04.04
Keywords:
Data-driven modeling, interpretable machine learning, permutation importance-set, reservoir characterization
License
Copyright (c) 2025 Author(s)

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.