Applicability of machine learning models for predicting soil organic carbon content and bulk density under different soil conditions
Department of Landscape Protection and Environmental Geography, University of Debrecen, Hungary
Centre for Agricultural Research, Department of Soil Mapping and Environmental Informatics, Institute for Soil Sciences, Hungary
Department of Physical Geography and Geoinformatics, University of Debrecen, Hungary
Fatemeh Hateffard   

Department of Landscape Protection and Environmental Geography, University of Debrecen, Egyetem tér 1, 4032, Debrecen, Hungary
Submission date: 18-10-2022
Final revision date: 21-02-2023
Acceptance date: 04-05-2023
Online publication date: 04-05-2023
Publication date: 20-06-2023
Soil Sci. Ann., 2023, 74(1), 165879
A reliable overview of the spatial distribution of soil properties is a valuable basis for soil policy and decision-making. Soil organic carbon (SOC) content, SOC stock, and bulk density (BD) directly affect soil quality and fertility, so an accurate assessment of these parameters is required. To this end, we used machine learning algorithms (MLAs), including multiple linear regression (MLR), random forest (RF), artificial neural network (ANN), and support vector machine (SVM), together with environmental covariates, to predict SOC content, BD, and SOC stock. The study was conducted in two areas in eastern Hungary, Látókép and Westsik, both experimental research fields that differ in their physical geography. Thirty topsoil (0-10 cm) samples were collected in each study area using the conditioned Latin hypercube sampling strategy. Environmental covariates representing the soil-forming factors were extracted from a digital elevation model (DEM) and satellite images. We validated the results by randomly splitting the dataset into training (two-thirds) and test (one-third) sets and calculating the root mean square error (RMSE) and the coefficient of determination (R²). RF provided the most accurate spatial predictions, with an R² of about 80% for each soil property in both study areas. The study highlights the importance of terrain attributes (including plan and profile curvature, elevation, and valley depth) and of NDVI derived from satellite images for representing the spatial distribution of the selected soil properties in the two areas. We conclude that comparing these methods can help to identify the most accurate maps under diverse geographical conditions and heterogeneities at different scales, and that these maps can be used in precision soil quality management.
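The validation step described in the abstract (a random two-thirds/one-third split, then root mean square error and R2 on the held-out samples) can be illustrated with a minimal sketch. It is not the authors' code: the function names `split_train_test`, `rmse`, and `r_squared` are hypothetical, the numbers in the usage lines are placeholders rather than measured values, and R2 is assumed here to be the coefficient of determination (1 minus the ratio of residual to total sum of squares), which the abstract does not specify.

```python
import math
import random


def split_train_test(samples, test_fraction=1 / 3, seed=0):
    """Randomly split samples into (train, test) lists.

    Assumption: a simple shuffle-based split; the seed is fixed only
    so that the illustration is reproducible.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Thirty sample IDs per study area, as in the abstract: 20 train, 10 test.
train_ids, test_ids = split_train_test(range(30))

# Placeholder values only (not data from the study).
observed = [1.2, 1.8, 2.4, 3.1]
predicted = [1.0, 2.0, 2.5, 2.9]
print(len(train_ids), len(test_ids))
print(rmse(observed, predicted), r_squared(observed, predicted))
```

With only ten test samples per area, both metrics depend strongly on which samples fall into the test set, so repeating the split with different seeds is a reasonable way to see how stable a reported score is.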