Evaluating heavy metal pollution and health risks in river systems using Random Forest and XGBoost: Evidence from the Shkumbin River

Date
2025-12-19
Publisher
Plovdiv University Press "Paisii Hilendarski"
Abstract
Surface water contamination by heavy metals poses significant ecological and health risks due to their persistence, bioaccumulation, and toxicity. This research evaluated the concentrations of cadmium (Cd), chromium (Cr), copper (Cu), iron (Fe), lead (Pb), and zinc (Zn) in river water samples and assessed their impact on the Heavy Metal Pollution Index (HPI). Descriptive statistics revealed substantial variation among sampling sites, with HPI values ranging from 2.15 to 21.94. Although Cd and Pb were generally present in low concentrations, their localized maxima indicated potential hot spots of contamination, whereas Fe and Zn showed higher overall levels. To identify the most influential predictors of HPI, two machine learning regression models, Random Forest (RF) and Extreme Gradient Boosting (XGBoost), were implemented. The RF model explained more than 90% of the variance in HPI, with Cd, Zn, and Cr emerging as the most critical contributors. The XGBoost model achieved even higher predictive accuracy (R² = 0.998, RMSE = 0.76), confirming Cd and Cr as dominant predictors, together accounting for nearly 80% of the model’s explanatory power. These findings highlight the pivotal role of Cd and Cr in shaping HPI dynamics and demonstrate the utility of ensemble learning methods for environmental monitoring and risk assessment.
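The workflow summarized in the abstract (compute HPI from metal concentrations, fit an ensemble regressor, rank predictors by importance) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' pipeline: the abstract does not state which HPI formulation, permissible limits, hyperparameters, or data split were used. The example assumes the commonly used weighted form HPI = Σ(W_i·Q_i) / ΣW_i with W_i = 1/S_i and Q_i = 100·|M_i − I_i| / (S_i − I_i), uses hypothetical placeholder limits in `STANDARD`, and runs on synthetic concentrations rather than the Shkumbin River measurements. Only the Random Forest step is shown, using scikit-learn's `RandomForestRegressor`.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

METALS = ["Cd", "Cr", "Cu", "Fe", "Pb", "Zn"]

# Hypothetical permissible limits S_i in mg/L (placeholders for illustration,
# not the values used in the study).
STANDARD = np.array([0.003, 0.05, 2.0, 0.3, 0.01, 3.0])


def hpi(conc, standard=STANDARD, ideal=None):
    """Weighted HPI for an (n_samples, n_metals) array of concentrations.

    Assumed formulation: W_i = 1 / S_i, Q_i = 100 * |M_i - I_i| / (S_i - I_i),
    HPI = sum(W_i * Q_i) / sum(W_i). Ideal values I_i default to zero.
    """
    ideal = np.zeros_like(standard) if ideal is None else ideal
    weights = 1.0 / standard
    sub_index = 100.0 * np.abs(conc - ideal) / (standard - ideal)
    return (sub_index * weights).sum(axis=1) / weights.sum()


# Synthetic data: each metal drawn uniformly between 0 and 20% of its limit.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 0.2, size=(200, len(METALS))) * STANDARD
y = hpi(X)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a Random Forest regressor and evaluate it on the held-out samples.
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
r2 = r2_score(y_test, model.predict(X_test))

# Impurity-based importances, normalized by scikit-learn to sum to 1.
importances = model.feature_importances_
for metal, score in sorted(zip(METALS, importances), key=lambda p: -p[1]):
    print(f"{metal}: {score:.3f}")
print(f"Held-out R^2: {r2:.3f}")
```

In this synthetic setup the ranking is driven by the placeholder limits: metals with small S_i receive large weights and therefore dominate HPI, so the importances say nothing about the real river data. A gradient-boosted model could be swapped in for the regressor to mirror the XGBoost comparison described above.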
Keywords
Heavy Metal Pollution Index, Machine Learning Models, Random Forest, XGBoost Models