Authors: Paskaleva, Vesselina; Cokova, Gergana
Dates: 2025-12-16; 2025-12-16; 2025-12-04
ISSN: 1313-9940
URI: https://doi.uni-plovdiv.bg/handle/store/835
Abstract: This research presents a thorough exploratory data analysis to develop an in silico model for mutagenicity prediction, contributing to Safe-by-Design strategies. Using a publicly available dataset, chemical structures were encoded via a range of molecular fingerprints and descriptors. Multiple machine learning algorithms, including k-nearest neighbors, support vector machines, and random forest, were assessed. Performance was validated through 10-fold cross-validation and further tested on an external dataset. Random Forest emerged as the most effective method, achieving a cross-validation MCC of 0.68. The in-house models showed competitive performance relative to existing publicly available tools.
Language: en
Keywords: Ames mutagenicity; machine learning; QSAR; in silico
Title: In silico methods for mutagenicity prediction
Type: Article
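
The abstract reports model quality as a Matthews correlation coefficient (MCC). For readers unfamiliar with the metric, the following is a minimal sketch of how MCC is computed for a binary mutagenic / non-mutagenic classification. The function name `mcc` and the toy labels are illustrative assumptions and are not taken from the article; the authors' actual evaluation code is not described in this record.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels.

    Assumed encoding (hypothetical): 1 = mutagenic, 0 = non-mutagenic.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy example (made-up labels, not data from the study):
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(mcc(y_true, y_pred))  # prints 0.5
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), and it uses all four confusion-matrix cells, which makes it less sensitive to class imbalance than plain accuracy.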