As part of our continuing efforts to identify acceptable Absorption, Distribution, Metabolism, Elimination and Toxicity (ADMET) properties of organic compounds, we establish linear QSAR models to predict the carcinogenic potential of 1464 compounds taken from the “Galvez data set”, which includes many marketed drugs. More than a thousand geometry-independent molecular descriptors, calculated with the software packages E-Dragon and Recon, are analyzed simultaneously. Variable subset selection is carried out with the Replacement Method and with its improved version, the Enhanced Replacement Method. The established models are validated with an external test set of compounds and by means of Leave-Group-Out Cross Validation. In addition, we apply the Y-Randomization strategy and analyze the Applicability Domain of the developed model. Finally, we compare the results of the present study with those previously reported in the literature. The novelty of the present work lies in the development of an alternative predictive structure-carcinogenicity relationship for a large, heterogeneous set of organic compounds that uses only a reduced number of geometry-independent molecular descriptors.
Keywords: QSAR theory, ADMET, multivariable linear regression analysis, carcinogenicity, molecular descriptors, carcinogenic potential, heterogeneous, molecular descriptors selection, molecular descriptors calculation, atom-centred fragments
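As an illustration of the Y-Randomization strategy named in the abstract, the following minimal sketch refits a linear model after shuffling the response values and compares the resulting fit quality with that of the unshuffled model. It is not the authors' implementation: the function names (`fit_r2`, `y_randomization`), the use of ordinary least squares, the number of shuffling rounds and the synthetic data are all assumptions made for illustration, and no values from the Galvez data set are used.

```python
import numpy as np


def fit_r2(X, y):
    """Fit ordinary least squares with an intercept and return R^2."""
    A = np.column_stack([np.ones(len(y)), X])      # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares solution
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def y_randomization(X, y, n_rounds=100, seed=0):
    """Return R^2 of the true model and R^2 values for shuffled responses.

    Hypothetical helper: each round permutes y, which breaks any real
    descriptor-response relationship, and refits the same linear model.
    """
    rng = np.random.default_rng(seed)
    r2_true = fit_r2(X, y)
    r2_perm = np.array(
        [fit_r2(X, rng.permutation(y)) for _ in range(n_rounds)]
    )
    return r2_true, r2_perm


# Synthetic stand-in for a descriptor matrix and response (assumed data).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.0, -0.5, 0.0, 2.0, 0.3]) + rng.normal(scale=0.5, size=200)

r2_true, r2_perm = y_randomization(X, y)
print(f"R^2 (true responses):     {r2_true:.3f}")
print(f"R^2 (shuffled, max of {len(r2_perm)}): {r2_perm.max():.3f}")
```

If the model captures a genuine relationship, the R^2 obtained with the true responses should clearly exceed every value obtained with shuffled responses; comparable values would suggest a chance correlation.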