Computational approaches were developed to design new lead trichomonacidal compounds, or to select them rationally from structural databases. First, a data set of 111 compounds was split (designed) into training and prediction series using hierarchical and partitional cluster analyses. Next, two discriminant functions were derived using non-stochastic and stochastic atom-type linear indices. The resulting QSAR (quantitative structure-activity relationship) models based on LDA (linear discriminant analysis), built with non-stochastic and stochastic descriptors, correctly classified 95.56% (90.48%) and 91.11% (85.71%) of the compounds in the training (test) sets, respectively. The predictions in a 10% full-out cross-validation test also showed the quality (robustness, stability and predictive power) of the models. These models were orthogonalized using the Randić orthogonalization procedure. Afterwards, a virtual screening simulation was conducted to test how well the classification models developed here detect antitrichomonal chemicals of diverse chemical structures. In this experiment, 100.00% and 77.77% of the screened compounds were identified as trichomonacidal by the LDA-based QSAR models (Eq. 13 and Eq. 14, respectively). Finally, new lead trichomonacidals were discovered by predicting their antitrichomonal activity with the obtained models. Most of the tested chemicals exhibited the predicted antitrichomonal effect in the ligand-based virtual screening, giving an accuracy of 90.48% (19/21). These results support a role for TOMOCOMD-CARDD descriptors in the biosilico discovery of new compounds.
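To make the classification step concrete, here is a minimal sketch of a two-class linear discriminant (Fisher's LDA) together with the "percentage of correctly classified compounds" metric quoted for the training and test sets. It is a generic textbook LDA, not the authors' model: the two descriptor values per compound and all data points are hypothetical toy numbers, not TOMOCOMD-CARDD linear indices. The sketch also omits the variable selection, cluster-based splitting and cross-validation steps, and it assumes equal prior probabilities. The function names `fit_lda_2d`, `predict` and `percent_correct` are illustrative only.

```python
from statistics import mean

# Minimal two-class Fisher LDA on 2-D descriptor vectors (pure Python).
# ASSUMPTION: the toy descriptor values below are hypothetical; they are NOT
# TOMOCOMD-CARDD linear indices and NOT data from this study.

def fit_lda_2d(X, y):
    """Return weights w and bias b of a linear discriminant; y uses 1 = active, 0 = inactive."""
    act = [x for x, c in zip(X, y) if c == 1]
    ina = [x for x, c in zip(X, y) if c == 0]
    m1 = [mean(col) for col in zip(*act)]
    m0 = [mean(col) for col in zip(*ina)]
    # Pooled within-class scatter matrix S_w (2 x 2).
    s = [[0.0, 0.0], [0.0, 0.0]]
    for group, m in ((act, m1), (ina, m0)):
        for x in group:
            d = (x[0] - m[0], x[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = (m1[0] - m0[0], m1[1] - m0[1])
    # Fisher direction: w = S_w^{-1} (m1 - m0).
    w = (inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1])
    # Threshold at the midpoint of the class means (equal priors assumed).
    mid = ((m1[0] + m0[0]) / 2, (m1[1] + m0[1]) / 2)
    b = -(w[0] * mid[0] + w[1] * mid[1])
    return w, b

def predict(w, b, x):
    """Classify x as active (1) when the discriminant score is positive."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

def percent_correct(w, b, X, y):
    """Percentage of correctly classified compounds, as reported for training/test sets."""
    hits = sum(predict(w, b, x) == c for x, c in zip(X, y))
    return 100.0 * hits / len(y)

# Hypothetical toy data: actives cluster near (2, 2), inactives near (0, 0).
X_train = [(2.0, 2.1), (2.5, 1.8), (1.8, 2.6), (2.2, 2.4),
           (0.1, 0.3), (-0.2, 0.4), (0.4, -0.1), (0.0, 0.2)]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]
X_test = [(1.9, 2.0), (0.3, 0.1), (1.2, 1.4), (0.2, 0.5), (0.6, 0.5)]
y_test = [1, 0, 1, 0, 1]

w, b = fit_lda_2d(X_train, y_train)
train_acc = percent_correct(w, b, X_train, y_train)  # 100.0 on this toy set
test_acc = percent_correct(w, b, X_test, y_test)     # 80.0: (0.6, 0.5) is misclassified
```

The same percentage metric explains the figures in the abstract. For example, 19 correct predictions out of 21 gives 100 × 19/21 ≈ 90.48%.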
Keywords: TOMOCOMD-CARDD Software, Atom-Based Linear Index, LDA-Based QSAR Model, Trichomonacidal Activity, Virtual Screening, Lead Antitrichomonal Compound, Cytocidal Activity, Heterocycles