This work reports a detailed study of the ability of linear and nonlinear classification methods to estimate the estrogenic activities of a series of 55 natural estrogen-like isoflavonoid and diphenolic compounds. To this end, we examined linear discriminant analysis (LDA) and nonlinear support vector machines (SVMs), each combined with feature selection algorithms. The structural characteristics of each compound were calculated from its optimized molecular geometry. Both the LDA and SVM models contain four descriptors; however, the SVM model (total accuracy 89.1%) was found to be superior to the LDA model (total accuracy 80.0%). Analysis of the molecular descriptors in our models provided essential insights into the estrogenic mechanisms of natural phytoestrogens. Furthermore, the derived models can be applied to the future screening of other natural estrogen-like compounds.
Keywords: Classification, QSAR, linear discriminant analysis, isoflavonoids and diphenolics, support vector machines.