Conotoxins are small, disulfide-rich peptides that target ion channels and neuronal receptors, and they have been demonstrated to be potent pharmaceuticals in the treatment of Alzheimer's disease, Parkinson's disease, and epilepsy. Accurate prediction of conotoxin superfamilies would have many important applications towards understanding their biological and pharmacological functions. In this study, a novel method, named dHKNN, is developed to predict conotoxin superfamilies. First, we extract sequence features of each protein, comprising physicochemical properties, evolutionary information, predicted secondary structures and amino acid composition. Second, we use diffusion maps for dimensionality reduction, which interpret the eigenfunctions of Markov matrices as a system of coordinates on the original data set in order to obtain an efficient representation of the geometric descriptions of the data. Finally, an improved K-local hyperplane distance nearest neighbor subspace classifier, called dHKNN, is proposed for predicting conotoxin superfamilies by considering the local density information in the diffusion space. An overall accuracy of 91.90% is obtained in the jackknife cross-validation test on a benchmark dataset, indicating that the proposed dHKNN is promising.
Keywords: Conotoxin superfamily, diffusion map, subspace classifier, dHKNN, non-addictive, disulfide-rich, wet-lab experiments, TNoM score, PseAAC, AAC
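
To make the dimensionality-reduction step concrete, the following is a minimal sketch of a generic diffusion map in Python with NumPy. It is not the implementation used in this study: the Gaussian kernel, the median-based bandwidth heuristic, and the names `diffusion_map`, `epsilon`, `n_components` and `t` are all assumptions made for illustration. The sketch builds a Markov matrix from pairwise affinities and uses its leading non-trivial eigenvectors as coordinates.

```python
import numpy as np

def diffusion_map(X, n_components=2, epsilon=None, t=1):
    """Generic diffusion-map embedding (illustrative sketch, not the paper's code).

    X            : (n_samples, n_features) feature matrix
    n_components : number of diffusion coordinates to keep
    epsilon      : Gaussian kernel bandwidth; median heuristic if None (assumption)
    t            : diffusion time (power applied to the eigenvalues)
    """
    X = np.asarray(X, dtype=float)

    # Pairwise squared Euclidean distances (exact zeros on the diagonal).
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)

    if epsilon is None:
        # Heuristic bandwidth: median of the non-zero squared distances.
        epsilon = np.median(d2[d2 > 0])

    # Gaussian affinity matrix and its row sums (degrees).
    K = np.exp(-d2 / epsilon)
    d = K.sum(axis=1)

    # The Markov matrix is P = D^{-1} K. Its symmetric conjugate
    # S = D^{-1/2} K D^{-1/2} has the same eigenvalues and is easier to solve.
    S = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)

    # Sort eigenpairs by decreasing eigenvalue.
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]

    # Right eigenvectors of P are recovered as D^{-1/2} v.
    psi = vecs / np.sqrt(d)[:, None]

    # Drop the trivial first eigenvector (eigenvalue 1) and scale by eigenvalue^t.
    coords = psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t
    return coords, vals
```

In such a setup, each row of `coords` would be the low-dimensional representation of one sequence's feature vector, and a neighbor-based classifier would then operate on those rows.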