Lectins are proteins with at least one carbohydrate recognition domain that can recognize and reversibly bind the glycan moieties of glycoconjugates or soluble carbohydrates. Lectins have been shown to play vital roles in mediating signal transduction, cell-cell recognition and interaction, immune defense, and other processes. Most organisms can synthesize and secrete lectins. A subset of lectins closely related to diverse cancers, called cancerlectins, is involved in tumor initiation, growth, and recrudescence. Cancerlectins have been investigated for their applications in laboratory studies, clinical diagnosis and therapy, and drug delivery and targeting in cancers. Identifying cancerlectin genes among the large number of known lectins is helpful for dissecting cancers. Several cancerlectin prediction tools based on machine learning approaches have been established and have become an excellent complement to experimental methods. In this review, we comprehensively summarize and explain the essential components for implementing cancerlectin prediction models. We hope that this review will contribute to the understanding of cancerlectins and provide valuable clues for their study. Novel systems for cancerlectin gene identification are expected to be developed for clinical applications and gene therapy.
Keywords: Cancerlectin, Non-cancerlectin, Feature extraction and selection, Machine learning method, PSSM, Prosite.