With the explosion of protein sequences generated in the postgenomic era, the gap between the number of attribute- known proteins and that of uncharacterized ones has become increasingly large. Knowing the key attributes of proteins is a shortcut for prioritizing drug targets and developing novel new drugs. Unfortunately, it is both time-consuming and costly to acquire these kinds of information by purely conducting biological experiments. Therefore, it is highly desired to develop various computational tools for fast and effectively classifying proteins according to their sequence information alone. The process of developing these high throughput tools is generally involved with the following procedures: (1) constructing benchmark datasets; (2) representing a protein sequence with a discrete numerical model; (3) developing or introducing a powerful algorithm or machine learning operator to conduct the prediction; (4) estimating the anticipated accuracy with a proper and objective test method; and (5) establishing a user-friendly web-server accessible to the public. This minireview is focused on the recent progresses in identifying the types of G-protein coupled receptors (GPCRs), subcellular localization of proteins, DNA-binding proteins and their binding sites. All these identification tools may provide very useful informations for in-depth study of drug metabolism.
Keywords: GPCR type, protein subcellular localization, DNA-binding protein, discrete model, PseAAC, protein attribute predictor, web-servers.