Accurately identifying functional sites in proteins is one of the most important topics in bioinformatics and systems biology. In bioinformatics, identifying protease cleavage sites in protein sequences can aid drug and inhibitor design. In systems biology, post-translational protein-protein interaction activity is one of the major components in analyzing signaling pathway activity. Determining functional sites through laboratory experiments is normally time-consuming and expensive, so computer programs have been widely used for this task. Mining protein sequence data computationally involves two major issues: 1) discovering how amino acid specificity affects functional sites and 2) discovering what that amino acid specificity is. Both require a proper coding mechanism before a suitable machine learning algorithm can be applied. The development of the bio-basis function neural network (BBFNN) has opened a new avenue for protein sequence data mining. The bio-basis function used in BBFNN is biologically sound: it encodes the biological information in protein sequences well, i.e. it provides a good measure of the similarity between protein sequences. BBFNN has therefore outperformed conventional neural networks in many areas of protein sequence data mining, from protease cleavage site prediction to disordered protein identification. This review focuses on the variants of BBFNN and their applications in mining protein sequence data.
Keywords: bio-basis function, neural networks, protein data mining, bioinformatics, systems biology