Protein-protein interactions (PPIs) are involved in almost all cellular processes, and understanding their structural basis remains an important endeavor. Identifying interface residues may shed light on many important areas, including drug development, elucidation of molecular pathways, generation of protein mimetics, understanding of disease mechanisms, and development of docking methodologies for building structural models of protein complexes. Over the past few years, advances in high-throughput PPI identification techniques, such as yeast two-hybrid analysis and affinity purification coupled with mass spectrometry, have enabled researchers to identify sets of interacting proteins in yeast, Drosophila and other organisms. Unfortunately, these experimental methods do not provide residue-level insight into the structure of the interactions between proteins. Determining the structural basis of an interaction by X-ray crystallography or nuclear magnetic resonance (NMR) spectroscopy is time-consuming and expensive. In response to these difficulties, a number of bioinformatics algorithms of varying accuracy have been developed that use a wide variety of data sources to predict PPIs and modes of binding between proteins. Machine learning techniques such as Support Vector Machines (SVMs) and Random Forests (RFs) have recently been applied to problems such as the prediction of catalytic residues and the prediction and analysis of structure-based PPI interfaces. Previous machine learning approaches to the PPI interface prediction problem used features derived from evolutionary conservation of the amino acid sequence, phylogeny, Gene Ontology (GO) protein annotation and, in most cases, protein structure. To date, very few computational methods are available that are based solely on protein sequences.
Keywords: Docking, NMR, Protein-protein interactions, RF, SVM, X-ray crystallography.