Protein Inter-Residue Contacts Prediction: Methods, Performances and Applications

Page: [178 - 189] Pages: 12

Abstract

Background: Protein inter-residue contacts prediction plays an important role in protein structure and function research. As a low-dimensional representation of protein tertiary structure, inter-residue contacts can greatly help de novo protein structure prediction methods reduce the conformational search space. Over the past two decades, various methods have been developed for protein inter-residue contacts prediction.

Objective: We provide a comprehensive and systematic review of protein inter-residue contacts prediction methods.

Results: Protein inter-residue contacts prediction methods are roughly classified into five categories: correlated mutations methods, machine-learning methods, fusion methods, template-based methods and 3D model-based methods. In this paper, we first describe the common definition of protein inter-residue contacts and show their typical applications. We then present a comprehensive review of the three main categories of protein inter-residue contacts prediction methods: correlated mutations methods, machine-learning methods and fusion methods. We also analyze the constraints of each category. Finally, we compare several representative methods on the CASP11 dataset and discuss their performance in detail.

Conclusion: Correlated mutations methods achieve better performance for long-range contacts, while machine-learning methods perform well for short-range contacts. Fusion methods can take advantage of both machine-learning and correlated mutations methods. Employing more effective fusion strategies could further improve the performance of fusion methods.

Keywords: Protein inter-residue contacts prediction, protein structure prediction, correlated mutations, residue coevolution, machine-learning, fusion method.
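
The contact definition and the short-range/long-range distinction mentioned in the abstract can be illustrated with a minimal sketch. It assumes the widely used CASP-style convention, which the abstract itself does not spell out: two residues are in contact when their representative atoms (typically Cβ, or Cα for glycine) lie within 8 Å, and sequence separations of 6-11, 12-23 and 24 or more are called short-, medium- and long-range. The function names and the toy hairpin coordinates are hypothetical and serve only as an illustration.

```python
import math

CONTACT_CUTOFF = 8.0  # angstroms; assumed CASP-style distance threshold


def contact_map(coords, cutoff=CONTACT_CUTOFF):
    """Return a symmetric boolean matrix, True where two distinct residues
    have representative atoms closer than `cutoff`."""
    n = len(coords)
    return [[i != j and math.dist(coords[i], coords[j]) < cutoff
             for j in range(n)] for i in range(n)]


def contact_range(i, j):
    """Classify a residue pair by sequence separation (assumed CASP bins)."""
    sep = abs(i - j)
    if sep >= 24:
        return "long"
    if sep >= 12:
        return "medium"
    if sep >= 6:
        return "short"
    return "local"  # separations below 6 are usually not evaluated


def contacts_by_range(cmap):
    """Count contacts in the upper triangle of the map, per range class."""
    counts = {"short": 0, "medium": 0, "long": 0}
    n = len(cmap)
    for i in range(n):
        for j in range(i + 1, n):
            if cmap[i][j]:
                label = contact_range(i, j)
                if label in counts:
                    counts[label] += 1
    return counts


# Toy 30-residue "hairpin": two antiparallel strands 5 angstroms apart,
# with 3.8 angstroms between consecutive residues (hypothetical coordinates).
hairpin = [(3.8 * i, 0.0, 0.0) for i in range(15)] + \
          [(3.8 * (29 - j), 5.0, 0.0) for j in range(15, 30)]

counts = contacts_by_range(contact_map(hairpin))
print(counts)
```

A contact map built this way is the binary, low-dimensional representation that prediction methods try to recover from sequence alone; evaluating predictions separately per range class is what makes statements such as "better for long-range contacts" measurable.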
