A Novel Method for Predicting Essential Proteins by Integrating Multidimensional Biological Attribute Information and Topological Properties

Page: [369 - 379] Pages: 11

Abstract

Background: Essential proteins are indispensable to the maintenance of life activities and play key roles in synthetic biology. Identifying essential proteins by computational methods has become a hot topic in recent years because of its efficiency.

Objective: Identification of essential proteins is of great significance and practical value for synthetic biology, drug target discovery, and the study of human disease genes.

Methods: In this paper, a method called EOP (Edge clustering coefficient-Orthologous-Protein) is proposed to infer potential essential proteins by combining multidimensional biological attribute information of proteins with topological properties of the protein-protein interaction network.
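
As a rough illustration of the topological half of such a score, the sketch below computes the edge clustering coefficient in the form commonly used in this literature, ECC(u, v) = z / min(d_u - 1, d_v - 1), where z is the number of triangles containing edge (u, v), and sums it over each node's edges. The blend with an orthology score through a weight `alpha`, the function names, and the toy data are assumptions made for illustration only; the abstract does not state EOP's actual formula.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Return an undirected adjacency map {node: set of neighbors}."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-interactions
            adj[u].add(v)
            adj[v].add(u)
    return dict(adj)


def edge_clustering_coefficient(adj, u, v):
    """ECC(u, v) = triangles through (u, v) / min(deg(u) - 1, deg(v) - 1)."""
    max_triangles = min(len(adj[u]) - 1, len(adj[v]) - 1)
    if max_triangles <= 0:
        return 0.0
    return len(adj[u] & adj[v]) / max_triangles


def topology_scores(adj):
    """Sum ECC over each node's incident edges."""
    return {
        u: sum(edge_clustering_coefficient(adj, u, v) for v in nbrs)
        for u, nbrs in adj.items()
    }


def combined_scores(adj, orthology, alpha=0.5):
    """Hypothetical blend (an assumption, not the published EOP formula):
    alpha * max-normalized topology score + (1 - alpha) * orthology score."""
    topo = topology_scores(adj)
    top = max(topo.values(), default=0.0) or 1.0  # avoid division by zero
    return {
        u: alpha * topo[u] / top + (1 - alpha) * orthology.get(u, 0.0)
        for u in adj
    }


# Toy network: a triangle A-B-C with a pendant node D attached to C.
toy_adj = build_adjacency([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
# Made-up orthology scores in [0, 1]; proteins not listed default to 0.
toy_scores = combined_scores(toy_adj, {"A": 1.0, "C": 0.5}, alpha=0.5)
```

In the toy network the pendant edge C-D lies in no triangle, so it contributes nothing to either endpoint, while the orthology term separates the three triangle nodes that the topology term alone scores equally.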

Results: The simulation results on the yeast protein interaction network show that this method identifies more essential proteins than 12 other methods (DC, IC, EC, SC, BC, CC, NC, LAC, PEC, CoEWC, POEM, DWE). In particular, compared with DC (Degree Centrality), the sensitivity (SN) is 9% higher. The recognition rate is 34% higher when 1% of the proteins are taken as candidates, and 36%, 22%, 15%, 11%, and 8% higher when 5%, 10%, 15%, 20%, and 25% of the proteins are taken as candidates, respectively.
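
A comparison of this kind can be sketched as follows, assuming that proteins are ranked by score, that the top fraction is treated as the candidate (predicted essential) set, and that sensitivity has its standard definition SN = TP / (TP + FN). The function names and toy values are hypothetical, and the exact evaluation protocol of the paper may differ.

```python
import math


def top_fraction(scores, fraction):
    """Return the ceil(fraction * n) highest-scoring proteins."""
    k = math.ceil(len(scores) * fraction)
    # Sort by descending score; ties are broken by protein name.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return ranked[:k]


def hits_in_top(scores, essential, fraction):
    """Count known essential proteins among the top-ranked candidates."""
    return sum(1 for p in top_fraction(scores, fraction) if p in essential)


def sensitivity(scores, essential, fraction):
    """SN = TP / (TP + FN), with the top fraction as predicted essential."""
    known = set(essential) & set(scores)
    if not known:
        return 0.0
    return hits_in_top(scores, known, fraction) / len(known)


# Toy example with made-up scores and labels.
toy_scores = {"A": 0.9, "B": 0.8, "C": 0.3, "D": 0.1}
toy_essential = {"A", "C"}
toy_hits = hits_in_top(toy_scores, toy_essential, 0.5)
toy_sn = sensitivity(toy_scores, toy_essential, 0.5)
```

Running the same functions on the scores of two methods at each cut-off gives the per-cut-off differences that a comparison like the one above reports.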

Conclusion: Experimental results show that the method achieves satisfactory prediction performance and may serve as a reference for future research.

Keywords: Protein-protein interaction network, essential proteins, biological attribute information, topological properties, accuracy of essential, edge clustering coefficient.
