Identification of Membrane Protein Types Based Using Hypergraph
Neural Network

Weizhong Lu; Meiling Qian; Yu Zhang; Hongjie Wu; Yijie Ding; Jiawei Shen; Xiaoyi Chen; Haiou Li; Qiming Fu

Abstract

Introduction: Membrane proteins, one of the main components of biological membranes, play an important role in living organisms. Classifying and predicting membrane protein types is an important topic in membrane proteomics research, because the function of a protein can be determined quickly once its membrane protein type has been identified.

Methods: Most current methods for classifying membrane proteins are labor-intensive and require substantial resources. To enable prediction of membrane proteins on a large scale, this study extracted features with five methods: Average Block (AvBlock), Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), Histogram of Orientation Gradient (HOG), and Pseudo-PSSM (PsePSSM). The five resulting feature matrices were then combined and used to construct the corresponding hypergraph association matrix. Finally, the feature matrices and hypergraph association matrices were integrated in a hypergraph neural network (HGNN) model to identify the types of membrane proteins.
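The pipeline described in the Methods can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions the abstract does not state: the function names are hypothetical, AvBlock is taken to average a PSSM over 20 row blocks, and each hyperedge is taken to join a sample with its k nearest neighbours in feature space. The propagation rule is the standard HGNN layer, X' = ReLU(Dv^-1/2 H W De^-1 H^T Dv^-1/2 X Theta).

```python
import numpy as np


def avblock_features(pssm, n_blocks=20):
    """AvBlock sketch: split an L x 20 PSSM into row blocks and average each.

    Assumes L >= n_blocks; the block count of 20 is an assumption.
    """
    blocks = np.array_split(pssm, n_blocks, axis=0)
    return np.concatenate([block.mean(axis=0) for block in blocks])


def knn_incidence(features, k=3):
    """Build an n x n incidence matrix H with one hyperedge per sample.

    Hyperedge e contains sample e and its k nearest neighbours
    (squared Euclidean distance). This construction is an assumption.
    """
    n = features.shape[0]
    sq = np.sum(features ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(d2, -np.inf)  # guarantee each sample joins its own edge
    H = np.zeros((n, n))
    for e in range(n):
        members = np.argsort(d2[e])[: k + 1]
        H[members, e] = 1.0
    return H


def hgnn_propagation(H, edge_weights=None):
    """Return G = Dv^-1/2 H W De^-1 H^T Dv^-1/2 for incidence matrix H."""
    if edge_weights is None:
        w = np.ones(H.shape[1])  # unit hyperedge weights by default
    else:
        w = np.asarray(edge_weights, dtype=float)
    dv = H @ w          # weighted vertex degrees
    de = H.sum(axis=0)  # hyperedge degrees (vertices per hyperedge)
    dv_inv_sqrt = 1.0 / np.sqrt(dv)
    left = dv_inv_sqrt[:, None] * H    # Dv^-1/2 H
    middle = left * (w / de)[None, :]  # ... W De^-1
    return (middle @ H.T) * dv_inv_sqrt[None, :]


def hgnn_layer(X, G, theta):
    """One hypergraph convolution layer followed by ReLU."""
    return np.maximum(G @ X @ theta, 0.0)
```

In a full model, the feature vectors from all five extractors would be concatenated per protein, and two such layers with a softmax output would typically be trained on the labelled proteins; those training details are likewise not given in the abstract.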

Results: The proposed method was evaluated on four membrane protein benchmark datasets. It achieved accuracies of 92.8%, 88.6%, 88.2%, and 99.0% on the four datasets, respectively.

Conclusion: HGNN achieved better prediction performance than traditional machine learning classifiers such as Random Forest (RF) and Support Vector Machine (SVM).

Keywords: Membrane proteins, hypergraph neural network, multi-feature fusion, position-specific scoring matrix, Histogram of Orientation Gradient (HOG), Pseudo-PSSM (PsePSSM).


Current Bioinformatics
