Application of a Deep Matrix Factorization Model on Integrated Gene Expression Data

Page: [359 - 367] Pages: 9


Abstract

Background: Non-negative Matrix Factorization (NMF) has been extensively applied to gene expression data. However, most NMF-based methods have a single-layer structure, which may perform poorly on complex data. Deep learning, with its carefully designed hierarchical structure, has shown significant advantages in learning data features.

Objective: To discover differentially expressed genes in gene expression data and to improve sample clustering performance, which can provide a reference for the prevention and treatment of cancer.

Method: In this paper, we apply a deep NMF method called Deep Semi-NMF to integrated gene expression data. In each layer, the coefficient matrix is directly decomposed into the basis matrix and the coefficient matrix of the next layer. We apply this factorization model to The Cancer Genome Atlas (TCGA) genomic data.
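The layer-wise decomposition described above can be sketched in NumPy. This is a minimal illustration, not the authors' implementation: it assumes genes in rows and samples in columns, uses the standard Semi-NMF multiplicative update for the non-negative coefficient matrix, and covers only greedy layer-by-layer factorization (any joint fine-tuning stage is omitted). The function names `semi_nmf` and `deep_semi_nmf`, the random initialization, and the layer sizes are hypothetical choices made for the example.

```python
import numpy as np

def semi_nmf(X, k, n_iter=200, eps=1e-9, seed=0):
    """Semi-NMF sketch: X (p x n) ~ Z (p x k) @ H (k x n), with H >= 0.

    Z (the basis matrix) is unconstrained in sign; H (the coefficient
    matrix) is kept non-negative by a multiplicative update.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((k, X.shape[1])) + eps      # random positive init (assumption)
    for _ in range(n_iter):
        Z = X @ np.linalg.pinv(H)              # least-squares update of the basis
        A = Z.T @ X
        B = Z.T @ Z
        # Split A and B into their positive and negative parts.
        Ap, An = (np.abs(A) + A) / 2, (np.abs(A) - A) / 2
        Bp, Bn = (np.abs(B) + B) / 2, (np.abs(B) - B) / 2
        # Multiplicative update; the ratio is non-negative, so H stays >= 0.
        H *= np.sqrt((Ap + Bn @ H + eps) / (An + Bp @ H + eps))
    Z = X @ np.linalg.pinv(H)                  # refit the basis to the final H
    return Z, H

def deep_semi_nmf(X, layer_sizes, **kwargs):
    """Greedy layer-wise factorization: X ~ Z1 H1, then H1 ~ Z2 H2, and so on."""
    bases, H = [], X
    for k in layer_sizes:
        Z, H = semi_nmf(H, k, **kwargs)        # factor the previous coefficients
        bases.append(Z)
    return bases, H
```

Under these assumptions, calling `deep_semi_nmf(X, [10, 4])` on a genes-by-samples matrix returns two basis matrices and a final 4-by-samples coefficient matrix; the columns of that final matrix could then be passed to a clustering routine such as k-means.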

Results: The experimental results demonstrate the superiority of the Deep Semi-NMF method in identifying differentially expressed genes and clustering samples.

Conclusion: The Deep Semi-NMF model decomposes a data matrix into multiple factor matrices whose product approximates the original matrix. It improves the clustering performance of samples while identifying more accurate key genes for disease treatment.
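For reference, the multi-factor decomposition summarized in the conclusion is usually written as follows. The symbols are assumptions introduced for illustration (they do not appear in the abstract): X is the data matrix, Z_i the basis matrix of layer i, H_i the non-negative coefficient matrix of layer i, and m the number of layers.

```latex
\begin{aligned}
X       &\approx Z_1 H_1, \\
H_1     &\approx Z_2 H_2, \\
        &\;\;\vdots \\
H_{m-1} &\approx Z_m H_m, \\
\text{so}\quad X &\approx Z_1 Z_2 \cdots Z_m H_m, \qquad H_i \ge 0 \ \ (i = 1, \dots, m).
\end{aligned}
```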

Keywords: NMF, gene expression data, TCGA, deep semi-NMF, feature selection, clustering.
