Abstract
Background: Using genetic data to investigate biological problems has recently become a vital approach. However, the biological heterogeneity of the original samples is usually ignored when such data are analyzed. Differences in the cell-type composition of samples can alter their expression profiles and introduce considerable bias into downstream research. Matrix factorization (MF), which originated as a set of mathematical methods, has contributed substantially to deconvoluting genetic profiles in silico, especially at the expression level.
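To make the deconvolution idea concrete, here is a minimal sketch of the linear mixing model that MF-based deconvolution commonly assumes. A bulk expression matrix X (genes x samples) is approximated as X ≈ WH, where W (genes x k) holds k cell-type expression signatures and H (k x samples) holds each sample's cell-type abundances, with all entries non-negative. The sketch uses unsupervised non-negative matrix factorization (NMF) with the Lee-Seung multiplicative updates, chosen here only as one common MF variant for illustration. It is not presented as the method of any specific tool covered in this review. The data are synthetic and the function names are hypothetical.

```python
import numpy as np


def nmf_multiplicative(X, k, n_iter=1000, eps=1e-10, seed=0):
    """Hypothetical helper: factor non-negative X (genes x samples) as X ~= W @ H.

    W (genes x k) plays the role of k cell-type signatures and H (k x samples)
    the role of per-sample cell-type abundances. Uses the Lee-Seung
    multiplicative updates for the Frobenius-norm objective.
    """
    rng = np.random.default_rng(seed)
    n_genes, n_samples = X.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # Multiplicative updates never change sign, so W and H stay non-negative.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def relative_error(X, W, H):
    """Frobenius-norm reconstruction error relative to the norm of X."""
    return np.linalg.norm(X - W @ H) / np.linalg.norm(X)


# Synthetic "bulk" data: 3 hypothetical cell types mixed in 8 samples.
rng = np.random.default_rng(42)
true_signatures = rng.gamma(2.0, 1.0, size=(40, 3))  # genes x cell types
true_props = rng.dirichlet(np.ones(3), size=8).T     # cell types x samples; columns sum to 1
bulk = true_signatures @ true_props                  # genes x samples

W, H = nmf_multiplicative(bulk, k=3)
print(f"relative reconstruction error: {relative_error(bulk, W, H):.4f}")
```

In the unsupervised setting the recovered W and H are identified only up to column scaling and ordering. Mapping H back to actual cell-type proportions therefore requires extra information, such as a reference signature matrix or a matching step, which this sketch does not include.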
Objective: With the development of artificial intelligence and machine learning algorithms, the number of computational methods for addressing sample heterogeneity has grown rapidly. However, a structured overview of how MF is used to deconvolute genetic data is still lacking. This study reviews the use of MF methods to address the heterogeneity of genetic data at the expression level.
Methods: MF methods used for deconvolution were reviewed according to their individual strengths. The review is organized into three sections: application scenarios, method categories, and a summary of tools. The application-scenarios section defines the deconvolution problem as it arises in each scenario. The method-categories section summarizes the MF algorithms that have contributed to each scenario. The tool summary lists the functions and web servers developed over the past decade. Challenges and opportunities in related fields are also discussed.
Results and Conclusion: Based on this investigation, this study aims to present a relatively comprehensive picture that helps researchers get started more quickly with deconvoluting genetic data in silico and select suitable MF methods for different scenarios.
Keywords:
Matrix factorization, heterogeneity, gene expression, deconvolution, computational method, cell type.