DSAE-Impute: Learning Discriminative Stacked Autoencoders for Imputing Single-cell RNA-seq Data

Page: [440 - 451] Pages: 12

Abstract

Background: Because a single cell contains only a limited amount of mRNA, scRNA-seq data contain many missing values, which prevents accurate quantification of single-cell RNA expression. This dropout phenomenon makes it impossible to detect truly expressed genes in some cells and strongly affects downstream analyses of scRNA-seq data, such as cell clustering and the inference of cell developmental trajectories.

Objective: This research proposes an accurate deep learning method, DSAE-Impute, to impute the missing values in scRNA-seq data. DSAE-Impute employs stacked autoencoders to capture gene expression characteristics from the original data with missing values and, during training, incorporates a discriminative correlation matrix between cells to capture global expression features, so that missing values can be predicted accurately.

Methods: We propose DSAE-Impute, a novel deep learning model based on discriminative stacked autoencoders, to impute the missing values in scRNA-seq data. DSAE-Impute embeds discriminative cell similarity to refine the feature representation of the stacked autoencoders and learns the expression pattern of scRNA-seq data through layer-by-layer training to achieve accurate imputation.
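
The idea in the Methods paragraph can be illustrated with a minimal NumPy sketch. It is not the authors' implementation; the abstract does not give the architecture or loss, so the following are all assumptions made for illustration: cosine similarity as the cell similarity, a normalized graph Laplacian penalty on the hidden codes as the way similarity is embedded, tanh hidden units, two greedily trained layers, and the hypothetical names `normalized_laplacian`, `train_layer`, and `impute`.

```python
import numpy as np

def normalized_laplacian(X):
    """Normalized graph Laplacian of a cosine cell-cell similarity (assumed choice)."""
    norms = np.clip(np.linalg.norm(X, axis=1, keepdims=True), 1e-12, None)
    S = (X / norms) @ (X / norms).T           # cosine similarity between cells
    d = np.clip(S.sum(axis=1), 1e-12, None)   # degree of each cell
    return np.eye(len(X)) - S / np.sqrt(np.outer(d, d))

def train_layer(X, M, Lap, hidden, lam=0.1, lr=0.02, epochs=500, seed=0):
    """Train one autoencoder layer by plain gradient descent.

    Loss = MSE over entries where M == 1, plus lam * tr(H^T Lap H) / n,
    which pulls the codes of similar cells toward each other.
    """
    rng = np.random.default_rng(seed)
    n, g = X.shape
    W1 = 0.1 * rng.standard_normal((g, hidden)); b1 = np.zeros(hidden)
    W2 = 0.1 * rng.standard_normal((hidden, g)); b2 = np.zeros(g)
    n_obs = max(M.sum(), 1.0)
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)               # encoder
        R = M * (H @ W2 + b2 - X)              # residual on observed entries only
        losses.append((R ** 2).sum() / n_obs + lam * np.trace(H.T @ Lap @ H) / n)
        dXhat = 2.0 * R / n_obs
        dH = dXhat @ W2.T + 2.0 * lam * (Lap @ H) / n
        dZ = dH * (1.0 - H ** 2)               # derivative of tanh
        W2 -= lr * (H.T @ dXhat); b2 -= lr * dXhat.sum(axis=0)
        W1 -= lr * (X.T @ dZ);    b1 -= lr * dZ.sum(axis=0)
    return (W1, b1, W2, b2), losses

def impute(X, hidden=(8, 4)):
    """Greedy layer-by-layer training of two layers, then decode back to gene space."""
    M = (X > 0).astype(float)                  # zeros are treated as missing
    Lap = normalized_laplacian(X)
    p1, loss1 = train_layer(X, M, Lap, hidden[0])
    H1 = np.tanh(X @ p1[0] + p1[1])
    p2, loss2 = train_layer(H1, np.ones_like(H1), Lap, hidden[1], seed=1)
    H2 = np.tanh(H1 @ p2[0] + p2[1])
    X_hat = (H2 @ p2[2] + p2[3]) @ p1[2] + p1[3]
    # Keep observed values, fill only the zeros with non-negative predictions.
    X_imp = np.where(M > 0, X, np.maximum(X_hat, 0.0))
    return X_imp, loss1, loss2

# Toy data (synthetic, not from the paper): 30 cells x 20 genes, 30% dropout.
rng = np.random.default_rng(42)
true = np.log1p(rng.gamma(2.0, 1.0, size=(30, 20)))
X = np.where(rng.random(true.shape) < 0.3, 0.0, true)
X_imp, loss1, loss2 = impute(X)
```

In this sketch the similarity penalty is applied at every layer, and observed entries are left untouched so that only presumed dropouts are filled in. A real model would add fine-tuning of the whole stack and a principled way to tell dropout zeros from true zeros.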

Results: We systematically evaluated the performance of DSAE-Impute on simulated and real datasets. The experimental results demonstrate that DSAE-Impute significantly improves downstream analysis and that its imputation results are more accurate than those of other state-of-the-art imputation methods.
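
The abstract does not say which accuracy metrics were used. As a hedged illustration of how imputation accuracy is commonly checked on simulated data, the sketch below hides known values, imputes them, and measures the error on the hidden entries only. The gene-mean imputer is a placeholder baseline rather than DSAE-Impute, and the names `rmse_on_masked` and `gene_mean_impute` are hypothetical.

```python
import numpy as np

def rmse_on_masked(truth, imputed, mask):
    """Root-mean-square error restricted to the entries that were hidden."""
    diff = (imputed - truth)[mask]
    return float(np.sqrt(np.mean(diff ** 2)))

def gene_mean_impute(X):
    """Placeholder imputer: fill zeros with the mean of each gene's nonzero values."""
    observed = X > 0
    counts = np.maximum(observed.sum(axis=0), 1)   # avoid division by zero
    gene_means = X.sum(axis=0) / counts
    return np.where(observed, X, gene_means)

# Synthetic ground truth (50 cells x 10 genes) with 30% of entries hidden.
rng = np.random.default_rng(0)
truth = np.log1p(rng.gamma(2.0, 1.0, size=(50, 10)))
mask = rng.random(truth.shape) < 0.3
observed = np.where(mask, 0.0, truth)

rmse_zero = rmse_on_masked(truth, observed, mask)                    # no imputation
rmse_mean = rmse_on_masked(truth, gene_mean_impute(observed), mask)  # baseline imputer
```

Any imputation method can be swapped in for `gene_mean_impute` and compared on the same mask, so that differences in error reflect the method rather than the choice of hidden entries.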

Conclusion: Extensive experiments show that, compared with other state-of-the-art methods, DSAE-Impute produces imputation results on simulated and real datasets that are more accurate and more helpful for downstream analysis.

Keywords: scRNA-seq, imputation, gene expression, dropout event, discriminative stacked autoencoder, DSAE-Impute.
