Current Bioinformatics

Author(s): Yongle Shi, Yibing Ma, Xiang Chen and Jie Gao*

DOI: 10.2174/0115748936334071240903064630

DownloadDownload PDF Flyer Cite As
scADCA: An Anomaly Detection-Based scRNA-seq Dataset Cell Type Annotation Method for Identifying Novel Cells
  • * (Excluding Mailing and Handling)

Abstract

Background: With the rapid evolution of single-cell RNA sequencing technology, the study of cellular heterogeneity in complex tissues has reached an unprecedented resolution. One critical task of the technology is cell-type annotation. However, challenges persist, particularly in annotating novel cell types.

Objective: Current methods rely heavily on well-annotated reference data, using correlation comparisons to determine cell types. However, identifying novel cells remains unstable due to the inherent complexity and heterogeneity of scRNA-seq data and cell types. To address this problem, we propose scADCA, a method based on anomaly detection, for identifying novel cell types and annotating the entire dataset.

Methods: The convolutional modules and fully connected networks are integrated into an autoencoder, and the reference dataset is trained to obtain the reconstruction errors. The threshold based on these errors can distinguish between novel and known cells in the query dataset. After novel cells are identified, a multinomial logistic regression model fully annotates the dataset.

Results: Using a simulation dataset, three real scRNA-seq pancreatic datasets, and a real scRNA-seq lung cancer cell line dataset, we compare scADCA with six other cell-type annotation methods, demonstrating competitive performance in terms of distinguished accuracy, full accuracy, F!-score, and confusion matrix.

Conclusion: In conclusion, the scADCA method can be further improved and expanded to achieve better performance and application effects in cell type annotation, which is helpful to improve the accuracy and reliability of cytology research and promote the development of single-cell omics.

Keywords: Anomaly detection, scRNA-seq data, cell type annotation, novel cell, autoencoder, logistic regression model.