Background: The multiple types of data stored in a tensor space are correlated, so the matrix formed by these high-dimensional data is of low rank. The potential associations between genes and cancers can therefore be explored in a low-rank space. Tensor robust principal component analysis (TRPCA) extracts information by obtaining coefficient tensors with a low-rank representation. In practical applications, however, global features and sparse structure are ignored, which leads to incomplete analysis.
Objective: This paper proposes an adaptive reweighted TRPCA method (ARTRPCA) to explore cancer subtypes and identify conjoint abnormally expressed genes (CAEGs).
Methods: ARTRPCA analyzes data through adaptive learning of the primary information. A weighting scheme based on singular value updates is used to learn global features in the low-rank space, and a reweighted ℓ1 algorithm based on prior knowledge is used to learn the sparse structure. Moreover, the sparsity threshold for Gaussian entries is increased to reduce the influence of outliers.
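The Methods paragraph names two reweighting ideas without giving formulas. The NumPy sketch below illustrates them under stated assumptions: it uses the common t-SVD setting (FFT along the third mode, SVD of each frontal slice) and the classic reweighting rule w = 1 / (value + eps) from reweighted ℓ1 minimization, so that large singular values and large entries are shrunk less. The function names `weighted_tsvt` and `reweighted_shrink` and the parameter `eps` are hypothetical. This is not the authors' ARTRPCA implementation, and it does not model the increased Gaussian-entry threshold.

```python
import numpy as np


def weighted_tsvt(Y, tau, weights_from=None, eps=1e-6):
    """Weighted singular value thresholding of a real 3-way tensor (t-SVD sketch).

    Assumption: each frontal slice in the Fourier domain has its i-th singular
    value shrunk by tau * w_i, with w_i = 1 / (sigma_i + eps). sigma_i is taken
    from `weights_from` (e.g. a previous low-rank estimate) or from Y itself,
    so dominant (global) components are penalized less than small ones.
    """
    n3 = Y.shape[2]
    Yf = np.fft.fft(Y, axis=2)
    Wf = np.fft.fft(Y if weights_from is None else weights_from, axis=2)
    Lf = np.zeros_like(Yf)
    for k in range(n3):
        U, s, Vh = np.linalg.svd(Yf[:, :, k], full_matrices=False)
        ref = np.linalg.svd(Wf[:, :, k], compute_uv=False)
        w = 1.0 / (ref + eps)                      # reweighting from singular values
        s_shrunk = np.maximum(s - tau * w, 0.0)    # weighted soft threshold
        Lf[:, :, k] = (U * s_shrunk) @ Vh
    # Conjugate symmetry of the slices is preserved, so the inverse FFT is real
    # up to rounding; drop the negligible imaginary part.
    return np.real(np.fft.ifft(Lf, axis=2))


def reweighted_shrink(Y, lam, E_prev=None, eps=1e-6):
    """Reweighted l1 soft thresholding (sketch).

    Assumption: weights w = 1 / (|E_prev| + eps) come from the previous sparse
    estimate (or from Y itself), so entries that were already large are
    shrunk less and small entries are pushed to zero.
    """
    ref = Y if E_prev is None else E_prev
    w = 1.0 / (np.abs(ref) + eps)
    return np.sign(Y) * np.maximum(np.abs(Y) - lam * w, 0.0)
```

In a TRPCA-style solver these two operators would typically serve as the proximal steps for the low-rank part and the sparse part, with the weights recomputed from the previous iterate at each pass; how ARTRPCA schedules these updates is not specified here.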
Results: In sample clustering experiments, ARTRPCA achieved promising results. The identified CAEGs are either pathogenic genes of various cancers or genes that are highly expressed in specific cancers.
Conclusion: The ARTRPCA method shows excellent application prospects for cancer multiomics data.
Keywords: Tensor singular values, global features, sparse structure, reweighted algorithm