Abstract
In genomics and biomedical research, the use of Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL) has emerged as a paradigm shift. Traditional next-generation sequencing (NGS) DNA and RNA sequencing analysis pipelines have been sound in decoding genetic information, but the volume and complexity of sequencing data have surged, creating a demand for more efficient and accurate methods of analysis. This has led to a growing reliance on AI/ML and DL approaches. This paper highlights the tools and approaches that help overcome the limitations of traditional pipelines and generate better results. By automating the pipeline and integrating these tools into the NGS DNA and RNA-seq workflow, the quality of research can be improved, since large data sets can be processed using Deep Learning tools. Automation reduces labor-intensive tasks and frees researchers to focus on other frontiers of research. In the traditional pipeline, every task from quality checking to variant identification (for example, in SNP detection) takes a large amount of computational time, and the researcher must enter commands manually while taking care to avoid human error. With automation, the whole process runs more smoothly and in comparatively less time, because an automated pipeline can process multiple files rather than the single file handled at a time in the traditional pipeline. In conclusion, this review sheds light on the transformative impact of integrating DL into traditional pipelines and its role in reducing computational time. Additionally, it highlights the growing importance of AI-driven solutions in advancing genomics research and enabling data-intensive biomedical applications.