Abstract
Background: CpG island (CGI) detection and methylation prediction are important for studying the
complex mechanisms by which CGIs participate in genome regulation. In recent years, machine
learning (ML) has increasingly been applied to CGI detection and CGI methylation prediction
to improve on the accuracy of traditional methods. However, there are few systematic reviews of
the application of ML to these two tasks. This systematic review therefore
aims to provide an overview of the application of ML in CGI detection and methylation prediction.
Methods: The review followed the PRISMA guidelines. The search strategy was applied to
articles indexed in PubMed and published from 2000 to July 10, 2022. Two researchers independently
screened the articles according to the retrieval strategy and identified a total of 54 articles. We then
developed quality assessment questions to evaluate study quality, which left 46 articles that met the eligibility criteria.
Based on these articles, we first summarized the applications of ML methods in CGI detection and
methylation prediction, and then identified the strengths and limitations of these studies.
Results: We discuss the challenges and future research directions.
Conclusion: This systematic review will contribute to the selection of algorithms and the future development
of more efficient algorithms for CGI detection and methylation prediction.