Current Bioinformatics

Author(s): Zhongzheng Mao and Zhen Wei*

DOI: 10.2174/0115748936332078240826105023

Evaluating the Reliability of Machine Learning Predictors in m6A-SNP Association Analysis: A Comparative Study Using m6A-QTL Data

Abstract

Introduction: N6-Methyladenosine (m6A) plays a crucial role in determining the fate of RNA after transcription. Understanding the downstream functions of individual m6A sites is of critical interest in epitranscriptomics. In published studies, two main approaches have been used to decipher the specific impact of m6A sites on gene expression and disease/traits: the m6A quantitative trait loci (m6A-QTL) and in-silico mutation prediction by Machine Learning (ML) models. However, earlier works still lack independent validation for the performance of ML-based methods.

Methods: In this study, we use m6A-QTL as ground truth to evaluate the outcomes of in-silico mutation models. We benchmark against m6A-QTL both newly trained machine learning models that use genomic or sequence features and existing model inference results published in in-silico mutation-dependent databases.
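As a rough illustration of this kind of benchmark, the sketch below scores an in-silico mutation as the change in predicted m6A probability between the reference and alternative allele, then measures how often its direction agrees with the m6A-QTL effect. It is a minimal sketch under stated assumptions, not the procedure used in the study: the names `in_silico_delta`, `sign_concordance`, and `toy_predict` are hypothetical, the toy predictor is a stand-in for a trained model, and the effect sizes are made-up illustrative values.

```python
from typing import Callable, Sequence


def in_silico_delta(predict: Callable[[str], float], ref_seq: str, alt_seq: str) -> float:
    """Predicted change in m6A probability caused by the SNP (alt minus ref)."""
    return predict(alt_seq) - predict(ref_seq)


def sign_concordance(deltas: Sequence[float], qtl_effects: Sequence[float]) -> float:
    """Fraction of SNPs whose predicted direction matches the QTL effect direction.

    Pairs where either value is exactly zero carry no direction and are skipped.
    """
    pairs = [(d, b) for d, b in zip(deltas, qtl_effects) if d != 0 and b != 0]
    if not pairs:
        return float("nan")
    agree = sum(1 for d, b in pairs if (d > 0) == (b > 0))
    return agree / len(pairs)


def toy_predict(seq: str) -> float:
    """Hypothetical stand-in for an ML model: 1.0 if a GGACT motif is present."""
    return 1.0 if "GGACT" in seq else 0.0


# Two illustrative SNPs: one destroys the motif, one creates it.
snps = [("AAGGACTAA", "AAGGTCTAA"), ("AAGGTCTAA", "AAGGACTAA")]
deltas = [in_silico_delta(toy_predict, ref, alt) for ref, alt in snps]

# Made-up m6A-QTL effect sizes for the same two SNPs (assumed values).
qtl_effects = [-0.4, -0.2]

concordance = sign_concordance(deltas, qtl_effects)
```

Directional agreement is only one possible consistency measure; a rank correlation between predicted changes and QTL effect sizes would be a natural alternative.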

Results: We found that the consistency between in-silico mutation and m6A-QTL is weak, regardless of the ML algorithms and predictive features used. This trend was also consistent across multiple published databases based on in-silico mutation, including RMDisease2, m6AVar, and RMVar.

Conclusion: These results highlight the importance of critical empirical evaluations for ML models in future SNP-m6A association studies and suggest the need for more high-quality m6A-QTL experiments to guide model development.

Keywords: N6-methyladenosine, m6A, Epitranscriptomics, Functional annotation, m6A-QTL, in-silico mutation, Machine Learning.