Background: RNA-binding proteins establish posttranscriptional gene regulation by coordinating maturation, editing, transport, stability, and translation of cellular RNAs. Immunoprecipitation experiments can identify interactions between RNA and proteins, but they are limited by the experimental environment and materials. Therefore, it is essential to construct computational models to identify the functional sites.
Objective: Although some computational methods have been proposed to predict RNA binding sites, their accuracy can be further improved. Moreover, a dataset with more samples is needed to design a reliable model. Here we present a computational model based on multi-information sources to identify RNA binding sites.
Methods: We construct an accurate computational model named CSBPI_Site, based on extreme gradient boosting. The specifically designed 15-dimensional feature vector captures four types of information (chemical shift, chemical bond, chemical properties and position information).
Results: A satisfactory accuracy of 0.86 and an AUC of 0.89 were obtained by leave-one-out cross-validation. Meanwhile, the accuracies differed only slightly (ranging from 0.83 to 0.85) among the three classification algorithms, which shows that the novel features are stable and suit multiple classifiers. These results show that the proposed method is effective and robust for the identification of noncoding RNA binding sites.
Conclusion: Our method based on multi-information sources effectively represents binding-site information among ncRNAs. The satisfactory prediction results for the Diels-Alder ribozyme obtained with CSBPI_Site indicate that our model is valuable for identifying functional sites.
Keywords: RNA binding site, multi-information sources, position information, chemical shift, chemical bond, chemical properties, extreme gradient boosting.
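Illustrative sketch: the evaluation protocol named in the Results (leave-one-out cross-validation reporting accuracy and AUC) can be outlined as follows. This is a minimal sketch under stated assumptions, not the CSBPI_Site implementation: the data are synthetic 15-dimensional vectors, the scorer is a simple nearest-centroid stand-in rather than extreme gradient boosting, and every function name here is hypothetical.

```python
import random


def nearest_centroid_score(train_X, train_y, x):
    """Stand-in scorer (NOT XGBoost): positive when x is closer to the class-1 centroid."""
    def centroid(label):
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        return [sum(col) / len(rows) for col in zip(*rows)]

    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    return sqdist(x, centroid(0)) - sqdist(x, centroid(1))


def leave_one_out(X, y, scorer):
    """Hold out each sample once, fit on the rest, and score the held-out sample."""
    scores = []
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        scores.append(scorer(train_X, train_y, X[i]))
    return scores


def accuracy(scores, y, threshold=0.0):
    """Fraction of samples whose thresholded score matches the true label."""
    preds = [1 if s > threshold else 0 for s in scores]
    return sum(p == t for p, t in zip(preds, y)) / len(y)


def auc(scores, y):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, t in zip(scores, y) if t == 1]
    neg = [s for s, t in zip(scores, y) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def make_toy_data(n_per_class=20, n_features=15, seed=0):
    """Synthetic data only: Gaussian features whose mean is shifted by the class label."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(float(label), 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


X, y = make_toy_data()
scores = leave_one_out(X, y, nearest_centroid_score)
print("accuracy:", accuracy(scores, y), "AUC:", auc(scores, y))
```

Any scorer with the same three-argument signature can be passed to `leave_one_out`, so a gradient-boosting model could replace the stand-in without changing the evaluation loop.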