M6AMRFS: Robust Prediction of N6-Methyladenosine Sites With Sequence-Based Features in Multiple Species

Abstract

N6-methyladenosine (m6A) is one of the best-studied RNA methylation modifications and plays important roles in various biological processes, such as RNA splicing and degradation. Identifying m6A sites is fundamental to understanding their functional mechanisms. Recently, machine learning-based prediction methods have emerged as an effective approach for fast and accurate identification of m6A sites. In this paper, we propose M6AMRFS, a new machine learning-based predictor of m6A sites. The predictor uses a new feature representation algorithm that encodes RNA sequences with two feature descriptors, dinucleotide binary encoding and local position-specific dinucleotide frequency, and applies the F-score algorithm combined with Sequential Forward Search (SFS) to enhance the feature representation. To predict m6A sites, we build the predictive model with the eXtreme Gradient Boosting (XGBoost) algorithm. Benchmarking results show that the proposed predictor is competitive with state-of-the-art predictors. Importantly, its robust predictions across multiple species demonstrate that the predictive models generalize well. To the best of our knowledge, M6AMRFS is the first tool that can be used to identify m6A sites in multiple species. To facilitate its use, we have established a user-friendly web server implementing M6AMRFS, currently available at http://server.malab.cn/M6AMRFS/. We anticipate that it will be a useful tool for research on m6A sites.
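To make the feature pipeline described in the abstract more concrete, the following is a minimal sketch of the two descriptors and of F-score feature ranking. It is not the authors' implementation. The 4-bit dinucleotide code (lexicographic order over A, C, G, U), the prefix-based definition of the local frequency, and all function names are assumptions made for illustration. The F-score follows the commonly used definition (between-class separation of the means divided by the sum of within-class sample variances).

```python
from itertools import product

NUCLEOTIDES = "ACGU"
# Assumption: the 16 dinucleotides are indexed 0..15 in lexicographic order
# and each index is written as a 4-bit binary vector.
DINUC_INDEX = {a + b: i for i, (a, b) in enumerate(product(NUCLEOTIDES, repeat=2))}


def dinucleotide_binary(seq):
    """Encode every overlapping dinucleotide of seq as 4 binary features."""
    seq = seq.upper().replace("T", "U")
    features = []
    for i in range(len(seq) - 1):
        idx = DINUC_INDEX[seq[i:i + 2]]
        features.extend((idx >> shift) & 1 for shift in (3, 2, 1, 0))
    return features


def local_dinucleotide_frequency(seq):
    """Assumed definition: for each position i >= 1, count how often the
    dinucleotide ending at i occurs in the prefix seq[:i+1], divided by
    the prefix length."""
    seq = seq.upper().replace("T", "U")
    features = []
    for i in range(1, len(seq)):
        target = seq[i - 1:i + 1]
        prefix = seq[:i + 1]
        count = sum(1 for j in range(len(prefix) - 1) if prefix[j:j + 2] == target)
        features.append(count / len(prefix))
    return features


def f_score(column, labels):
    """F-score of one feature; labels are 1 (positive) or 0 (negative).
    Each class needs at least two samples."""
    pos = [x for x, y in zip(column, labels) if y == 1]
    neg = [x for x, y in zip(column, labels) if y == 0]
    mean_all = sum(column) / len(column)
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    within = (sum((x - mean_pos) ** 2 for x in pos) / (len(pos) - 1)
              + sum((x - mean_neg) ** 2 for x in neg) / (len(neg) - 1))
    if within == 0:
        # Zero within-class variance: perfectly separated or constant feature.
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(X, y):
    """Return feature indices sorted from highest to lowest F-score."""
    scores = [f_score([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

In a full pipeline, a sequential forward search would then add the ranked features one at a time, retrain the classifier (XGBoost in the paper) on each growing subset, and keep the subset with the best cross-validated performance. That step is omitted here to keep the sketch free of external dependencies.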

Publication
In Frontiers in Bioengineering and Biotechnology