Iterative feature representation algorithm to improve the predictive performance of N7-methylguanosine sites

Abstract

N7-methylguanosine (m7G) is an important epigenetic modification that plays an essential role in regulating gene expression. Accurate identification of m7G modifications will therefore help reveal and deepen our understanding of their potential functional mechanisms. Although high-throughput experimental methods can precisely locate m7G sites, they remain costly. It is therefore necessary to develop new methods to identify m7G sites. In this work, we used an iterative feature representation algorithm to develop a machine learning-based method, m7G-IFL, for identifying m7G sites. To assess its performance, we evaluated m7G-IFL and compared it with existing predictors. The results show that our predictor identifies m7G sites more accurately than existing predictors. By analyzing and comparing the features used by the predictors, we found that positive and negative samples are more clearly separated in our feature space than in the existing feature spaces. This result indicates that our features capture more discriminative information through the iterative feature learning process, which contributes to the improved predictive performance.

Publication
In Briefings in Bioinformatics