Improved prediction of protein-protein interactions using novel negative samples, features, and an ensemble classifier

Abstract

Computational methods are employed in bioinformatics to predict protein–protein interactions (PPIs). PPIs and protein–protein non-interactions (PPNIs) display different levels of development, and the number of PPIs is considerably greater than that of PPNIs. This significant difference in the number of PPIs and PPNIs increases the cost of constructing a balanced dataset. PPIs can be classified as either physical or genetic. However, the pairs in ready-made PPNI databases have only been shown to lack physical interactions; they have not been shown to lack genetic interactions. Hence, ready-made PPNI databases contain false negative non-interactions. In this study, two PPNI datasets were artificially generated from a PPI database. In contrast to various traditional PPI feature extraction methods based on sequential information, two types of novel feature extraction methods were proposed. One is based on secondary structure information, and the other is based on the physicochemical properties of proteins. The experimental results on the RandomPairs dataset validate the efficiency and effectiveness of the proposed prediction model. These results reveal the potential of constructing a PPI negative dataset to reduce false negatives. Related datasets, tools, and source codes are accessible at http://lab.malab.cn/soft/PPIPre/PPIPre.html.

Publication
In Artificial Intelligence in Medicine