Identifying the genes related to important human diseases from genomic data can provide valuable clues for the discovery of potential therapeutic targets. However, discovering disease-related genes through traditional experimental methods is usually laborious and time-consuming. It is therefore necessary to develop an effective computational approach to disease-related gene identification. In this study, multiple sequence features of known disease-related genes across 62 diseases were extracted, and the selected features were then optimized and analyzed for disease-related gene prediction. Leave-one-out cross-validation showed that 55% of the disease-related genes were ranked within the top 10 of the prediction results, which supports the reliability of this multi-feature fusion approach.
Keywords: Disease-related gene, sequence features, usage bias, F-statistic
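
To make the evaluation criterion concrete, the following is a minimal sketch of how a leave-one-out top-10 hit rate could be computed. It is not the scoring method of this study: the centroid-distance scorer, the function name `top_k_hit_rate`, and the `known`/`background` feature matrices are all assumptions introduced for illustration only.

```python
import numpy as np


def top_k_hit_rate(known, background, k=10):
    """Fraction of held-out known genes ranked within the top k candidates.

    known, background: 2-D arrays of per-gene feature vectors (hypothetical
    inputs). Scoring by distance to a centroid is a stand-in, not the
    paper's method.
    """
    known = np.asarray(known, dtype=float)
    background = np.asarray(background, dtype=float)
    hits = 0
    for i in range(len(known)):
        # Hold out gene i and build a profile from the remaining known genes.
        train = np.delete(known, i, axis=0)
        centroid = train.mean(axis=0)
        # Candidates: the held-out gene (row 0) plus all background genes.
        candidates = np.vstack([known[i:i + 1], background])
        dist = np.linalg.norm(candidates - centroid, axis=1)
        # Rank = 1 + number of background genes strictly closer to the centroid.
        rank = 1 + int(np.sum(dist[1:] < dist[0]))
        hits += rank <= k
    return hits / len(known)
```

Under this sketch, a reported value of 55% would correspond to `top_k_hit_rate(known, background, k=10)` returning 0.55 for a given disease's gene set, with each known gene held out once and ranked against the candidate pool.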