A metabolic pathway is a series of biological processes that provides the molecules and energy an organism needs and can therefore be essential to its survival. Most metabolic pathways involve compounds, and for a given compound it is helpful to know which types of metabolic pathways it participates in. In this study, compounds are first represented by molecular fragments, which are then passed to a prediction engine, sequential minimal optimization (SMO), for prediction. Minimum redundancy maximum relevance (mRMR) and incremental feature selection are used to extract key features, on which an optimal prediction engine is built. The proposed method is effective compared with random forest, Dagging, and a popular method that integrates chemical-chemical interactions and chemical-chemical similarities. We also make predictions for compounds whose metabolic pathways are unknown and select 17 of them for analysis. The results indicate that the proposed method may become a useful tool for predicting and analyzing metabolic pathways.
Keywords: Compound, metabolic pathway, molecular fragment, minimum redundancy maximum relevance, incremental feature selection, sequential minimal optimization.
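
The incremental feature selection step mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: it assumes the features have already been ranked (for example by mRMR), and the names `incremental_feature_selection`, `evaluate`, `toy_evaluate`, and the feature labels `f1`, `f2`, ... are hypothetical. In a real pipeline `evaluate` would train the classifier (here, SMO) on the chosen fragments and return a cross-validated performance measure; the toy scores below are made up only to make the sketch runnable.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float, List[float]]:
    """Grow the feature set one ranked feature at a time and keep the best prefix.

    ranked_features: features ordered from most to least important
                     (assumed to come from a ranking such as mRMR).
    evaluate:        hypothetical callback returning a score for a feature subset
                     (higher is better), e.g. cross-validated accuracy.
    Returns the best-scoring prefix, its score, and the full score curve.
    """
    scores: List[float] = []
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])  # top-k ranked features
        score = evaluate(subset)
        scores.append(score)
        if score > best_score:  # strict '>' keeps the smallest subset on ties
            best_k, best_score = k, score
    return list(ranked_features[:best_k]), best_score, scores


# Toy stand-in for "train the classifier and cross-validate" (made-up numbers).
_TOY_SCORES = {1: 0.60, 2: 0.75, 3: 0.74, 4: 0.70}


def toy_evaluate(subset: List[str]) -> float:
    return _TOY_SCORES[len(subset)]


ranked = ["f3", "f1", "f7", "f2"]  # hypothetical ranked fragment features
best_subset, best_score, curve = incremental_feature_selection(ranked, toy_evaluate)
print(best_subset, best_score)
```

The loop evaluates every prefix of the ranking, so its cost is one model evaluation per feature; evaluating prefixes in larger steps is a common way to reduce that cost when the ranked list is long.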