Information about protein quaternary structure can help in understanding the biological functions of proteins. Because wet-lab experiments are both time-consuming and costly, we adopted a novel computational approach to assign proteins to 10 kinds of quaternary structure. Each protein was encoded using its biochemical and physicochemical properties, and feature selection was then carried out with the Incremental Feature Selection (IFS) method. The resulting optimal feature set consisted of 97 features, with which the prediction model was built. The overall prediction success rate was 74.90% as evaluated by the Jackknife test, much higher than the 10% (1/10) expected from a random guess among 10 classes. Further feature analysis indicates that protein secondary structure is the feature that contributes most to the prediction of protein quaternary structure.
Keywords: Biochemical properties, Incremental Feature Selection, Maximum Relevance, Minimum Redundancy, Physicochemical properties, Protein quaternary structure, Jackknife test, SVM, FDOD, NNA, mRMR, MaxRel feature, Predator, PredAcc, Peng's study
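
To illustrate how Incremental Feature Selection combined with a Jackknife (leave-one-out) test might work, the following is a minimal sketch and not the implementation used in this study. It assumes that a feature ranking is already available (for example from mRMR, which is listed among the keywords) and uses a simple nearest-neighbor classifier as a stand-in predictor. The function names, the toy data, and the class labels are all hypothetical and serve only to make the procedure concrete.

```python
from math import dist

def nearest_neighbor_predict(train_X, train_y, query):
    """Return the label of the training sample closest to `query` (Euclidean distance)."""
    best = min(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    return train_y[best]

def jackknife_accuracy(X, y, feature_idx):
    """Leave-one-out accuracy using only the feature columns listed in `feature_idx`."""
    projected = [[row[j] for j in feature_idx] for row in X]
    correct = 0
    for i in range(len(projected)):
        # Hold out sample i and train on all remaining samples.
        train_X = projected[:i] + projected[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_neighbor_predict(train_X, train_y, projected[i]) == y[i]:
            correct += 1
    return correct / len(projected)

def incremental_feature_selection(X, y, ranked_features):
    """Evaluate the nested feature sets {f1}, {f1, f2}, ... in ranking order.

    Returns (best_k, accuracies), where best_k is the smallest number of
    top-ranked features that reaches the highest Jackknife accuracy.
    """
    accuracies = [
        jackknife_accuracy(X, y, ranked_features[:k])
        for k in range(1, len(ranked_features) + 1)
    ]
    best_k = max(range(len(accuracies)), key=lambda i: accuracies[i]) + 1
    return best_k, accuracies

# Hypothetical toy data: 6 proteins, 3 features, 2 classes.
# Features 0 and 1 separate the classes; feature 2 is large-scale noise.
X = [
    [0.10, 0.20, 5.0],
    [0.20, 0.10, 1.0],
    [0.15, 0.25, 9.0],
    [0.90, 0.80, 5.1],
    [0.80, 0.90, 1.1],
    [0.85, 0.75, 9.1],
]
y = ["dimer", "dimer", "dimer", "tetramer", "tetramer", "tetramer"]
ranked = [0, 1, 2]  # assumed ranking, e.g. produced by an mRMR-style method

best_k, accuracies = incremental_feature_selection(X, y, ranked)
print("accuracy per feature-set size:", accuracies)
print("optimal number of features:", best_k)
```

In this toy example the noisy third feature dominates the distance computation and lowers the Jackknife accuracy, so the procedure stops at a smaller feature set. The same logic, applied to a much larger ranked list, is how an optimal subset such as the 97 features reported above could be identified.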