Inhibition of the hepatitis C virus (HCV) non-structural protein 3 (NS3) serine protease by small-molecule inhibitors is an attractive strategy for the treatment of hepatitis C. We built four classification models from a dataset of 413 HCV NS3 protease inhibitors using the support vector machine (SVM) method. On the test set, the best-performing model achieved a prediction accuracy, sensitivity (SE), specificity (SP) and Matthews correlation coefficient (MCC) of 90.76%, 92.21%, 88.10% and 0.799, respectively. The number of rotatable bonds (NRotBond) and charge- and electronegativity-related properties were found to correlate with the bioactivity of the inhibitors. An ECFP_4 analysis of structural features showed that cyclopropyl with an acylsulfonamide group was the substructure unique to the active inhibitors. The approach presented in this study, in which the dataset is split by Kohonen's self-organizing map and descriptors are selected by SVMAttributeEval, can be employed in virtual screening to discover novel inhibitors of HCV NS3 protease.
Keywords: Classification models, HCV NS3 serine protease inhibitor, support vector machine.
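
For readers less familiar with the four metrics reported in the abstract, the following is a minimal sketch of how accuracy, SE, SP and MCC are computed from confusion-matrix counts. The function name `classification_metrics` is hypothetical, and the counts in the usage example (71 true positives, 6 false negatives, 37 true negatives, 5 false positives) are an assumption: they were inferred here as one set of integers consistent with the reported percentages and are not stated in the text.

```python
import math


def classification_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, specificity and MCC from counts.

    Assumes both classes are present (tp + fn > 0 and tn + fp > 0).
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn)   # SE: fraction of actives predicted active
    specificity = tn / (tn + fp)   # SP: fraction of inactives predicted inactive
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal sum is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Illustrative counts only (inferred, not taken from the source).
metrics = classification_metrics(tp=71, fn=6, tn=37, fp=5)
print(f"Accuracy: {metrics['accuracy'] * 100:.2f}%")
print(f"SE:       {metrics['sensitivity'] * 100:.2f}%")
print(f"SP:       {metrics['specificity'] * 100:.2f}%")
print(f"MCC:      {metrics['mcc']:.3f}")
```

With these assumed counts the sketch gives 90.76%, 92.21%, 88.10% and 0.799, matching the reported values. MCC is included alongside accuracy because it stays informative when the active and inactive classes differ in size.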