Chronic hepatitis C virus (HCV) infection is a significant health problem worldwide. The NS5B polymerase of HCV plays a central role in viral replication and is a prime target for the discovery of new treatment options. The urgent need for novel anti-HCV agents has provided an impetus for understanding the structure-activity relationships of NS5B polymerase inhibitors. Towards this objective, multiple linear regression (MLR) and support vector machine (SVM) methods were used to develop quantitative structure-activity relationship (QSAR) models for a dataset of 34 tetrahydrobenzothiophene derivatives. Statistical analysis showed that the models derived from both SVM (R2 = 0.9784, SE = 0.2982, R2 cv = 0.92) and MLR (R2 = 0.9684, SE = 0.1171, R2 cv = 0.955) have good internal predictivity. The models were also validated using an external test set and Y-scrambling; the results demonstrated that MLR has stronger predictive ability than SVM for the external dataset. The model was also found to yield reliable clues for further optimization of the tetrahydrobenzothiophene derivatives in the dataset.
Keywords: Hepatitis C virus, NS5B polymerase inhibitors, QSAR, support vector machine, tetrahydrobenzothiophene.
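
As an illustrative sketch of the internal validation statistics quoted in the abstract (R2, cross-validated R2, and Y-scrambling), the Python below fits an MLR model by ordinary least squares and scores it. It is not the authors' code. The function names are hypothetical, leave-one-out is assumed as the cross-validation scheme (the abstract does not state which scheme was used), and the SVM model is not covered. Descriptors are expected as a list of lists and activities as a plain list.

```python
import random


def fit_mlr(X, y):
    """Fit y = b0 + b1*x1 + ... by least squares (normal equations)."""
    rows = [[1.0] + list(r) for r in X]  # prepend intercept column
    p = len(rows[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def predict(coef, x):
    """Predicted activity for one compound's descriptor vector."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))


def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


def q2_loo(X, y):
    """Cross-validated R2, assuming leave-one-out: each compound is
    predicted by a model fitted on all the other compounds."""
    preds = []
    for i in range(len(y)):
        coef = fit_mlr(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(coef, X[i]))
    return r2(y, preds)


def y_scramble_r2(X, y, n_rounds=50, seed=0):
    """Y-scrambling: refit on randomly permuted activities and return
    the R2 of each scrambled model."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_rounds):
        y_perm = y[:]
        rng.shuffle(y_perm)
        coef = fit_mlr(X, y_perm)
        scores.append(r2(y_perm, [predict(coef, x) for x in X]))
    return scores
```

A model passes the Y-scrambling check when its R2 on the real activities is well above the R2 values returned by `y_scramble_r2`, since the scrambled models can only capture chance correlations between descriptors and activity.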