Survey of Machine Learning Techniques for Prediction of the Isoform Specificity of Cytochrome P450 Substrates

Page: [229 - 235] Pages: 7


Abstract

Background: Determining or predicting the Absorption, Distribution, Metabolism, and Excretion (ADME) properties and drug-induced toxicity of drug candidates plays a crucial role in drug discovery and development. Metabolism is one of the most complicated pharmacokinetic properties to understand and predict, and experimental determination of substrate binding, selectivity, and the sites and rates of metabolism is time- and resource-consuming. Cytochrome P450 enzymes play a key role in the phase I metabolism of foreign compounds, including most drugs. To help develop drugs with proper ADME properties, computational models that predict the ADME properties of drug candidates, particularly those that bind to cytochrome P450, are highly desirable.

Objective: This narrative review aims to briefly summarize machine learning techniques used in the prediction of the cytochrome P450 isoform specificity of drug candidates.

Results: Both single-label and multi-label classification methods have demonstrated good performance in modelling and predicting the isoform specificity of substrates from their quantitative descriptors.

Conclusion: This review provides a guide for researchers to develop machine learning-based methods to predict the cytochrome P450 isoform specificity of drug candidates.

Keywords: Cytochrome P450, drug metabolism, isoform specificity, machine learning, single-label classification, multi-label classification.

