Predicting Drug Side Effects with Compact Integration of Heterogeneous Networks

Page: [709 - 720] Pages: 12

Abstract

Background: Drug side effects are not only harmful to patients but also a major reason for withdrawing approved drugs, which increases the risk borne by pharmaceutical companies. Detecting the side effects of a given drug through traditional experiments is time-consuming and expensive. In recent years, several computational methods have been proposed to predict drug side effects, but most of them cannot effectively integrate the heterogeneous properties of drugs.

Methods: In this study, we adopted a network embedding method, Mashup, to extract essential and informative drug features from several heterogeneous drug networks, each representing a different property of drugs. A network was also built for side effects, from which side effect features were extracted. These features capture essential information about drugs and side effects at the network level. Drug and side effect features were combined to represent each drug-side effect pair, which was treated as a sample in this study. These samples were then fed into a random forest (RF) algorithm to construct the prediction model, called the RF network model.
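As a rough illustration of how drug and side effect features might be combined into samples, the sketch below concatenates the two vectors for every drug-side effect pair and labels a pair as positive when it is a known association. Concatenation is an assumption here, since the abstract only says the features were "combined"; the function name `build_pair_samples`, the drug and side effect names, and the toy vectors are all hypothetical stand-ins for real Mashup output.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Vector = List[float]


def build_pair_samples(
    drug_features: Dict[str, Vector],
    side_effect_features: Dict[str, Vector],
    known_pairs: Set[Tuple[str, str]],
) -> Tuple[List[Vector], List[int]]:
    """Build one sample per (drug, side effect) pair.

    Assumption: the two feature vectors are joined by simple concatenation.
    A pair is labeled 1 if it appears in known_pairs, otherwise 0.
    """
    X: List[Vector] = []
    y: List[int] = []
    # Sorting keeps the sample order deterministic.
    for drug, effect in product(sorted(drug_features), sorted(side_effect_features)):
        X.append(drug_features[drug] + side_effect_features[effect])
        y.append(1 if (drug, effect) in known_pairs else 0)
    return X, y


# Toy embeddings standing in for network-derived features (values are made up).
drug_features = {"drugA": [0.1, 0.9], "drugB": [0.7, 0.2]}
side_effect_features = {"nausea": [0.3, 0.3, 0.4], "rash": [0.8, 0.1, 0.1]}
known_pairs = {("drugA", "nausea"), ("drugB", "rash")}

X, y = build_pair_samples(drug_features, side_effect_features, known_pairs)
print(len(X), "samples with", len(X[0]), "features each")
```

The resulting `X` and `y` could then be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`; the abstract does not state which implementation the authors used.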

Results: The RF network model was evaluated with several tests. The average Matthews correlation coefficients on the balanced and unbalanced datasets were 0.640 and 0.641, respectively.
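For readers unfamiliar with the metric, the Matthews correlation coefficient (MCC) for binary labels is computed from the confusion matrix as (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The following is a minimal sketch of that formula; the function name and the toy labels are illustrative only and are not the study's data.

```python
import math
from typing import Sequence


def matthews_corrcoef(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Compute the Matthews correlation coefficient for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal count is zero.
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Toy labels (made up): 3 positives and 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
mcc = matthews_corrcoef(y_true, y_pred)
print(round(mcc, 3))
```

MCC ranges from -1 to 1 and accounts for all four confusion matrix cells, which is why it remains informative on unbalanced datasets where accuracy alone can mislead.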

Conclusion: The RF network model outperformed models built with other machine learning algorithms as well as one previously published model. We also investigated the influence of two feature dimension parameters on the RF network model and found that the model was not very sensitive to them.

Keywords: Drug discovery, drug side effect, network embedding method, Mashup, heterogeneous network, random forest.
