Accessing Public Compound Databases with KNIME

Pages: 6444-6457 (14 pages)

Abstract

Background: The KNIME platform offers several tools for the analysis of chem- and pharmacoinformatics data. Unless sufficient in-house data are available for the analysis of interest, third-party data must be fetched into KNIME. Many data sources offer valuable data, but including these data in a workflow is not always straightforward.

Objective: Here we discuss different ways of accessing public data sources. We give an overview of the KNIME nodes available for different sources, with references to example workflows. For data sources that have no dedicated KNIME node, we present a general approach to accessing a web interface from KNIME.

In addition, we discuss necessary steps before the data can be analysed, such as data curation, chemical standardisation and the merging of datasets.

Keywords: KNIME, database, data mining, web service, data curation, chemical standardization, REST, API.
