Abstract
Background: The KNIME platform offers several tools for the analysis of chem- and pharmacoinformatics
data. Unless sufficient in-house data is available for the analysis of interest, third-party data
must be fetched into KNIME. Many data sources offer valuable data, but including
this data in a workflow is not always straightforward.
Objective: Here we discuss different ways of accessing public data sources. We give an overview of
KNIME nodes for different sources, with references to available example workflows. For data sources
with no dedicated KNIME node available, we present a general approach to accessing a web interface
via KNIME. In addition, we discuss the steps necessary before the data can be analysed, such as data
curation, chemical standardisation and the merging of datasets.
Keywords:
KNIME, database, data mining, web service, data curation, chemical standardisation, REST, API.