Drug discovery is a challenging and expensive field. Hence, novel in silico tools have been developed in early discovery stage to identify and prioritize novel molecules with suitable physicochemical properties. In many in silico drug design projects, molecular databases are screened by virtual screening tools to search for potential bioactive molecules. The preparation of the molecules is therefore a key step in the success of well-established techniques such as docking, similarity or pharmacophore searching. We review here the lists of several toolkits used in different steps during the cleaning of molecular databases, integrated within a KNIME workflow. During the first step of the automatic workflow, salts are removed, and mixtures are split to get one compound per entry. Then compounds with unwanted features are filtered. Duplicated entries are then deleted while considering stereochemistry. As a compromise between exhaustiveness and computational time, most distributed tautomers at physiological pH are computed. Additionally, various flags are applied to molecules by using either classical molecular descriptors, similarity search to known libraries or substructure search rules. Moreover, stereoisomers are enumerated depending on the unassigned chiral centers. Then, three-dimensional coordinates, and optionally conformers, are generated. This workflow has been already applied to several drug design projects and can be used for molecular database preparation upon request.
Keywords: VSPrep, chemoinformatics, molecular databases, preparation, workflow, virtual screening, KNIME.