Each year, large pharmaceutical companies produce massive amounts of primary screening data for lead discovery. To make better use of the vast amount of information in pharmaceutical databases, companies have begun to scrutinize the lead generation stage to ensure that more, and better-qualified, lead series enter the downstream optimization and development stages. This article describes computational techniques for end-to-end analysis of large drug discovery screening sets. The analysis proceeds in three stages: in stage 1, the initial screening set is filtered to remove compounds unsuitable as leads; in stage 2, local structural neighborhoods around active compound classes are identified, including structurally similar but inactive compounds; in stage 3, the structure-activity relationships within these local neighborhoods are analyzed. These processes are illustrated by analyzing two large, publicly available databases.
Keywords: large screening sets, pharmaceutical databases
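
The three-stage workflow can be sketched in code. The fragment below is a minimal illustration, assuming Python with the open-source RDKit toolkit (not specified in the article); the property cutoffs, fingerprint settings, similarity threshold, and the tiny in-line compound library are hypothetical placeholders standing in for a real screening set and the article's actual filters.

from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem, Descriptors

def lead_like(mol):
    # Stage 1: crude lead-likeness filter (hypothetical cutoffs).
    return Descriptors.MolWt(mol) <= 450 and Descriptors.MolLogP(mol) <= 4.5

def neighborhood(active_smiles, library, threshold=0.5):
    # Stage 2: collect compounds within a Tanimoto similarity threshold of a
    # known active, keeping both active and inactive neighbors.
    ref = AllChem.GetMorganFingerprintAsBitVect(
        Chem.MolFromSmiles(active_smiles), 2, nBits=2048)
    hits = []
    for smiles, activity in library:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None or not lead_like(mol):
            continue
        fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
        sim = DataStructs.TanimotoSimilarity(ref, fp)
        if sim >= threshold:
            hits.append((smiles, activity, sim))
    return hits

# Stage 3: rank the neighborhood by measured activity to expose the local
# structure-activity relationship (toy SMILES and hypothetical pIC50 values).
library = [("c1ccccc1CCN", 7.2), ("c1ccccc1CCCN", 6.8), ("CCCCCCCCCCCC", 4.1)]
for smiles, act, sim in sorted(neighborhood("c1ccccc1CCN", library),
                               key=lambda hit: -hit[1]):
    print(f"{smiles}  pIC50={act}  similarity={sim:.2f}")

In a real analysis, the filtering rules, fingerprint type, and similarity cutoff would be tuned to the screening library at hand; the sketch only mirrors the order of the three stages described above.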