At the early stage of drug discovery, many thousands of chemical
compounds can be synthesized and tested (assayed) for potency (activity) with
high-throughput screening (HTS). With ever-increasing numbers of compounds to
be tested (now often in the neighborhood of 500,000), it remains a challenge to
find sequential design strategies that reduce costs while locating classes
of active compounds. An initial screen of a modest number of selected compounds
(the first stage) is used to construct a structure-activity relationship
(SAR). Based on this model, a second-stage sample is selected and the SAR is
updated; if no more sampling is done, the activities of the not-yet-tested
compounds are then predicted. Alternatively, instead of stopping, the updated
SAR can be used to select another stage of sampling, after which the SAR is
updated again and the process repeated.
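
The staged procedure described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the scheme evaluated in the study: the toy SAR (a smoothed hit rate per binary descriptor bit), the function names `fit_sar`, `predict` and `sequential_screen`, the synthetic library, and the rule that makes a compound "active" are all assumptions introduced here for illustration.

```python
import random


def fit_sar(tested, features):
    """Toy SAR (assumption): smoothed hit rate for each binary descriptor bit."""
    n_bits = len(next(iter(features.values())))
    hits = [1.0] * n_bits    # Laplace smoothing: one pseudo-hit per bit
    counts = [2.0] * n_bits  # and two pseudo-observations per bit
    for cid, active in tested.items():
        for j, bit in enumerate(features[cid]):
            if bit:
                counts[j] += 1
                hits[j] += active
    return [h / c for h, c in zip(hits, counts)]


def predict(model, fingerprint):
    """Score a compound as the mean hit rate of the bits it has set."""
    on = [model[j] for j, bit in enumerate(fingerprint) if bit]
    return sum(on) / len(on) if on else 0.0


def sequential_screen(features, assay, stage_sizes, rng):
    """Run a multi-stage screen; return assay results and final predictions."""
    untested = set(features)
    tested = {}

    # First stage: a random sample of the library.
    batch = rng.sample(sorted(untested), stage_sizes[0])
    for cid in batch:
        tested[cid] = assay(cid)
    untested -= set(batch)
    model = fit_sar(tested, features)

    # Later stages: assay the compounds the current SAR ranks highest,
    # then update the SAR with the new results.
    for size in stage_sizes[1:]:
        ranked = sorted(untested, key=lambda c: (-predict(model, features[c]), c))
        batch = ranked[:size]
        for cid in batch:
            tested[cid] = assay(cid)
        untested -= set(batch)
        model = fit_sar(tested, features)

    # After the last stage, predict activity for everything not yet tested.
    predictions = {c: predict(model, features[c]) for c in untested}
    return tested, predictions


# Synthetic library (assumption): 500 compounds with 8 random descriptor bits;
# a compound is "active" when bits 0 and 1 are both set.
rng = random.Random(0)
features = {i: tuple(rng.random() < 0.3 for _ in range(8)) for i in range(500)}
truth = {i: int(fp[0] and fp[1]) for i, fp in features.items()}

tested, predictions = sequential_screen(features, truth.__getitem__, [50, 25, 25], rng)
hit_rate = sum(tested.values()) / len(tested)
print(f"tested {len(tested)} compounds, hit rate {hit_rate:.2f}")
```

Comparing `hit_rate` with the hit rate of a purely random sample of the same total size gives a rough analogue of the comparison made in the study, although a real SAR model and real descriptors would replace the toy components used here.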
¶ We use existing data on the potency and chemical structure of 70,223
compounds to investigate various sequential testing schemes. Evidence on two
assays supports the conclusion that a rather small number of samples selected
according to the proposed scheme can more than triple the rate at which active
compounds are identified and also produce SARs effective for identifying
chemical structure. A different set of 52,883 compounds is used to confirm our
findings.
¶ One surprising conclusion of the study is that the design of the
initial sampling stage may be unimportant: random selection and systematic
methods based on chemical structure are equally effective.