Active Learning in the Drug Discovery Process

We investigate the following data mining problem from computational chemistry: from a large data set of compounds, find those that bind to a target molecule in as few iterations of biological testing as possible. In each iteration, a comparatively small batch of compounds is screened for binding to the target. We apply active learning techniques to select the successive batches. The first selection strategy picks the unlabeled examples closest to the maximum margin hyperplane. The second produces many weight vectors by running perceptrons over multiple permutations of the data; each weight vector votes with its ± prediction, and we pick the unlabeled examples for which the prediction is most evenly split between + and -. For the third selection strategy, note that each unlabeled example bisects the version space of consistent weight vectors. We estimate the volume on both sides of the split by bouncing a billiard through the version space and select the unlabeled examples that cause the most even split of the version space. On two data sets provided by DuPont Pharmaceuticals, we demonstrate that all three selection strategies perform comparably well and are much better than selecting random batches for testing.
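The three selection rules can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation, and several assumptions are made here: compounds are numeric feature vectors with labels +1 (binds) and -1 (does not bind), classifiers are homogeneous linear separators sign(w · x) with no bias term or kernel, the maximum-margin hyperplane `w` is assumed to come from some external SVM solver, and all function names are hypothetical. The billiard walk is replaced by plain rejection sampling of random directions, a swapped-in and much cruder version-space volume estimator that only works with few features and few labels.

```python
import math
import random

# Assumptions (not from the paper): each compound is a numeric feature vector,
# labels are +1 (binds) / -1 (does not bind), and classifiers are homogeneous
# linear separators predicting sign(w . x) with no bias term and no kernel.


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def select_closest_to_hyperplane(w, unlabeled, batch_size):
    """Strategy 1: rank unlabeled examples by distance |w . x| / ||w|| to a
    maximum-margin hyperplane w (assumed to be produced by an SVM solver)."""
    norm = math.sqrt(dot(w, w))
    ranked = sorted(range(len(unlabeled)),
                    key=lambda i: abs(dot(w, unlabeled[i])) / norm)
    return ranked[:batch_size]


def train_perceptron(examples, labels, order, epochs=50):
    """Plain perceptron (no bias) that visits the examples in the given order
    and stops early once a full pass makes no mistakes."""
    w = [0.0] * len(examples[0])
    for _ in range(epochs):
        mistakes = 0
        for i in order:
            if labels[i] * dot(w, examples[i]) <= 0:
                w = [wj + labels[i] * xj for wj, xj in zip(w, examples[i])]
                mistakes += 1
        if mistakes == 0:
            break
    return w


def select_by_committee(labeled, labels, unlabeled, batch_size,
                        n_perms=10, seed=0):
    """Strategy 2: train one perceptron per random permutation of the labeled
    data; every weight vector votes +/- and the most evenly split examples win."""
    rng = random.Random(seed)
    committee = []
    for _ in range(n_perms):
        order = list(range(len(labeled)))
        rng.shuffle(order)
        committee.append(train_perceptron(labeled, labels, order))

    def imbalance(x):
        plus = sum(1 for w in committee if dot(w, x) > 0)
        return abs(2 * plus - len(committee))  # 0 means a perfect tie

    ranked = sorted(range(len(unlabeled)), key=lambda i: imbalance(unlabeled[i]))
    return ranked[:batch_size]


def select_by_version_space_split(labeled, labels, unlabeled, batch_size,
                                  n_samples=20000, seed=0):
    """Strategy 3 with a swapped-in estimator: instead of a billiard, draw
    random directions and keep those consistent with every label (a sample of
    the version space). Each unlabeled x splits that sample by the sign of
    w . x; pick the examples whose split is closest to 50/50."""
    rng = random.Random(seed)
    dim = len(labeled[0])
    version_space = []
    for _ in range(n_samples):
        w = [rng.gauss(0.0, 1.0) for _ in range(dim)]  # uniformly random direction
        if all(y * dot(w, x) > 0 for x, y in zip(labeled, labels)):
            version_space.append(w)
    if not version_space:
        raise ValueError("no consistent weight vector sampled; raise n_samples")

    def split(x):
        plus = sum(1 for w in version_space if dot(w, x) > 0)
        return abs(plus / len(version_space) - 0.5)

    ranked = sorted(range(len(unlabeled)), key=lambda i: split(unlabeled[i]))
    return ranked[:batch_size]


if __name__ == "__main__":
    # Hypothetical toy data: two labeled compounds, three unlabeled candidates.
    labeled = [[1.0, 0.5], [-1.0, 0.3]]
    labels = [1, -1]
    unlabeled = [[0.01, 1.0], [3.0, 0.0], [-3.0, 0.0]]
    print(select_closest_to_hyperplane([1.0, 0.0], unlabeled, 1))
    print(select_by_committee(labeled, labels, unlabeled, 1))
    print(select_by_version_space_split(labeled, labels, unlabeled, 1))
```

On the toy data, all three rules pick the candidate `[0.01, 1.0]`, the one the two labeled compounds leave most ambiguous, while the clearly positive and clearly negative candidates are deferred. Rejection sampling accepts a vanishing fraction of directions as the number of labeled compounds and features grows, which is presumably why the abstract relies on a billiard walk through the version space instead.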
