Bayesian similarity searching in high-dimensional descriptor spaces combined with Kullback-Leibler descriptor divergence analysis.

Authors
  • Vogt, Martin
  • Bajorath, Jürgen
Type
Published Article
Journal
Journal of Chemical Information and Modeling
Publication Date
Feb 01, 2008
Volume
48
Issue
2
Pages
247–255
Identifiers
DOI: 10.1021/ci700333t
PMID: 18229907
Source
Medline
License
Unknown

Abstract

We investigate an approach that combines Bayesian modeling of the probability distributions of descriptor values of active and database molecules with Kullback-Leibler analysis of the divergence between these distributions. The methodology is used both for Bayesian screening and for predicting compound recall rates. In our study, we analyze two fundamental approximations underlying the Bayesian screening approach: the assumption that descriptors are independent of each other and, furthermore, that their data set values follow normal distributions. In addition, we calculate Kullback-Leibler divergence for single descriptors, rather than for multiple-feature distributions, in order to prioritize descriptors for screening calculations. The results show that descriptor correlation effects, which violate the assumption of feature independence, can lead to a notable reduction in compound recall in Bayesian screening. Controlling descriptor correlation effects plays a much more significant role in achieving high recall rates than approximating descriptor distributions by Gaussians does. Furthermore, Kullback-Leibler divergence analysis is shown to systematically identify the descriptors that are most relevant to the outcome of Bayesian screening calculations.
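The two ingredients the abstract names, a Bayesian score built on independent, normally distributed descriptors and a per-descriptor Kullback-Leibler divergence used to rank descriptors, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`fit_gaussians`, `gaussian_kl`, `bayes_score`), the toy descriptor values, and the choice of KL direction (active distribution relative to database distribution) are all hypothetical assumptions made for illustration. The sketch uses the closed-form KL divergence between two univariate normal distributions and a naive-Bayes log-likelihood ratio that sums per-descriptor terms, which is exactly the independence approximation the abstract discusses.

```python
import math
from statistics import mean, pvariance


def fit_gaussians(rows):
    """Fit one normal distribution (mean, variance) per descriptor column.

    Assumes every column has non-zero variance; a constant descriptor
    would cause a division by zero below.
    """
    return [(mean(col), pvariance(col)) for col in zip(*rows)]


def gaussian_kl(mu_p, var_p, mu_q, var_q):
    """Closed-form KL(P || Q) for univariate normals P = N(mu_p, var_p), Q = N(mu_q, var_q)."""
    return 0.5 * (math.log(var_q / var_p) + (var_p + (mu_p - mu_q) ** 2) / var_q - 1.0)


def log_normal_pdf(x, mu, var):
    """Log density of N(mu, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)


def bayes_score(x, active_params, db_params):
    """Log-odds score log p(x | active) - log p(x | database).

    Summing per-descriptor terms encodes the independence assumption;
    correlated descriptors are effectively counted more than once.
    """
    return sum(
        log_normal_pdf(xi, ma, va) - log_normal_pdf(xi, md, vd)
        for xi, (ma, va), (md, vd) in zip(x, active_params, db_params)
    )


# Hypothetical toy data: two descriptors per molecule.
# Descriptor 0 separates actives from database molecules; descriptor 1 barely does.
actives = [[5.0, 1.0], [5.5, 2.0], [6.0, 1.5], [5.2, 1.2]]
database = [[1.0, 1.1], [2.0, 1.9], [1.5, 1.4], [0.5, 1.3], [1.8, 1.6]]

active_params = fit_gaussians(actives)
db_params = fit_gaussians(database)

# Per-descriptor KL divergence (active relative to database, an assumed direction).
kl = [gaussian_kl(ma, va, md, vd) for (ma, va), (md, vd) in zip(active_params, db_params)]

# Prioritize descriptors by decreasing divergence.
ranking = sorted(range(len(kl)), key=lambda i: kl[i], reverse=True)

print("KL per descriptor:", [round(d, 3) for d in kl])
print("Descriptor ranking:", ranking)
print("Score, active-like compound:", round(bayes_score([5.3, 1.4], active_params, db_params), 2))
print("Score, database-like compound:", round(bayes_score([1.2, 1.5], active_params, db_params), 2))
```

On this toy data the divergence for descriptor 0 is far larger than for descriptor 1, so descriptor 0 is ranked first, and a compound whose values resemble the actives receives a higher log-odds score than one resembling the database. Replacing the Gaussian fits with histogram or kernel density estimates would relax the normality assumption, but, per the abstract, correlation between descriptors matters more for recall than the Gaussian approximation does.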