What is it about?

We use open source fingerprints and a Bayesian algorithm to build thousands of computational models from ChEMBL, a very large public bioactivity database. We demonstrate cross-validation of these models, make them openly accessible, and show how they can be imported into a mobile app and used for predictions.
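
The modelling and validation step can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the Laplacian-corrected naive Bayesian formulation commonly paired with circular fingerprints, represents each molecule as a set of integer feature IDs, and scores cross-validation by the area under the ROC curve. All function names are hypothetical.

```python
import math
from collections import defaultdict

def train_laplacian_bayes(fingerprints, actives):
    """Return a dict mapping each fingerprint bit to its log contribution.

    fingerprints: list of sets of integer feature IDs (one set per molecule)
    actives: list of bools, True where the molecule is classed as active
    """
    n_total = len(fingerprints)
    p_base = sum(actives) / n_total  # baseline fraction of actives
    count_all = defaultdict(int)  # molecules containing each bit
    count_act = defaultdict(int)  # active molecules containing each bit
    for fp, is_active in zip(fingerprints, actives):
        for bit in fp:
            count_all[bit] += 1
            if is_active:
                count_act[bit] += 1
    # Laplacian-corrected contribution: log((A + 1) / (T * p_base + 1)),
    # which pulls rarely seen bits toward zero (little evidence either way)
    return {
        bit: math.log((count_act[bit] + 1) / (t * p_base + 1))
        for bit, t in count_all.items()
    }

def score(model, fp):
    """Sum the contributions of the bits present; unseen bits contribute 0."""
    return sum(model.get(bit, 0.0) for bit in fp)

def roc_auc(scores, labels):
    """ROC integral as the probability that an active outscores an inactive."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def cross_validate(fingerprints, actives, k=5):
    """k-fold cross-validation: each molecule is scored by a model that never saw it."""
    scores = [0.0] * len(fingerprints)
    for fold in range(k):
        train_idx = [i for i in range(len(fingerprints)) if i % k != fold]
        model = train_laplacian_bayes([fingerprints[i] for i in train_idx],
                                      [actives[i] for i in train_idx])
        for i in range(fold, len(fingerprints), k):  # the held-out fold
            scores[i] = score(model, fingerprints[i])
    return roc_auc(scores, actives)

if __name__ == "__main__":
    # Toy data: bit 1 marks actives, bit 2 marks inactives
    fps = [{1, 3}, {1, 4}, {1, 5}, {2, 3}, {2, 4}, {2, 5}] * 2
    acts = [True, True, True, False, False, False] * 2
    print(cross_validate(fps, acts, k=3))  # 1.0
```

On this cleanly separable toy set every held-out active outscores every held-out inactive, so the cross-validated ROC integral is 1.0; real ChEMBL-derived datasets would of course score lower.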


Why is it important?

We are not aware of anyone else using ChEMBL in this way with open source technologies and making thousands of models openly accessible. In addition, we describe a novel algorithm for detecting the threshold between active and inactive compounds in continuous activity data. Finally, we assess the effect of folding on the fingerprints.
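
Folding maps the very large space of hashed fingerprint feature IDs onto a fixed number of bins, and the cost is that distinct features can collide. A common way to fold, assumed here (the paper's exact scheme may differ), is to take each ID modulo the folded size. The sketch below shows how that produces collisions; the function names are hypothetical.

```python
def fold_fingerprint(bits, size):
    """Fold hashed feature IDs into `size` bins by taking each ID modulo `size`.

    Distinct features that land in the same bin become indistinguishable,
    which is the information loss that folding trades for a compact vector.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return {b % size for b in bits}

def collisions(bits, size):
    """Number of distinct features lost to bin collisions after folding."""
    return len(set(bits)) - len(fold_fingerprint(bits, size))

if __name__ == "__main__":
    features = {5, 77, 1029}  # 1029 % 1024 == 5, so it collides with feature 5
    print(sorted(fold_fingerprint(features, 1024)))  # [5, 77]
    print(collisions(features, 1024))  # 1
```

Smaller folded sizes give more collisions, so measuring model performance across several sizes is one way to quantify how much folding costs.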

Perspectives

The paper follows up on the previous description of open source Bayesian models, adding more detail about validation and calibration techniques. It describes a method for partitioning the ChEMBL database of bioactivity data into more than 2000 datasets, and an algorithm for automatically detecting a threshold for classifying compounds as active or inactive, which is required because Bayesian algorithms of this kind work on binary categories rather than continuous measurements. Each dataset was used for model building in order to evaluate the technique. The results are made available, along with a description of the method.
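
To illustrate what automatic thresholding of continuous activity data involves, the sketch below uses Otsu's method, a standard technique that picks the cut maximising the between-class variance. It is a plainly substituted stand-in, not the algorithm described in the paper. It assumes higher values mean more potent (for example pIC50), and the function names are hypothetical.

```python
def otsu_threshold(values):
    """Pick the cut that maximises between-class variance (Otsu's method).

    values: continuous activity measurements, e.g. pIC50 (higher = more potent).
    Returns the midpoint between the two sorted values on either side of the best cut.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    best_score, best_cut = -1.0, None
    left_sum = 0.0
    for k in range(1, n):
        left_sum += xs[k - 1]  # sum of the k smallest values
        if xs[k - 1] == xs[k]:
            continue  # cannot split between identical values
        w0, w1 = k / n, (n - k) / n  # class weights
        mu0, mu1 = left_sum / k, (total - left_sum) / (n - k)  # class means
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_score:
            best_score, best_cut = between, (xs[k - 1] + xs[k]) / 2
    if best_cut is None:
        raise ValueError("need at least two distinct values")
    return best_cut

def classify(values, threshold):
    """Label each value active (True) when it is at or above the threshold."""
    return [v >= threshold for v in values]

if __name__ == "__main__":
    pic50 = [4.0, 4.2, 4.1, 7.9, 8.0, 8.1]
    t = otsu_threshold(pic50)  # about 6.05, between the two clusters
    print(t, classify(pic50, t))
```

A variance-based cut like this only looks at the distribution of activity values; a method tuned for model building could instead judge candidate thresholds by how well the resulting models validate.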

Alex Michael Clark
Molecular Materials Informatics

Read the Original

This page is a summary of: Open Source Bayesian Models. 2. Mining a “Big Dataset” To Create and Validate Models with ChEMBL, Journal of Chemical Information and Modeling, June 2015, American Chemical Society (ACS), DOI: 10.1021/acs.jcim.5b00144.
