Chapter 16. Small-molecule Bioactivity Databases

  • Sean Ekins, Alex M. Clark, Christopher Southan, Barry A. Bunin, Antony J. Williams
  • Royal Society of Chemistry
  • DOI: 10.1039/9781782626770-00344


What is it about?

Over the past decade, the number of public chemistry and bioactivity databases has grown considerably. Some of these contain tens of millions of molecules and millions of bioactivity data points. These sources are immensely valuable for data mining and machine learning modeling. This chapter covers databases such as BindingDB, PubChem, ChEMBL, GtoPdb and CDD Vault. We discuss issues of data quality and how the data may be used.

Why is it important?

This chapter is important because these massive datasets are used to build models that can support decision making. However, these models are only sporadically tested or evaluated against external datasets, so their utility is unclear. We also propose areas that could be improved, such as data curation and the correction of errors.


Dr Sean Ekins
Collaborations in Chemistry

Each author brought their own perspective as a database developer, curator, software developer and so on. We also provide an update of the Bayesian models created with ChEMBL data.


The following have contributed to this page: Dr Sean Ekins and Dr Antony John Williams