Small molecule bioactivity databases
What is it about?
Over the past decade the number of public chemistry and bioactivity databases has grown substantially. Some of these contain tens of millions of molecules and millions of bioactivity data points. These sources are immensely valuable for data mining and machine learning modeling. This chapter covers databases such as BindingDB, PubChem, ChEMBL, GtoPdb and CDD Vault. We discuss issues of data quality and how the data may be used.
Why is it important?
This chapter is important because these massive datasets are used to build models that can inform decision making. However, these models are only sporadically tested or evaluated with external datasets, so their utility is unclear. We also propose areas that could be improved, such as data curation and the correction of errors.
The following have contributed to this page: Dr Sean Ekins and Dr Antony John Williams