Are Bigger Data Sets Better for Machine Learning? Fusing Single-Point and Dual-Event Dose Response Data for Mycobacterium tuberculosis

Sean Ekins; Joel S. Freundlich; Robert C. Reynolds

doi:10.1021/ci500264r

What is it about?

Having previously focused on dose-response data for modeling, we have now added the large amounts of inactive single-point screening data. The largest models now have over 300,000 molecules in the training set. We show that, for TB, adding these data yields little improvement, and we speculate that the smaller models may be adequate.

Why is it important?

Bigger models are not necessarily better at predicting external compounds. We evaluate this hypothesis using the TB datasets we have collected. The resulting models are a powerful resource for virtual screening.

This page is a summary of: Are Bigger Data Sets Better for Machine Learning? Fusing Single-Point and Dual-Event Dose Response Data for Mycobacterium tuberculosis, Journal of Chemical Information and Modeling, July 2014, American Chemical Society (ACS), DOI: 10.1021/ci500264r.

Contributors

The following have contributed to this page

Dr Sean Ekins
Collaborations in Chemistry
