Bigger datasets for TB machine learning
What is it about?
After previously focusing on dose-response data for modeling, we have now added the large amount of inactive single-point screening data. The biggest models now have over 300,000 molecules in the training set. We show that, for TB, adding these data gives little improvement, and we speculate that the smaller models may be adequate.
Why is it important?
Bigger models may not always be better at predicting the activity of external compounds. We evaluate this hypothesis using TB datasets we have collected. These models are a powerful resource for virtual screening.
The following have contributed to this page: Dr Sean Ekins