Fusing Dual-Event Data Sets for Mycobacterium tuberculosis Machine Learning Models and Their Evaluation

  • Sean Ekins, Joel S. Freundlich, Robert C. Reynolds
  • Journal of Chemical Information and Modeling, November 2013, American Chemical Society (ACS)
  • DOI: 10.1021/ci400480s

Combining datasets for TB machine learning models

What is it about?

We compared support vector machine (SVM), recursive partitioning, and Bayesian models for four different tuberculosis (TB) datasets using 5-fold cross-validation. We used one of the datasets as a test set for the Bayesian models built on each of the other three datasets. We also combined those three datasets to build models and then tested them with the remaining dataset.
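The evaluation scheme described above (fuse three datasets, run 5-fold cross-validation on the fused set, then predict a held-out fourth set) can be sketched roughly as follows. This is only an illustration under stated assumptions: the data are synthetic, all function and variable names are hypothetical, and the plain Bernoulli naive Bayes classifier is a simple stand-in, not the Bayesian method, descriptors, or software used in the paper.

```python
import math
import random


def fuse(*datasets):
    """Combine datasets keyed by compound ID; the first occurrence wins on duplicates."""
    fused = {}
    for ds in datasets:
        for cid, record in ds.items():
            fused.setdefault(cid, record)
    return fused


def k_fold_indices(n, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def train_nb(records, n_bits):
    """Fit a Bernoulli naive Bayes model with add-one smoothing (stand-in classifier)."""
    counts = {0: [0] * n_bits, 1: [0] * n_bits}
    totals = {0: 0, 1: 0}
    for bits, label in records:
        totals[label] += 1
        for b in bits:
            counts[label][b] += 1
    n = totals[0] + totals[1]
    model = {}
    for c in (0, 1):
        prior = math.log((totals[c] + 1) / (n + 2))
        probs = [(counts[c][b] + 1) / (totals[c] + 2) for b in range(n_bits)]
        model[c] = (prior, probs)
    return model


def predict_nb(model, bits, n_bits):
    """Return the class (0 = inactive, 1 = active) with the higher log score."""
    scores = {}
    for c, (prior, probs) in model.items():
        s = prior
        for b in range(n_bits):
            s += math.log(probs[b] if b in bits else 1 - probs[b])
        scores[c] = s
    return max(scores, key=scores.get)


def make_toy_set(prefix, n, n_bits, rng):
    """Synthetic data (assumption): actives favour low bits, inactives high bits."""
    data = {}
    for i in range(n):
        label = rng.randint(0, 1)
        bits = frozenset(
            b for b in range(n_bits)
            if rng.random() < (0.7 if (b < n_bits // 2) == (label == 1) else 0.2)
        )
        data[f"{prefix}-{i}"] = (bits, label)
    return data


rng = random.Random(42)
N_BITS = 16
training_sets = [make_toy_set(name, 60, N_BITS, rng) for name in ("A", "B", "C")]
external = make_toy_set("X", 40, N_BITS, rng)  # held-out fourth dataset

fused = fuse(*training_sets)
records = list(fused.values())

# 5-fold cross-validation on the fused training data
cv_acc = []
for train_idx, test_idx in k_fold_indices(len(records), k=5):
    model = train_nb([records[i] for i in train_idx], N_BITS)
    hits = sum(predict_nb(model, records[i][0], N_BITS) == records[i][1]
               for i in test_idx)
    cv_acc.append(hits / len(test_idx))

# Train on all fused data, then score the external test set
final_model = train_nb(records, N_BITS)
ext_acc = sum(predict_nb(final_model, bits, N_BITS) == label
              for bits, label in external.values()) / len(external)

print("per-fold accuracy:", [round(a, 2) for a in cv_acc])
print("external test accuracy:", round(ext_acc, 2))
```

Repeating the same loop with each single dataset in place of the fused one would give the single-dataset comparison; swapping in another classifier only requires replacing the two `*_nb` functions.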

Why is it important?

The single-dataset Bayesian models differed in how well they predicted the test set, and the machine learning methods differed under 5-fold cross-validation. SVM seemed to perform best with the three combined datasets when predicting the test set. The combined Bayesian model did best at predicting actives in the GSK 177-compound set. Data fusion may be useful for TB drug discovery because it covers more chemical property space.

Read Publication

http://dx.doi.org/10.1021/ci400480s

The following have contributed to this page: Dr Sean Ekins