Combining datasets for TB machine learning models
What is it about?
We compared support vector machine (SVM), recursive partitioning and Bayesian models for 4 different TB datasets using 5-fold cross-validation. We used one of the datasets as a test set for Bayesian models built from each of the 3 other datasets. We also combined those 3 datasets to build models and then tested these models on the remaining dataset.
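The evaluation scheme described above can be sketched as follows. This is a minimal illustration in Python, not the authors' code. The helper names (`k_fold_indices`, `combine_and_hold_out`) and the dataset names are hypothetical. The model-fitting step (SVM, recursive partitioning or Bayesian) is deliberately left out, so the sketch only shows how the data would be split for 5-fold cross-validation and for the "combine three, test on the fourth" comparison.

```python
import random
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 5, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1, deal them into k folds, and return (train, test) pairs.

    Each fold is used once as the test set while the other k-1 folds form the
    training set (hypothetical helper, standing in for a library splitter).
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for fold in folds[:i] + folds[i + 1:] for j in fold)
        splits.append((train, test))
    return splits


def combine_and_hold_out(datasets: Dict[str, Sequence], held_out: str) -> Tuple[list, list]:
    """Pool every dataset except `held_out` for training; use `held_out` as the test set."""
    train = [rec for name, recs in datasets.items() if name != held_out for rec in recs]
    test = list(datasets[held_out])
    return train, test


if __name__ == "__main__":
    # Hypothetical stand-ins for four TB datasets (compound IDs only, no real data).
    datasets = {
        "set_A": ["a1", "a2", "a3"],
        "set_B": ["b1", "b2"],
        "set_C": ["c1", "c2", "c3", "c4"],
        "set_D": ["d1", "d2"],
    }
    train, test = combine_and_hold_out(datasets, held_out="set_D")
    print(len(train), len(test))
    for fold_train, fold_test in k_fold_indices(len(train), k=5):
        print(fold_train, fold_test)
```

In a real workflow, a model would be trained on each `train` split and scored on the matching `test` split; pooling the training datasets is what lets the combined model see a wider range of chemical structures than any single dataset.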
Why is it important?
The single-dataset Bayesian models differed in how well they predicted the test set, and the machine learning methods also differed in performance under 5-fold cross-validation. SVM appeared to perform best with the 3 combined datasets when predicting the test set, while the Bayesian model built from the combined datasets performed best at predicting actives in the GSK 177-compound set. Data fusion may be useful in TB drug discovery for covering more chemical property space.
The following have contributed to this page: Dr Sean Ekins