Do automated curation and data standardization contribute to improved QSAR models?
What is it about?
We manually curated the chemical structure data in a publicly available physicochemical property dataset. Building on that experience, we developed an automated procedure in KNIME to process several other datasets. We then built QSAR prediction models and examined how data curation influenced their statistical performance.
Why is it important?
Data quality matters. For QSAR prediction models, this paper shows how data curation affects the statistical performance of the resulting models, and why the upfront investment in checking and validating the data is worthwhile. This work focused only on the chemical structures, NOT the property values themselves, and even so, curation made a measurable difference to the performance of the algorithms.
The following have contributed to this page: Dr Antony John Williams