In Silico Prediction of Physicochemical Properties of Environmental Chemicals Using Molecular Fingerprints and Machine Learning

Qingda Zang, Kamel Mansouri, Antony John Williams, Richard S. Judson, David G. Allen, Warren M. Casey, Nicole C. Kleinstreuer
  • Journal of Chemical Information and Modeling, December 2016, American Chemical Society (ACS)
  • DOI: 10.1021/acs.jcim.6b00625

Machine Learning Approaches to Predict PhysChem Properties of Environmental Chemicals

What is it about?

Physicochemical properties are needed to model environmental fate and transport, as well as exposure potential. The purpose of this study was to build an open-source Quantitative Structure-Property Relationship (QSPR) workflow that predicts a variety of physicochemical properties and is cross-platform compatible, so that it can be integrated into existing cheminformatics workflows. Decades-old experimental property data sets available within the EPA were reanalyzed with modern cheminformatics methods to build updated QSPR models that supply computationally efficient, open, and transparent property predictions for high-throughput screening (HTS) in support of environmental modeling. Models were built for six physicochemical properties: octanol-water partition coefficient (log P), water solubility (log S), boiling point (BP), melting point (MP), vapor pressure (log VP), and bioconcentration factor (log BCF). The newly derived models can be used for rapid estimation of physicochemical properties within an open-source HTS workflow to inform fate and toxicity prediction models for environmental chemicals.
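To make the general idea of a fingerprint-based QSPR model concrete, here is a minimal sketch in Python. It is not the workflow from the paper: the fragment keys, the substring-based fingerprint, the nearest-neighbour predictor, and the training values are all hypothetical placeholders chosen so that the example runs with the standard library alone.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical fragment keys, one fingerprint bit per key. Real fingerprints
# are computed by a cheminformatics toolkit, not by substring matching.
FRAGMENT_KEYS: Tuple[str, ...] = ("c1ccccc1", "O", "N", "Cl", "=O", "CC")


def fingerprint(smiles: str) -> FrozenSet[int]:
    """Return the indices of the fragment keys found in a SMILES string."""
    return frozenset(i for i, key in enumerate(FRAGMENT_KEYS) if key in smiles)


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto similarity: shared 'on' bits divided by all 'on' bits."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def predict_knn(query: str, training: List[Tuple[str, float]], k: int = 3) -> float:
    """Similarity-weighted mean of the k most similar training chemicals."""
    if not training:
        raise ValueError("training set is empty")
    q = fingerprint(query)
    scored = sorted(
        ((tanimoto(q, fingerprint(smi)), value) for smi, value in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in scored)
    if total == 0.0:
        # No structural overlap at all: fall back to an unweighted mean.
        return sum(value for _, value in scored) / len(scored)
    return sum(sim * value for sim, value in scored) / total


# Placeholder (SMILES, property value) pairs. The numbers are made up for
# illustration and are NOT experimental data from the paper.
TRAINING: List[Tuple[str, float]] = [
    ("c1ccccc1", 2.0),
    ("Cc1ccccc1", 2.5),
    ("CCO", -0.3),
    ("CCCCO", 0.9),
    ("CC(=O)C", -0.2),
]

if __name__ == "__main__":
    print(predict_knn("CCCO", TRAINING, k=2))
```

In a real workflow, the toy fingerprint would be replaced by fingerprints from a cheminformatics toolkit, the predictor by a properly validated machine learning model, and the placeholder values by curated experimental data.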

Why is it important?

This paper demonstrates how modern machine learning methods can be used to model a series of physicochemical (PhysChem) properties.


Dr Antony John Williams (Author)
United States Environmental Protection Agency

In an earlier publication we curated a number of physicochemical property datasets and made them available as Open Data. This allowed other scientists to apply their own modeling approaches to the data. This publication presents an alternative modeling approach to the one taken within our own team, and it shows the benefit of making data openly available in formats that are easy to consume.


The following have contributed to this page: Dr Antony John Williams, Dr Nicole Kleinstreuer, and Dr Kamel Mansouri