The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from PATENTS

Igor V. Tetko; Daniel M. Lowe; Antony J. Williams

doi:10.1186/s13321-016-0113-y

What is it about?

Text mining was used to automatically extract melting point data from published patents. Almost 300,000 data points were collected and used to develop models that predict melting and pyrolysis (decomposition) points. The models are available for everyone to use!

Why is it important?

This paper shows that it is now possible to text-mine property data directly from a large corpus and that, following automated curation and validation, the data can be used as the basis for building models. This work focused on melting point data, but the approach could be extended to other properties such as logP and NMR data.

Perspectives

The manual extraction of data from the literature, or in this case patents, is very time-consuming. The possibility of using text mining to extract data has interested me personally for years. This collaboration, with Daniel Lowe from NextMove applying their software for the extraction and Igor Tetko performing the modeling, proves the point, I believe. Melting point (MP) is only one property, but this approach could now be extended to other properties.
Dr Antony John Williams
United States Environmental Protection Agency

This page is a summary of: The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from PATENTS, Journal of Cheminformatics, January 2016, Springer Science + Business Media, DOI: 10.1186/s13321-016-0113-y.

Resources

Data
Datasets available on Figshare
The data associated with this publication are available as Open Data on Figshare.

Contributors

The following have contributed to this page

Dr Antony John Williams
United States Environmental Protection Agency

