What is it about?

Text mining was used to automatically extract melting point data from published patents. Almost 300,000 data points were collected and used to develop models that predict melting and pyrolysis (decomposition) points. The models are available for everyone to use!


Why is it important?

This paper shows that it is now possible to text-mine property data directly from a large corpus and that, following automated curation and validation, the data can be used as the basis for building models. This work focused on melting point data but could be extended to other properties such as logP and NMR data.

Perspectives

Manually extracting data from the literature, or in this case from patents, is very time-consuming. Using text mining to extract data has interested me personally for years. This collaboration, with Daniel Lowe from NextMove applying their software for extraction and Igor Tetko performing the modeling, proves the point, I believe. Melting point is only one property, but this approach could now be extended to others.

Dr Antony John Williams
United States Environmental Protection Agency

Read the Original

This page is a summary of: The development of models to predict melting and pyrolysis point data associated with several hundred thousand compounds mined from PATENTS, Journal of Cheminformatics, January 2016, Springer Science + Business Media,
DOI: 10.1186/s13321-016-0113-y.
