What is it about?

Companies are increasingly expected to, and in some cases legally required to, report on their environmental, social, and governance (ESG) performance. The voluntary production of corporate social responsibility (CSR) reports is now commonplace among large companies. WikiRate is a platform for collecting and analyzing information about companies’ ESG impacts in a transparent manner, with the aim of making that information accessible to all and using it to push companies toward improved ESG performance. Many publicly accessible, semistructured sources of company-relevant data already exist, but they are scattered across the Web sites of different organizations. Without technical support, collecting these data and integrating them into the WikiRate platform would be a challenging and time-consuming task. In this paper, we introduce easIE, an easy-to-use information extraction framework for extracting company-related data from external sources. Users with little or no programming skills can contribute more actively to data gathering by extracting data from both static and dynamic HTML pages simply by defining a configuration file.

Why is it important?

A vast amount of company-relevant data is scattered across the Web sites of different organizations. Collecting such data can be a challenging and time-consuming task, as it requires building custom information extraction (IE) logic for each Web source. We address the problem of gathering corporate ESG performance data from diverse Web sources and integrating it into an open CSR database. To this end, we propose the easIE framework, which enables users with limited programming skills to extract information of interest from selected Web sources by creating configuration files built from simple sets of extraction rules.
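The idea of replacing per-source extraction code with a configuration file can be sketched as follows. This is a minimal illustration only, not easIE's implementation: the configuration keys (`source_name`, `row_selector`, `fields`), the `extract` function, and the sample page are all hypothetical, and the sketch uses Python's standard library with a well-formed page fragment rather than a real Web source.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical configuration. These keys are illustrative and are NOT
# easIE's actual configuration schema.
CONFIG_JSON = """
{
  "source_name": "example-ranking",
  "row_selector": ".//table[@id='ranking']/tbody/tr",
  "fields": {
    "company": "td[1]",
    "score": "td[2]"
  }
}
"""

# Made-up, well-formed fragment standing in for a static HTML page.
PAGE = """
<html><body>
<table id="ranking">
  <tbody>
    <tr><td>Acme Corp</td><td>72</td></tr>
    <tr><td>Globex</td><td>58</td></tr>
  </tbody>
</table>
</body></html>
"""


def extract(page, config):
    """Apply the extraction rules in `config` to `page`; return one dict per row."""
    root = ET.fromstring(page.strip())
    records = []
    # The row selector locates each repeated item (here, one table row per company).
    for row in root.findall(config["row_selector"]):
        record = {"source": config["source_name"]}
        # Each field rule is a path evaluated relative to the current row.
        for name, path in config["fields"].items():
            cell = row.find(path)
            has_text = cell is not None and cell.text
            record[name] = cell.text.strip() if has_text else None
        records.append(record)
    return records


records = extract(PAGE, json.loads(CONFIG_JSON))
for record in records:
    print(record)
```

Under this design, supporting a new source means writing a new configuration rather than new code; the extraction function itself stays unchanged.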

Perspectives

I hope this article gives readers a better understanding of how the easIE framework works and enables users with limited programming skills to create their own large CSR databases. To the best of our knowledge, easIE is the first framework that addresses the problem of extracting CSR data from Web pages in a well-structured format.

Mrs Vasiliki Gkatziaki
Centre for Research and Technology-Hellas

Read the Original

This page is a summary of: easIE, ACM Transactions on Internet Technology, November 2018, ACM (Association for Computing Machinery), DOI: 10.1145/3155807.
