What is it about?

Systematic reviews, once associated exclusively with healthcare and the medical sciences, have become common practice in computer science. The process involves several steps for collecting and analyzing relevant papers to answer a set of well-formulated research questions. The search starts by exploring different sources and digital libraries, which often yields a huge number of documents. After deduplication, the metadata of every retrieved document is checked for relevance before the document is approved for inclusion in the review. This screening task is known to be long and tiresome. In this paper, we propose a semi-automatic system that reduces the effort required for screening papers. The proposed system combines unsupervised and semi-supervised machine learning models and makes use of the domain ontology. Several features are extracted from the metadata and used for classification. With semi-supervised learning, researchers are asked to manually label only a subset of the retrieved papers. Those papers are used to train a semi-supervised model, which then automatically classifies the remaining papers. The system was evaluated on seven datasets built from previously conducted systematic reviews in computer science. We found that the system can save 50% of the screening effort while reaching up to 89% macro F1-score and up to 97% accuracy.
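
To make the labelling idea concrete, here is a minimal sketch of self-training, one common form of semi-supervised learning. It is an illustration under stated assumptions, not the system described in the paper: this summary does not specify the paper's features, its use of the ontology, or its models, so the bag-of-words vectors, the nearest-centroid classifier, the `threshold` value, and every function name below are hypothetical stand-ins.

```python
import math
import re
from collections import Counter


def vectorize(text):
    """Bag-of-words counts over lowercase alphabetic tokens (assumed feature set)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroids(labeled):
    """Sum the vectors of each class; an empty class yields an empty vector."""
    result = {"include": Counter(), "exclude": Counter()}
    for text, label in labeled:
        result[label].update(vectorize(text))
    return result


def score(text, cents):
    """Return (predicted label, margin between the two class similarities)."""
    vec = vectorize(text)
    s_inc = cosine(vec, cents["include"])
    s_exc = cosine(vec, cents["exclude"])
    label = "include" if s_inc > s_exc else "exclude"
    return label, abs(s_inc - s_exc)


def self_train(labeled, unlabeled, threshold=0.2, max_rounds=10):
    """Hypothetical self-training loop.

    labeled:   list of (text, label) pairs screened by hand
    unlabeled: list of texts still to be screened
    Returns a dict mapping each unlabeled text to a predicted label.
    """
    train = list(labeled)
    pending = list(unlabeled)
    predictions = {}
    for _ in range(max_rounds):
        if not pending:
            break
        cents = centroids(train)
        confident, rest = [], []
        for text in pending:
            label, margin = score(text, cents)
            (confident if margin >= threshold else rest).append((text, label))
        if not confident:
            break
        # Confident predictions become pseudo-labels for the next round.
        train.extend(confident)
        predictions.update(confident)
        pending = [text for text, _ in rest]
    # Anything still pending gets the best guess from the final centroids.
    cents = centroids(train)
    for text in pending:
        predictions[text] = score(text, cents)[0]
    return predictions
```

In this sketch a reviewer would pass the hand-screened titles or abstracts as `labeled` and the rest as `unlabeled`. Lowering `threshold` accepts pseudo-labels more eagerly, which trades screening effort against the risk of reinforcing early mistakes.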

Read the Original

This page is a summary of: A Semi-automatic Document Screening System for Computer Science Systematic Reviews, January 2022, Springer Science + Business Media,
DOI: 10.1007/978-3-031-04112-9_15.
