What is it about?

In automated claim checking, we want to determine whether a statement is factually true or false. To achieve this, we build natural language processing systems that automatically check whether a statement is supported or refuted by trustworthy facts in a textual knowledge base (for example, Wikipedia). Most previous work optimizes the system, taking as given both a dataset of statements and a textual knowledge base of facts. We take the opposite approach: taking the system as given, we explore the choice of the knowledge base.

Why is it important?

Automated claim checking is one approach to tackling mis- and disinformation in the digital age, and it requires reliable and robust systems. However, most systems work well only in the domain they were trained on: for example, a system trained on Wikipedia usually performs quite poorly when checking scientific claims. As a remedy, many systems have been proposed for many different domains. In this work, we investigate whether we can instead take a system as given and build knowledge bases that are sufficient to automatically check claims from other domains. First, we find that this is the case: claim checking systems can be transferred to new domains if we have access to a knowledge base from that new domain. Second, we do not find a universally best knowledge base, and combining multiple knowledge bases does not tend to improve performance beyond using the closest-domain knowledge base.

Perspectives

Automated claim checking is usually approached as follows: we have a dataset and a knowledge base from a given domain, and we build a system optimized on this dataset and knowledge base pair. If the system fails on a different domain, researchers usually build a new dataset and knowledge base from that domain and train a new system. However, this scales poorly. In this work, we take a different approach and investigate whether we can build knowledge bases that allow us to transfer existing systems to new domains. We think of this as data-centric claim checking, in which we consider all the data dependencies (datasets AND knowledge bases). In future work, we plan to explore data-centric claim checking in more depth: how far can we go by optimizing knowledge bases and the retrieval of relevant facts from them, instead of optimizing claim checking systems?
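
To make the idea of keeping the system fixed and swapping the knowledge base concrete, here is a minimal Python sketch. It is not the system studied in the paper: the names retrieve, toy_verifier and check_claim are hypothetical, the retriever is a simple word-overlap ranking, the verifier is a stand-in for a trained model, and the two tiny knowledge bases are made-up toy data.

```python
import re
from typing import Callable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(claim: str, knowledge_base: List[str], k: int = 1) -> List[str]:
    """Toy retriever (assumption): rank sentences by word overlap with the claim."""
    claim_tokens = tokenize(claim)
    ranked = sorted(
        knowledge_base,
        key=lambda sentence: len(claim_tokens & tokenize(sentence)),
        reverse=True,
    )
    return ranked[:k]


def toy_verifier(claim: str, evidence: List[str]) -> str:
    """Stand-in for a trained claim checking model (assumption, not a real model).

    Returns "SUPPORTED" if some evidence sentence covers at least half of the
    claim's tokens, otherwise "NOT ENOUGH INFO".
    """
    claim_tokens = tokenize(claim)
    for sentence in evidence:
        coverage = len(claim_tokens & tokenize(sentence)) / max(len(claim_tokens), 1)
        if coverage >= 0.5:
            return "SUPPORTED"
    return "NOT ENOUGH INFO"


def check_claim(
    claim: str,
    knowledge_base: List[str],
    verifier: Callable[[str, List[str]], str],
) -> str:
    """The system (retriever + verifier) stays fixed; only the knowledge base varies."""
    return verifier(claim, retrieve(claim, knowledge_base))


# Two toy knowledge bases from different domains (illustrative data only).
general_kb = [
    "Zurich is the largest city in Switzerland.",
    "The Rhine flows through Basel.",
]
science_kb = [
    "Aspirin reduces the risk of heart attack in some patients.",
    "Vitamin C does not prevent the common cold.",
]

claim = "Aspirin reduces the risk of heart attack."
for name, kb in [("general", general_kb), ("science", science_kb)]:
    print(name, "->", check_claim(claim, kb, toy_verifier))
```

In this sketch the same check_claim call gives a different verdict depending only on which knowledge base is passed in, which is the dependency the data-centric view puts at the center.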

Dominik Stammbach
ETH Zurich

Read the Original

This page is a summary of: The Choice of Textual Knowledge Base in Automated Claim Checking, Journal of Data and Information Quality, January 2023, ACM (Association for Computing Machinery),
DOI: 10.1145/3561389.
