What is it about?
In this paper, we propose LaKo, a knowledge-driven VQA method via Late Knowledge-to-text Injection. To effectively incorporate an external knowledge graph (KG), we convert its triples into text and propose a late injection mechanism that fuses this knowledge with the question. Finally, we address VQA as a text generation task with an effective encoder-decoder paradigm.
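To make the idea concrete, below is a minimal, illustrative Python sketch (not the paper's code) of the knowledge-to-text step: a few hypothetical triples are serialized into plain sentences, the most question-relevant ones are selected with a toy lexical-overlap score, and the result is packed into a single text input that an encoder-decoder model would then use to generate the answer. The triples, the scoring function, and the prompt layout are all illustrative assumptions.

```python
from collections import Counter

# Hypothetical common-sense triples (subject, relation, object); placeholders only.
TRIPLES = [
    ("banana", "has_color", "yellow"),
    ("fire hydrant", "used_for", "fighting fires"),
    ("umbrella", "used_for", "protection from rain"),
]

def triple_to_text(subj, rel, obj):
    """Serialize a KG triple into a natural-language sentence."""
    return f"{subj} {rel.replace('_', ' ')} {obj}."

def score(question, sentence):
    """Toy lexical-overlap retrieval score (the paper uses a stronger retriever)."""
    q_tokens = Counter(question.lower().split())
    s_tokens = Counter(sentence.lower().split())
    return sum((q_tokens & s_tokens).values())

def build_model_input(question, caption, top_k=2):
    """Retrieve top-k knowledge sentences and inject them, as text, into the input."""
    sentences = [triple_to_text(*t) for t in TRIPLES]
    ranked = sorted(sentences, key=lambda s: score(question, s), reverse=True)
    knowledge = " ".join(ranked[:top_k])
    # This string would be fed to an encoder-decoder (e.g., T5-style) model
    # that generates the answer as free-form text.
    return f"question: {question} context: {caption} knowledge: {knowledge}"

if __name__ == "__main__":
    print(build_model_input(
        question="What is this yellow fruit used for?",
        caption="A bowl with a banana and apples on a table.",
    ))
```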
Why is it important?
• We propose a new KG retrieval paradigm for VQA together with a late knowledge injection strategy, which works without relying on annotated ground-truth knowledge.
• We improve and build a large-scale common-sense KG targeted at knowledge-based VQA, showing that a high-quality KG benefits VQA performance.
• Our method achieves state-of-the-art results on the OKVQA dataset and verifies that using a high-quality KG as the external knowledge source is better than relying on unstructured text or pure language model parameters.
Our code is available at https://github.com/hackerchenzhuo/LaKo.
Perspectives
This work is closely related to the recently popular Chain-of-Thought concept, and it has great potential.
Zhuo Chen
Zhejiang University
Read the Original
This page is a summary of: LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection, October 2022, ACM (Association for Computing Machinery),
DOI: 10.1145/3579051.3579053.