What is it about?
Image captioning means generating descriptive sentences from an image. We illustrate representative methods and discuss their advantages and limitations. The ultimate goal of this work is to serve as a tool for understanding the existing literature and for highlighting future directions in image captioning that the Computer Vision and Natural Language Processing communities may benefit from.
Why is it important?
To give a testament to the journey that captioning has taken so far and to encourage novel ideas, this paper provides a holistic overview of the models developed in recent years. Another contribution of this study is a quantitative comparison of the main image captioning methods on standard metrics, together with a discussion of the strengths and weaknesses of the various techniques, which clarifies the performance, differences, and characteristics of the most critical models. Finally, we outline recent research trends in image captioning and discuss some open challenges and future directions.
Read the Original
This page is a summary of: A thorough review of models, evaluation metrics, and datasets on image captioning, IET Image Processing, November 2021, the Institution of Engineering and Technology (the IET), DOI: 10.1049/ipr2.12367.