What is it about?

Image captioning means generating descriptive sentences from an image. We illustrate representative methods and discuss their advantages and limitations. The ultimate goal of this work is to serve as a tool for understanding the existing literature and highlighting future directions in image captioning, from which the Computer Vision and Natural Language Processing communities may benefit.

Why is it important?

To give a testament to the journey that captioning has taken so far and to encourage novel ideas, this paper provides a holistic overview of the models developed in recent years. Another contribution of this study is a quantitative comparison of the main image captioning methods on standard metrics, together with a discussion of the strengths and weaknesses of the various techniques, which clarifies the performance, differences, and characteristics of the most important models. Finally, we outline recent research trends in image captioning and discuss some open challenges and future directions.


I hope this article makes what people might think is a boring, slightly abstract area, such as artificial intelligence and machine learning, interesting and maybe even exciting, because artificial intelligence is silently changing our way of life and making it better and better. More than anything else, I hope you find this article thought-provoking.

Gaifang Luo
Shanxi Agricultural University

Read the Original

This page is a summary of: A thorough review of models, evaluation metrics, and datasets on image captioning, IET Image Processing, November 2021, the Institution of Engineering and Technology (the IET), DOI: 10.1049/ipr2.12367.