Remote sensing visual question answering with a self-attention multi-modal encoder

João Daniel Silva; João Magalhães; Devis Tuia; Bruno Martins

doi:10.1145/3557918.3565874

What is it about?

Given an aerial image and a natural-language question about it, the model produces an answer grounded in the image contents. We show that the Transformer architecture, which achieves strong results in general-domain Visual Question Answering, also works well in the Remote Sensing domain. Image and question features are concatenated and processed jointly by the Transformer attention layers.
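To make the concatenation step concrete, here is a minimal sketch in plain Python of one single-head self-attention pass over a joint sequence of image and question features. It is an illustration only, not the model from the paper: the token counts, the feature dimension, the random projection matrices, and every function name are assumptions, and a real encoder would add learned embeddings, multiple heads, several layers, and an answer classifier.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(value - peak) for value in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q = matmul(tokens, w_q)
    k = matmul(tokens, w_k)
    v = matmul(tokens, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of the keys
    scores = matmul(q, k_t)
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for learned features or weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim = 4  # assumed feature size, chosen small for readability

image_tokens = random_matrix(3, dim, rng)     # assumed: 3 image region features
question_tokens = random_matrix(2, dim, rng)  # assumed: 2 question word features

# Concatenate along the sequence axis so both modalities share one sequence.
sequence = image_tokens + question_tokens

w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))
output, weights = self_attention(sequence, w_q, w_k, w_v)

print(len(output), "tokens, each of size", len(output[0]))
```

Because both modalities sit in one sequence, each row of `weights` spans image and question positions alike, so a question token can attend directly to image regions and vice versa.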

Why is it important?

We show that Transformer-based systems outperform current baselines, which combine Convolutional Neural Networks and Recurrent Neural Networks, on the task of Remote Sensing Visual Question Answering.

Perspectives

Remote Sensing Visual Question Answering is an interesting way for users to interact with Earth Observation data: users can ask for specific information about an image and obtain it. I hope this article encourages other researchers to develop systems for this task.
João Daniel Silva
Instituto Superior Técnico

This page is a summary of: Remote sensing visual question answering with a self-attention multi-modal encoder, November 2022, ACM (Association for Computing Machinery),
DOI: 10.1145/3557918.3565874.

Resources

Image
Graphical Representation of the Model
Multimodal Transformer for Remote Sensing Visual Question Answering
