What is it about?

In the context of long document classification (LDC), effectively exploiting the multi-modal information in such documents, i.e., both their texts and their images, has not received adequate attention. To address this gap, we propose a novel cross-modal method for long document classification, in which multiple-granularity feature shifting networks adaptively integrate the multi-scale text and visual features of long documents.
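
As a rough, hypothetical illustration of the idea (not the authors' exact formulation), a feature shifting step at one granularity level could look like the sketch below; the module names, tensor shapes, and the gating scheme are assumptions:

```python
import torch
import torch.nn as nn

class GranularityFeatureShift(nn.Module):
    """Hypothetical sketch: adaptively shift text features toward the
    visual modality at one granularity level (e.g. sentence or section)."""

    def __init__(self, dim: int):
        super().__init__()
        # gate decides, per feature, how much visual information to mix in
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.visual_proj = nn.Linear(dim, dim)

    def forward(self, text_feats: torch.Tensor, vis_feats: torch.Tensor) -> torch.Tensor:
        # text_feats: (batch, n_units, dim), e.g. sentence-level text features
        # vis_feats:  (batch, n_units, dim), visual features aligned to the same units
        vis = self.visual_proj(vis_feats)
        g = self.gate(torch.cat([text_feats, vis], dim=-1))
        # adaptive fusion: gated residual shift of text features by visual features
        return text_feats + g * vis
```

In the full model, analogous shifting operations would be applied at several granularity levels (for example token, sentence, and document), with the visual features pooled or broadcast to match each level; the paper gives the precise formulation.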

Why is it important?

A novel Cross-Modal Multiple Granularity Interactive Fusion Network (CM-MGIFN) is proposed for LDC, combining text and image features at different levels of granularity. To the best of our knowledge, this is the first work to integrate text and images at different granularity levels for LDC. A Multi-Modal Collaborative Pooling (MMCP) block is proposed to eliminate redundant textual information, thereby reducing computational complexity. Extensive experiments on the public Food101 dataset and two newly created multi-modal long document datasets show that our method outperforms both single-modal text methods and state-of-the-art multi-modal baselines.
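
The exact MMCP design is detailed in the full paper; as a minimal sketch under assumed names and shapes, an image-guided pooling step that keeps only the most relevant text tokens, and thereby shortens the sequence that later cross-modal layers must process, might look like this:

```python
import torch
import torch.nn as nn

class CollaborativePooling(nn.Module):
    """Hypothetical sketch: score text tokens against a global image
    representation and keep only the top-k, dropping redundant text."""

    def __init__(self, dim: int, keep: int):
        super().__init__()
        self.keep = keep
        self.score = nn.Linear(dim, 1)

    def forward(self, text_tokens: torch.Tensor, image_global: torch.Tensor) -> torch.Tensor:
        # text_tokens:  (batch, seq_len, dim)
        # image_global: (batch, dim), e.g. a pooled visual embedding
        relevance = self.score(text_tokens * image_global.unsqueeze(1)).squeeze(-1)  # (batch, seq_len)
        topk = relevance.topk(self.keep, dim=1).indices                              # (batch, keep)
        # restore original token order, then gather the selected tokens
        idx = topk.sort(dim=1).values.unsqueeze(-1).expand(-1, -1, text_tokens.size(-1))
        return text_tokens.gather(1, idx)  # (batch, keep, dim)
```

Keeping only `keep` tokens before the subsequent cross-modal interaction is what lowers the cost of later attention layers; the actual block may use a softer pooling scheme than this hard top-k selection.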

Perspectives

The model currently uses visual features in a relatively simple manner. In future work, we will exploit the relations among images and finer-grained image features to further improve the performance of our method for LDC.

Tengfei Liu
Beijing University of Technology

Read the Original

This page is a summary of: Cross-Modal Multiple Granularity Interactive Fusion Network for Long Document Classification, ACM Transactions on Knowledge Discovery from Data, November 2023, ACM (Association for Computing Machinery).
DOI: 10.1145/3631711.
You can read the full text via the DOI above.
