What is it about?

Knowledge distillation (KD) is a powerful and widely applicable technique for compressing deep learning models. Its main idea is to transfer knowledge from a large teacher model to a small student model, and attention-based transfer has been intensively explored because of its flexibility in handling different teacher-student architectures. However, existing attention-based methods usually transfer similar attention knowledge from the intermediate layers of deep neural networks, leaving the hierarchical structure of deep representation learning poorly investigated for knowledge distillation. In this paper, we propose a hierarchical multi-attention transfer framework (HMAT), in which different types of attention are used to transfer knowledge at different levels of deep representation learning. Specifically, position-based and channel-based attention characterize the knowledge in low-level and high-level feature representations respectively, while activation-based attention characterizes the knowledge in both mid-level and high-level feature representations.
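
To make the level-to-attention mapping concrete, below is a minimal PyTorch sketch of attention-map matching losses of the kind HMAT combines. It is an illustration under our own assumptions, not the authors' implementation: the function names (position_attention, channel_attention, activation_attention, hierarchical_attention_loss), the exact attention formulations, the equal loss weighting, and the assumption that teacher and student features share shapes at each level are all placeholders.

# Illustrative sketch only (not the HMAT code): attention-map matching
# losses between teacher and student features at different levels.
import torch
import torch.nn.functional as F


def activation_attention(feat):
    # Activation-based attention: aggregate over channels to get a spatial
    # map, then flatten and L2-normalize it per sample.
    att = feat.pow(2).mean(dim=1)               # (B, H, W)
    return F.normalize(att.flatten(1), dim=1)   # (B, H*W)


def position_attention(feat):
    # Position-based attention: pairwise affinities between spatial
    # locations, row-normalized with softmax.
    x = feat.flatten(2)                          # (B, C, HW)
    sim = torch.bmm(x.transpose(1, 2), x)        # (B, HW, HW)
    return F.softmax(sim, dim=-1)


def channel_attention(feat):
    # Channel-based attention: pairwise affinities between channels.
    # Comparing teacher and student maps assumes matched channel counts
    # (or a projection layer, omitted here).
    x = feat.flatten(2)                          # (B, C, HW)
    sim = torch.bmm(x, x.transpose(1, 2))        # (B, C, C)
    return F.softmax(sim, dim=-1)


def hierarchical_attention_loss(s_feats, t_feats):
    # Match attention level by level: low-level -> position attention,
    # mid-level -> activation attention, high-level -> activation and
    # channel attention. Equal weights are placeholders.
    low_s, mid_s, high_s = s_feats
    low_t, mid_t, high_t = t_feats
    loss = F.mse_loss(position_attention(low_s), position_attention(low_t))
    loss = loss + F.mse_loss(activation_attention(mid_s), activation_attention(mid_t))
    loss = loss + F.mse_loss(activation_attention(high_s), activation_attention(high_t))
    loss = loss + F.mse_loss(channel_attention(high_s), channel_attention(high_t))
    return loss


if __name__ == "__main__":
    # Toy features from three stages; teacher and student share shapes here
    # so the attention maps are directly comparable.
    s = [torch.randn(2, 16, 32, 32), torch.randn(2, 32, 16, 16), torch.randn(2, 64, 8, 8)]
    t = [torch.randn(2, 16, 32, 32), torch.randn(2, 32, 16, 16), torch.randn(2, 64, 8, 8)]
    print(hierarchical_attention_loss(s, t).item())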


Why is it important?

Existing attention-based distillation methods transfer similar attention knowledge across intermediate layers, leaving the hierarchical structure of deep representation learning largely unexploited. By matching different types of attention to different levels of representation, HMAT transfers richer, level-appropriate knowledge from teacher to student, which matters for obtaining compact yet accurate models.

Perspectives

Multi-Attention-based Knowledge Distillation

Jianping Gou
Southwest University

Read the Original

This page is a summary of: Hierarchical Multi-Attention Transfer for Knowledge Distillation, ACM Transactions on Multimedia Computing, Communications, and Applications, October 2022, ACM (Association for Computing Machinery),
DOI: 10.1145/3568679.
You can read the full text via the DOI above.
