What is it about?

We propose an effective unsupervised cross-modal hashing retrieval method, called Vision-Language Knowledge Distillation for Unsupervised Cross-Modal Hashing Retrieval (VLKD). VLKD uses a vision-language pre-training (VLP) model to encode features from multi-modal data and then constructs a similarity matrix that provides soft similarity supervision for the student model, distilling the VLP model's multi-modal knowledge into the student. In addition, we design an end-to-end unsupervised hash learning model that incorporates a graph convolutional auxiliary network: guided by the similarity matrix distilled from the teacher, the auxiliary network aggregates information from similar data nodes to generate more consistent hash codes. Finally, the teacher network requires no additional training; it only guides the student network to learn high-quality hash representations, which makes VLKD efficient in both training and retrieval. A rough sketch of this pipeline is given below.
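To make the idea concrete, here is a minimal PyTorch-style sketch of the main components, assuming frozen VLP features as teacher input. All class and function names (teacher_similarity, StudentHashNet, GCNAux, distillation_loss) are illustrative assumptions, not the authors' actual implementation.

```python
# Minimal sketch of VLKD-style distillation; names and details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

def teacher_similarity(img_feat, txt_feat, alpha=0.5):
    """Soft similarity matrix distilled from frozen VLP (teacher) features."""
    img = F.normalize(img_feat, dim=1)
    txt = F.normalize(txt_feat, dim=1)
    s_img = img @ img.t()                       # image-image cosine similarity
    s_txt = txt @ txt.t()                       # text-text cosine similarity
    return alpha * s_img + (1 - alpha) * s_txt  # fused soft supervision

class StudentHashNet(nn.Module):
    """Maps one modality's features to K-bit relaxed hash codes."""
    def __init__(self, in_dim, hash_bits):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.ReLU(),
            nn.Linear(1024, hash_bits),
        )

    def forward(self, x):
        return torch.tanh(self.mlp(x))          # relaxed binary codes in (-1, 1)

class GCNAux(nn.Module):
    """Graph convolutional auxiliary branch: aggregates codes of similar samples."""
    def __init__(self, hash_bits):
        super().__init__()
        self.weight = nn.Linear(hash_bits, hash_bits, bias=False)

    def forward(self, codes, sim):
        adj = F.softmax(sim, dim=1)             # row-normalised distilled similarity
        return torch.tanh(self.weight(adj @ codes))

def distillation_loss(img_codes, txt_codes, sim):
    """Align the student's code similarities with the teacher's soft matrix."""
    k = img_codes.size(1)
    cross = img_codes @ txt_codes.t() / k       # scaled inner products in [-1, 1]
    return F.mse_loss(cross, sim)

# Toy usage with random features standing in for frozen VLP outputs.
img_feat, txt_feat = torch.randn(8, 512), torch.randn(8, 512)
sim = teacher_similarity(img_feat, txt_feat)
img_net, txt_net, gcn = StudentHashNet(512, 64), StudentHashNet(512, 64), GCNAux(64)
img_codes = gcn(img_net(img_feat), sim)
txt_codes = gcn(txt_net(txt_feat), sim)
loss = distillation_loss(img_codes, txt_codes, sim)
```

In this sketch, only the student hash networks and the auxiliary graph branch are trained; the teacher similarity matrix is computed once from the VLP features and reused as soft supervision.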


Why is it important?

Extensive experiments on three multimedia retrieval benchmark datasets show that VLKD achieves better retrieval performance than existing unsupervised cross-modal hashing methods, demonstrating its effectiveness.

Perspectives

I hope this article makes a small contribution to the field of cross-modal retrieval.

Lina Sun

Read the Original

This page is a summary of: Learning From Expert: Vision-Language Knowledge Distillation for Unsupervised Cross-Modal Hashing Retrieval, June 2023, ACM (Association for Computing Machinery),
DOI: 10.1145/3591106.3592242.
