What is it about?

CenterCLIP performs multi-segment token clustering on the video tokens inside CLIP's vision transformer. With this approach it achieves state-of-the-art text-video retrieval performance on MSVD, MSRVTT, LSMDC, and ActivityNet while also reducing computation cost.
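As a rough illustration of the idea, the sketch below splits a video's frames into segments, clusters the tokens within each segment, and keeps only the tokens closest to each cluster center. It is a minimal sketch under stated assumptions, not the paper's implementation: the summary does not say which clustering algorithm is used, so plain k-means with a deterministic initialization is swapped in here, and the function names, segment count, and toy data are all hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def kmeans_center_indices(tokens, k, iters=10):
    """Run plain k-means on `tokens` and return the sorted indices of the
    real tokens that lie closest to the final centroids."""
    n = len(tokens)
    k = min(k, n)
    # Assumption: deterministic init using evenly spaced tokens as centroids.
    centroids = [list(tokens[(i * n) // k]) for i in range(k)]
    for _ in range(iters):
        # Assignment step: attach every token to its nearest centroid.
        groups = [[] for _ in range(k)]
        for tok in tokens:
            nearest = min(range(k), key=lambda c: dist(tok, centroids[c]))
            groups[nearest].append(tok)
        # Update step: move each centroid to the mean of its members.
        # An empty cluster keeps its previous centroid.
        for c, members in enumerate(groups):
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    # Keep actual tokens rather than synthetic means.
    picked = []
    for centroid in centroids:
        idx = min(range(n), key=lambda i: dist(tokens[i], centroid))
        if idx not in picked:
            picked.append(idx)
    return sorted(picked)


def multi_segment_token_clustering(video, num_segments, centers_per_segment):
    """`video` is a list of frames; each frame is a list of token vectors.
    Returns one reduced token list per non-empty segment."""
    num_frames = len(video)
    reduced = []
    for s in range(num_segments):
        start = (s * num_frames) // num_segments
        end = ((s + 1) * num_frames) // num_segments
        # Pool the tokens of all frames that fall into this segment.
        pooled = [tok for frame in video[start:end] for tok in frame]
        if not pooled:
            continue
        keep = kmeans_center_indices(pooled, centers_per_segment)
        reduced.append([pooled[i] for i in keep])
    return reduced


# Toy input: 4 frames, 2 tokens per frame, 2-dimensional token vectors.
video = [
    [[0.0, 0.0], [10.0, 10.0]],
    [[0.1, 0.0], [10.0, 10.1]],
    [[5.0, 5.0], [-5.0, -5.0]],
    [[5.1, 5.0], [-5.0, -5.1]],
]
reduced = multi_segment_token_clustering(video, num_segments=2, centers_per_segment=2)
kept = sum(len(segment) for segment in reduced)
total = sum(len(frame) for frame in video)
print(f"kept {kept} of {total} tokens")
```

In this toy run each segment holds two near-duplicate pairs of tokens, so keeping one representative per cluster halves the token count. That is the intuition behind the computation savings: later transformer layers process fewer tokens.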

Read the Original

This page is a summary of: CenterCLIP, July 2022, ACM (Association for Computing Machinery),
DOI: 10.1145/3477495.3531950.
