What is it about?
CenterCLIP achieves state-of-the-art text-video retrieval performance on MSVD, MSRVTT, LSMDC, and ActivityNet while also reducing computation cost. It does this by performing multi-segment token clustering on the video tokens inside CLIP's vision transformer.
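To make "multi-segment token clustering" concrete, here is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that the frames of a video are split into contiguous temporal segments, that the tokens of all frames in a segment are clustered jointly with a simple alternating k-medoids using squared Euclidean distance, and that only the medoid ("center") tokens are kept. The function names `k_medoids` and `multi_segment_cluster`, the distance, the iteration count, and the toy 2-D embeddings are all hypothetical. This page does not specify the exact clustering algorithm, segment count, or the layer of the vision transformer where it is applied.

```python
import random
from typing import List

Token = List[float]  # one token embedding (hypothetical toy representation)


def sq_dist(a: Token, b: Token) -> float:
    """Squared Euclidean distance between two token embeddings (assumed metric)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_medoids(tokens: List[Token], k: int, iters: int = 10, seed: int = 0) -> List[int]:
    """Return sorted indices of k medoid ("center") tokens via alternating k-medoids."""
    rng = random.Random(seed)
    medoids = rng.sample(range(len(tokens)), k)
    for _ in range(iters):
        medoid_set = set(medoids)
        clusters = {m: [] for m in medoids}
        # Assignment step: each medoid owns itself; every other token joins its nearest medoid.
        for i, t in enumerate(tokens):
            nearest = i if i in medoid_set else min(medoids, key=lambda m: sq_dist(t, tokens[m]))
            clusters[nearest].append(i)
        # Update step: a cluster's new medoid is the member with the smallest
        # total distance to all members of that cluster.
        new_medoids = [
            min(members, key=lambda c: sum(sq_dist(tokens[c], tokens[j]) for j in members))
            for members in clusters.values()
        ]
        if set(new_medoids) == medoid_set:
            break  # converged: medoids did not change
        medoids = new_medoids
    return sorted(medoids)


def multi_segment_cluster(frames: List[List[Token]], num_segments: int, k: int) -> List[List[Token]]:
    """Split frames into contiguous temporal segments and keep k center tokens per segment."""
    T = len(frames)
    assert 1 <= num_segments <= T, "need at least one frame per segment"
    kept = []
    for s in range(num_segments):
        start, end = s * T // num_segments, (s + 1) * T // num_segments
        # Pool the tokens of every frame in this segment, then cluster them jointly.
        seg_tokens = [tok for frame in frames[start:end] for tok in frame]
        idx = k_medoids(seg_tokens, min(k, len(seg_tokens)))
        kept.append([seg_tokens[i] for i in idx])
    return kept


if __name__ == "__main__":
    rng = random.Random(42)
    # 12 frames x 4 tokens each, with toy 2-D embeddings
    frames = [[[rng.random(), rng.random()] for _ in range(4)] for _ in range(12)]
    kept = multi_segment_cluster(frames, num_segments=4, k=2)
    print(sum(len(seg) for seg in kept), "tokens kept out of", 12 * 4)  # 8 tokens kept out of 48
```

In this toy run, 48 frame tokens shrink to 8 center tokens, which illustrates where a computation saving could come from: later transformer layers would process far fewer video tokens. Keeping a medoid (an actual input token) rather than a cluster mean is a design choice of this sketch; it means every retained token is one the model originally produced.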
Read the Original
This page is a summary of: CenterCLIP, July 2022, ACM (Association for Computing Machinery), DOI: 10.1145/3477495.3531950.