What is it about?

In CapST we go beyond real/fake detection to pinpoint which AI model created a deepfake, a capability crucial for forensic tracing and defense. The CapST framework combines Capsule Networks, temporal attention, and a streamlined VGG19 feature extractor to attribute deepfakes accurately while remaining computationally efficient.
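To give a rough sense of the temporal-attention idea mentioned above, the sketch below pools per-frame features into a single video-level descriptor by scoring each frame and softmax-weighting over time. This is a minimal illustration, not the authors' implementation; the feature dimensions, the scoring vector `w`, and all function names here are hypothetical stand-ins.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def temporal_attention_pool(frame_feats, w):
    """Score each frame, normalize scores over time, return the
    attention-weighted average feature vector for the whole clip."""
    scores = frame_feats @ w        # one scalar score per frame, shape (T,)
    alpha = softmax(scores)         # attention weights over the T frames
    return alpha @ frame_feats      # weighted sum, shape (D,)

rng = np.random.default_rng(0)
T, D = 8, 16                        # 8 frames, 16-dim features (illustrative)
feats = rng.normal(size=(T, D))     # stand-in for per-frame CNN features
w = rng.normal(size=D)              # scoring vector (learned in practice)
video_vec = temporal_attention_pool(feats, w)
print(video_vec.shape)              # (16,)
```

In a full model, `w` would be a learned parameter and the per-frame features would come from the CNN backbone; frames the attention deems informative then dominate the clip-level representation used for attribution.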


Why is it important?

The CapST model addresses a critical gap in deepfake video forensics by focusing not just on detecting whether a video is fake, but also on identifying the specific generative model used to produce it. This matters for forensic investigations: knowing the source model helps trace a video's origin and understand the techniques used, enabling better legal accountability and defensive strategies. Additionally, CapST improves attribution accuracy while maintaining computational efficiency, which makes it practical for real-world deployment.

Perspectives

From a forward-looking perspective, CapST sets a foundation for more advanced, scalable, and generalizable deepfake attribution methods. Its modular and lightweight architecture opens the door for deployment on edge devices and integration into broader digital media authentication systems. Furthermore, as deepfake generation technologies evolve, future research can build on CapST to handle even more subtle and complex forgeries, extend beyond face-swapping, and support multimodal detection (e.g., combining visual and audio cues).

Dr Wasim Ahmad
Academia Sinica

Read the Original

This page is a summary of: CapST: Leveraging Capsule Networks and Temporal Attention for Accurate Model Attribution in Deep-fake Videos, ACM Transactions on Multimedia Computing, Communications, and Applications, April 2025, ACM (Association for Computing Machinery), DOI: 10.1145/3715138.
