What is it about?
This work explains how we developed an AI system that investigates the authenticity of online photos and videos. Many misleading posts today use real footage in the wrong context, or AI-altered images that look convincing at first glance. Human fact-checkers do careful work, but they cannot keep up with the volume or speed of social media. Our approach uses several AI agents that work together like a small investigative team. When someone submits a piece of visual content, the system begins by asking basic questions about it: where it was recorded, when the event occurred, and what claim is being made about it. The agents then plan a step-by-step inquiry: some perform reverse image or video search, others read reliable websites, and others compare the content with trusted reports or past events. Each agent contributes evidence, and the system compiles that evidence into a clear explanation of whether the content is authentic, miscaptioned, out of context, or likely manipulated. We demonstrate the system on challenging real-world examples, including crisis and conflict footage, where timing, location, and source must be confirmed with care.
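To make that workflow concrete, the sketch below shows, in Python, one way such an agent pipeline could be orchestrated. The agent classes, the Evidence record, and the verdict mapping are simplified assumptions for illustration; they are not the system's actual implementation.

```python
# Minimal sketch of a multi-agent verification pipeline (illustrative only).
# Class names, fields, and the verdict mapping are assumptions, not the paper's code.
from dataclasses import dataclass, field


@dataclass
class Evidence:
    source: str           # where the evidence came from (e.g. a news archive)
    finding: str          # what the agent concluded from that source
    supports_claim: bool  # whether it supports the claim attached to the post


@dataclass
class VerificationCase:
    media_url: str                                # the submitted photo or video
    claim: str                                    # the claim being made about it
    evidence: list = field(default_factory=list)  # pooled findings from all agents


class ReverseSearchAgent:
    """Looks for earlier appearances of the same image or video frames."""
    def run(self, case: VerificationCase) -> list[Evidence]:
        # A real agent would call a reverse image/video search service here.
        return [Evidence("reverse-search", "earliest match predates the claimed event", False)]


class WebReaderAgent:
    """Reads reliable websites and trusted reports related to the claim."""
    def run(self, case: VerificationCase) -> list[Evidence]:
        # A real agent would retrieve and read trusted sources here.
        return [Evidence("news-archive", "no trusted report matches the claim", False)]


def verify(case: VerificationCase, agents) -> str:
    """Run each agent, pool the evidence, and map it to a simple verdict label."""
    for agent in agents:
        case.evidence.extend(agent.run(case))
    if not case.evidence:
        return "unverified"
    if all(e.supports_claim for e in case.evidence):
        return "authentic"
    return "likely miscaptioned, out of context, or manipulated"


if __name__ == "__main__":
    case = VerificationCase(media_url="https://example.org/video.mp4",
                            claim="Footage shows yesterday's flood")
    print(verify(case, [ReverseSearchAgent(), WebReaderAgent()]))
```

In the actual system the final step also produces a written, evidence-based explanation; the single label returned here is only a stand-in for that richer output.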
Featured Image
Photo by Sander Sammy on Unsplash
Why is it important?
In today's digital landscape, visual misinformation spreads rapidly and can have serious real-world consequences. Existing verification methods typically focus either on detecting technical manipulations (like deepfakes) or on checking context, but rarely handle both together effectively. Our system addresses this critical gap by combining large language models with specialized verification tools in a systematic approach. What makes this work unique is the Deep Researcher Agent's ability to extract spatial, temporal, attribution, and motivational context from verified sources, providing comprehensive evidence-based assessments. This is particularly timely as we face increasingly sophisticated misinformation techniques, including both technical deepfakes and "cheapfakes," where genuine media is repurposed in a misleading way. By bridging automated efficiency with thorough evidence gathering, this work contributes to maintaining information integrity and helps fact-checkers, journalists, and platforms combat multimedia misinformation more effectively at scale.
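As a rough illustration of the structured context described above, the sketch below defines one possible record for what a deep-research step might return; the field names and placeholder values are assumptions, not the paper's schema.

```python
# Illustrative record for the context a deep-research step might extract.
# Field names and the example values are assumptions, not the paper's schema.
from dataclasses import dataclass
from typing import Optional


@dataclass
class VerifiedContext:
    spatial: Optional[str]      # where the media was captured
    temporal: Optional[str]     # when the underlying event happened
    attribution: Optional[str]  # who originally produced or published it
    motivation: Optional[str]   # why the content is being shared now
    sources: list[str]          # verified sources the fields were drawn from


# Placeholder example showing how the four context types fit together.
example = VerifiedContext(
    spatial="City X, Country Y",
    temporal="date of the original event",
    attribution="original photographer or outlet",
    motivation="reshared later with an unrelated caption",
    sources=["https://example.org/original-report"],
)
print(example)
```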
Perspectives
As an educator and researcher working at the intersection of AI and information integrity, I find this work particularly meaningful because it addresses one of the most pressing challenges in our digital age. Having witnessed how visual misinformation can rapidly influence public opinion and decision-making, developing AI systems that can systematically verify multimedia content feels like contributing to a more trustworthy information ecosystem. What excites me most about this research is how it combines the reasoning capabilities of large language models with structured verification tools, creating a system that not only detects manipulation but also provides evidence-based explanations. This transparency is crucial for building trust in AI-assisted fact-checking. Looking forward, I envision this work evolving toward contestable AI systems that preserve human oversight while maintaining efficiency, ensuring that verification tools serve as aids to human judgment rather than replacements for it.
Hung Nguyen
University of New Brunswick
Read the Original
This page is a summary of: Multimedia Verification Through Multi-Agent Deep Research Multimodal Large Language Models, October 2025, ACM (Association for Computing Machinery),
DOI: 10.1145/3746027.3762033.