What is it about?
OmniGIRL is a multilingual and multimodal benchmark for GitHub issue resolution, with 959 tasks spanning four programming languages. Task inputs may include text, screenshots, rendered web pages, and other modalities.
Why is it important?
Key Features
- Convenient, standardized evaluation environment: pre-built Docker images greatly simplify environment setup and keep evaluation runs consistent and reproducible.
- Broad programming-language coverage: Python, Java, JavaScript, and TypeScript are supported, enabling evaluation across four major language ecosystems.
- Rich multimodal inputs: tasks combine text, web content, and images, so evaluated models must understand and use information from all of these sources to resolve issues.
- Automatic environment setup and dataset construction tool: we introduce SWE-Factory, an automatic pipeline for building issue-resolution benchmarks on top of a multi-agent framework. See the SWE-Factory project for more information and the full source code.
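To make the multilingual and multimodal structure concrete, here is a minimal sketch of how a single task might be represented and summarized. It is an illustration under stated assumptions, not the official OmniGIRL schema: the `Task` class, its field names (`task_id`, `issue_text`, `image_urls`, `web_urls`), and the `summarize` helper are all hypothetical. Only the four supported languages and the three input modalities (text, images, web content) come from the summary.

```python
from collections import Counter
from dataclasses import dataclass, field

# The four languages the benchmark covers, per the summary.
LANGUAGES = {"Python", "Java", "JavaScript", "TypeScript"}


@dataclass
class Task:
    """Hypothetical task record; field names are illustrative, not the official schema."""
    task_id: str
    language: str
    issue_text: str
    image_urls: list[str] = field(default_factory=list)  # hypothetical: screenshots attached to the issue
    web_urls: list[str] = field(default_factory=list)    # hypothetical: web pages linked from the issue

    def __post_init__(self) -> None:
        # Reject languages outside the four the benchmark supports.
        if self.language not in LANGUAGES:
            raise ValueError(f"unsupported language: {self.language}")

    def modalities(self) -> set[str]:
        """Return the input modalities present in this task (text is always present)."""
        mods = {"text"}
        if self.image_urls:
            mods.add("image")
        if self.web_urls:
            mods.add("web")
        return mods


def summarize(tasks: list[Task]) -> tuple[Counter, int]:
    """Count tasks per language and how many carry at least one non-text input."""
    per_language = Counter(t.language for t in tasks)
    multimodal = sum(1 for t in tasks if t.modalities() != {"text"})
    return per_language, multimodal


if __name__ == "__main__":
    tasks = [
        Task("t1", "Python", "Crash on empty input"),
        Task("t2", "TypeScript", "Button misaligned", image_urls=["shot.png"]),
    ]
    per_language, multimodal = summarize(tasks)
    print(per_language["Python"], multimodal)  # prints "1 1"
```

Treating text as always present and images and web pages as optional mirrors the summary's description of tasks whose inputs "may include" the extra modalities; a real loader would follow whatever field names the released dataset actually uses.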
Read the Original
This page is a summary of: OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution, Proceedings of the ACM on Software Engineering, June 2025, ACM (Association for Computing Machinery), DOI: 10.1145/3728871.