What is it about?
This research bridges the gap between visual business plans and executable computer systems. In the business world, processes such as onboarding a new employee or processing a customer order are almost always mapped out visually using flowcharts and standardized diagrams. These diagrams are easy for human managers to read, but computers cannot automatically understand or execute them. Traditionally, turning these pictures into structured text or working software requires slow, manual programming. This work introduces an automated approach using Vision-Language Models (VLMs), AI systems designed to read images and understand text at the same time. Instead of treating a business diagram as a static image, the AI is trained to analyze the visual layout, recognize how the different steps are connected, and automatically convert the diagram into highly structured, machine-readable data. To achieve this, we developed a specialized pipeline that guides the AI to examine the diagram's shapes, labels, and arrows systematically. By translating complex visual workflows into precise digital structures, this technology allows organizations to automate their business processes directly from a simple drawing, reducing human error, lowering development costs, and speeding up digital transformation.
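To make "highly structured, machine-readable data" more concrete, here is a minimal sketch of one possible target representation: shapes become typed nodes, arrows become directed edges, and text written on an arrow becomes a branch condition. Everything in it (the Node, Edge, and ProcessGraph classes, the shape vocabulary, and the JSON layout) is a hypothetical illustration; this summary does not describe the paper's actual output schema.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical vocabulary mapping diagram shapes to step types
# (e.g. oval -> "start"/"end", rectangle -> "task", diamond -> "decision").
SHAPE_TYPES = {"start", "task", "decision", "end"}


@dataclass
class Node:
    id: str
    shape: str  # one of SHAPE_TYPES
    label: str  # text read from inside the shape


@dataclass
class Edge:
    source: str          # node the arrow leaves
    target: str          # node the arrow points to ("do this step next")
    condition: str = ""  # text on the arrow, e.g. "yes"/"no" after a decision


@dataclass
class ProcessGraph:
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def validate(self) -> "ProcessGraph":
        """Reject unknown shapes and arrows that point at missing nodes."""
        ids = {n.id for n in self.nodes}
        for n in self.nodes:
            if n.shape not in SHAPE_TYPES:
                raise ValueError(f"unknown shape {n.shape!r} on node {n.id}")
        for e in self.edges:
            if e.source not in ids or e.target not in ids:
                raise ValueError(f"arrow {e.source}->{e.target} references a missing node")
        return self

    def next_steps(self, node_id: str) -> list:
        """Follow outgoing arrows: (target id, condition) pairs in drawing order."""
        return [(e.target, e.condition) for e in self.edges if e.source == node_id]

    def to_json(self) -> str:
        """Serialize the whole graph (nested dataclasses included) to JSON."""
        return json.dumps(asdict(self), indent=2)


# Example: a simple customer-order process, as it might be extracted from a flowchart.
order_graph = ProcessGraph(
    nodes=[
        Node("s", "start", "Order received"),
        Node("t1", "task", "Check stock"),
        Node("d1", "decision", "In stock?"),
        Node("t2", "task", "Ship order"),
        Node("t3", "task", "Notify customer of delay"),
        Node("e", "end", "Done"),
    ],
    edges=[
        Edge("s", "t1"),
        Edge("t1", "d1"),
        Edge("d1", "t2", "yes"),
        Edge("d1", "t3", "no"),
        Edge("t2", "e"),
        Edge("t3", "e"),
    ],
).validate()

print(order_graph.next_steps("d1"))  # [('t2', 'yes'), ('t3', 'no')]
```

The validate step reflects why structure matters: once a diagram is a graph rather than pixels, a program can check it (for example, catching an arrow that leads nowhere) before anything is automated from it.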
Why is it important?
This work is significant because it addresses a major bottleneck in digital transformation and corporate automation. Although the world runs on software, companies still plan their operations using pictures: flowcharts, diagrams, and sketches. Today there is a costly disconnect between how humans design workflows and how computers execute them. Here is why this research matters to industry and to the academic community:
1. Eliminating the "Manual Translation" Bottleneck
Currently, when a business analyst designs a new process diagram, a software developer must study the picture and manually code it into a system. This human translation is slow, expensive, and prone to misinterpretation. Automating this step means moving from a drawing to a working digital structure in seconds rather than days.
2. Pushing the Limits of Vision-Language Models (VLMs)
Most AI models handle standard text well and can identify simple objects in a photo (like a cat or a car). Reading a business diagram, however, requires complex spatial reasoning: the AI has to understand that an arrow pointing from Box A to Box B means "do this step next," and that a diamond shape means "make a decision." This research advances how AI interprets complex, abstract visual structures.
3. Democratizing Automation (No-Code/Low-Code)
By allowing an AI to accurately convert a visual drawing into a machine-readable structure, this approach empowers non-technical business professionals to build and deploy automated workflows. Anyone who can draw a flowchart can, in effect, design a system, which reduces reliance on scarce engineering talent.
4. Accelerating Legacy Migration
Many enterprises hold libraries of older process diagrams stored as static PDFs or images. This technology offers a scalable way to scan, digitize, and catalog those archives into modern, searchable, and executable digital assets.
Perspectives
Humans can look at a sketchy flowchart and instantly grasp the underlying business logic, but to an AI, that same image is just a chaotic grid of pixels, lines, and overlapping text. Diving into this project allowed us to really test the limits of Vision-Language Models when faced with deep structural reasoning. The real 'aha!' moment came when we moved away from expecting the model to magically understand the diagram in one giant leap, and instead designed a structured approach that guides the AI to trace the shapes, labels, and arrows methodically, just like a human analyst would. Seeing the system successfully output precise, structured data from a visual workflow for the first time was incredibly thrilling. I hope this paper sparks fresh ideas on how we can transition AI from a simple tool into a deeply reasoning partner for digital transformation.
Pritam Deka
Queen's University Belfast
Read the Original
This page is a summary of: Structured Extraction from Business Process Diagrams Using Vision-Language Models, March 2026, ACM (Association for Computing Machinery).
DOI: 10.1145/3748522.3779780.