What is it about?

Operating system interfaces have long evolved to serve different kinds of users. Command-line interfaces were designed for experts who could remember commands and type instructions precisely. Graphical user interfaces (GUIs) made computers accessible to general users by relying on visual recognition, menus, buttons, and step-by-step interaction. Today, agents powered by large language models (LLMs) are emerging as a new class of computer users, but existing operating system interfaces were not designed for them.

LLM agents have very different strengths and weaknesses from humans. They can reason over large contexts, plan tasks, and generate structured commands. However, they are weak at visual perception, slow when forced into repeated observe-and-act loops, error-prone over long action sequences, and costly in latency and tokens. As a result, LLM agents often underperform when they use today’s GUIs. A simple task may require many clicks, visual checks, menu selections, and corrective actions, each of which creates a new chance of failure.

This work proposes DMI, a declarative operating system interface abstraction for LLM-powered computer-use agents. Instead of forcing agents to operate GUIs imperatively, step by step, DMI lets agents declare what they want to achieve. The system then handles the underlying mechanisms of navigation and interaction. DMI applies core systems principles to make OS interfaces more suitable for LLM agents: declarative rather than imperative interaction, separation of policy from mechanism, and fast-path/slow-path execution. In this way, DMI provides an API-like way to interact with existing GUI applications, without requiring changes to application source code or public APIs.
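To make the contrast between imperative and declarative interaction concrete, here is a minimal Python sketch. It is not DMI's actual interface: the toy menu graph `TOY_GUI` and the functions `find_path` and `declare` are hypothetical names invented for illustration. The sketch assumes the system already holds a map of which controls are reachable from which screens, so the caller only states the goal (policy) and a search routine works out the clicks (mechanism).

```python
from collections import deque

# Hypothetical toy GUI: each key is a screen or menu, and each value lists
# the controls reachable from it with one click. Not taken from the paper.
TOY_GUI = {
    "main": ["File", "Insert", "Layout"],
    "Insert": ["Table", "Picture", "Chart"],
    "Layout": ["Margins", "Orientation"],
    "Orientation": ["Portrait", "Landscape"],
}


def find_path(gui, start, target):
    """Mechanism: breadth-first search for the click sequence that reaches target."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        for child in gui.get(node, []):
            if child in seen:
                continue
            new_path = path + [child]
            if child == target:
                return new_path
            seen.add(child)
            queue.append((child, new_path))
    return None  # target is not reachable from start


def declare(gui, target, start="main"):
    """Declarative entry point: the caller names the goal, not the clicks."""
    path = find_path(gui, start, target)
    if path is None:
        raise KeyError(f"no control named {target!r} is reachable from {start!r}")
    return path


clicks = declare(TOY_GUI, "Landscape")
print(clicks)  # ['Layout', 'Orientation', 'Landscape']
```

In an imperative loop, an agent would typically spend one observe-and-act round on each of those three clicks. In this sketch the agent makes a single declaration and the navigation runs deterministically, which is the kind of saving the summary describes.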

Why is it important?

LLM-powered computer-use agents are expected to operate real applications on behalf of users, but today’s operating system interfaces create a major bottleneck. Existing GUIs were designed for humans, not for agents. They rely on visual recognition, repeated observation, and long sequences of low-level actions. These design choices work well for people, but they amplify the weaknesses of LLMs: weak visual perception, high latency, token cost, and imperfect long-horizon execution.

This work is important because it identifies this interface mismatch as a systems problem, not merely a model capability problem. Instead of asking future models to become better at clicking through human-oriented GUIs, it shows that operating systems can evolve new interfaces for this new class of users. By separating policy from mechanism, DMI allows agents to focus on what should be done, while the system handles how to navigate and interact with GUI applications. This can make computer-use agents more reliable, efficient, and practical.

In our evaluation on Windows productivity applications, DMI substantially improves task success rates and reduces the number of LLM calls needed to complete tasks. More broadly, this work points toward a future where operating systems provide LLM-friendly interfaces, enabling agents to use existing applications in an API-like, declarative way without requiring developers to rewrite applications or expose public APIs.

Perspectives

We see this work as part of a broader shift in operating system interfaces. In the past, command-line interfaces served expert users, and graphical interfaces made computers accessible to general users. Now, LLM agents are emerging as a new kind of “user” with different strengths and weaknesses. This paper explores what an OS interface could look like if it were designed for LLMs rather than humans. Our view is that future systems should not simply ask agents to click through existing GUIs. Instead, they should provide declarative abstractions that allow agents to express intent directly and leave deterministic interface mechanics to the system.

Yuan Wang
Institute of Software, Chinese Academy of Sciences

Read the Original

This page is a summary of: From Imperative to Declarative: Towards LLM-friendly OS Interfaces for Boosted Computer-Use Agents, April 2026, ACM (Association for Computing Machinery),
DOI: 10.1145/3767295.3803576.
