Zhipu AI Launches ComputerRL, Scaling AI Agents with Hybrid API-GUI Desktop Automation

In the evolving world of AI-driven automation, Zhipu AI has released ComputerRL, a framework that equips agents to operate complex desktop computing environments. It targets a long-standing gap in AI agent design: the mismatch between the programmatic way agents act and the graphical interfaces built for human users. By combining API calls with direct GUI control, ComputerRL streamlines desktop tasks and moves agents closer to operating on their own.

Standard GUI-only agents can be slow because they must mimic clicks and keystrokes on interfaces made for people. ComputerRL’s API-GUI paradigm combines structured API invocations with GUI-driven steps: agents call an API where a precise programmatic function exists and fall back to GUI actions when none does, giving them a flexible route through full desktop workflows.
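The fallback logic described above can be illustrated with a minimal Python sketch. It is not ComputerRL's actual interface: the `HybridExecutor` class, the `set_cell` and `export_pdf` names, and the logged GUI steps are all hypothetical, and GUI actions are only recorded rather than performed.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class HybridExecutor:
    """Hypothetical dispatcher: use a registered API if one exists, else GUI steps."""
    apis: Dict[str, Callable[..., Any]] = field(default_factory=dict)
    gui_log: List[str] = field(default_factory=list)

    def register(self, name: str, fn: Callable[..., Any]) -> None:
        self.apis[name] = fn

    def gui_action(self, description: str) -> str:
        # Stand-in for a real click or keystroke: the step is only recorded.
        self.gui_log.append(description)
        return f"gui:{description}"

    def run(self, name: str, *args: Any,
            gui_fallback: Optional[List[str]] = None, **kwargs: Any) -> Any:
        if name in self.apis:
            # Preferred path: a single structured API call.
            return self.apis[name](*args, **kwargs)
        if gui_fallback is None:
            raise KeyError(f"no API named {name!r} and no GUI fallback given")
        # Fallback path: replay the GUI steps one by one.
        return [self.gui_action(step) for step in gui_fallback]


executor = HybridExecutor()
# Hypothetical spreadsheet API: returns a copy of the sheet with one cell set.
executor.register("set_cell", lambda sheet, cell, value: {**sheet, cell: value})

sheet = executor.run("set_cell", {}, "A1", 42)
steps = executor.run("export_pdf",
                     gui_fallback=["click File", "click Export as PDF"])
```

Here the cell edit takes one programmatic call, while the export, for which no API is registered in this toy setup, is carried out as a sequence of GUI steps.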

An LLM-powered pipeline builds these APIs automatically. Given sample tasks supplied by users, it identifies the functionality required, implements the APIs with existing Python libraries, and generates test cases to validate them. The outcome is a set of reusable APIs that cut repetitive steps and boost efficiency. Ubuntu applications such as GIMP and LibreOffice gain integrated APIs for image editing and document layout, reducing long sequences of clicks to a few programmatic calls.
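One way to picture the final validation stage is the sketch below, in which candidate implementations are kept only if they pass their generated tests. Everything in it is an assumption made for illustration: the `CandidateAPI` structure, the `column_total` and `broken_total` functions, and the test cases are hypothetical and are written by hand here rather than produced by an LLM.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class CandidateAPI:
    """Hypothetical record for one generated API and its validation cases."""
    name: str
    fn: Callable[..., Any]
    cases: List[Tuple[tuple, Any]]  # each case is (args, expected_result)


def passes_tests(api: CandidateAPI) -> bool:
    # A candidate is accepted only if every case returns the expected value.
    for args, expected in api.cases:
        try:
            if api.fn(*args) != expected:
                return False
        except Exception:
            return False
    return True


def build_library(candidates: List[CandidateAPI]) -> Dict[str, Callable[..., Any]]:
    # Keep only validated candidates in the reusable API library.
    return {c.name: c.fn for c in candidates if passes_tests(c)}


def column_total(rows, column):
    return sum(row[column] for row in rows)


def broken_total(rows, column):
    return sum(row[0] for row in rows)  # deliberate bug: ignores `column`


rows = [[1, 10], [2, 20]]
library = build_library([
    CandidateAPI("column_total", column_total, [((rows, 1), 30)]),
    CandidateAPI("broken_total", broken_total, [((rows, 1), 30)]),
])
```

Only `column_total` survives validation, so the faulty candidate never reaches the agent.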

Training desktop agents is often bottlenecked by resource-hungry virtual environments. ComputerRL addresses this with a distributed reinforcement learning system built on Docker containers and gRPC, running thousands of Ubuntu VMs in parallel. It integrates with benchmarks like AgentBench and improves on earlier platforms that struggled with heavy resource demands and network delays.

Key system components include qemu-in-docker for rapid VM startup, multi-node clusters for easy scaling, and a browser interface for real-time status updates. Using the AgentRL toolkit, the platform separates data gathering from model updates to allow fully asynchronous training. Support for large workloads comes from dynamic batch sizing and techniques that reduce off-policy bias, keeping agents improving through extended training runs.
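The idea of separating data gathering from model updates can be sketched at small scale with threads and a queue. This is only an analogy under stated assumptions: the real system is described as using containers, gRPC, and the AgentRL toolkit, none of which appear here, and the `actor` and `trainer` functions, the trajectory fields, and the batch size are hypothetical.

```python
import queue
import threading


def actor(actor_id, n_episodes, out):
    # Rollout worker: produces trajectories without waiting for the trainer.
    for episode in range(n_episodes):
        out.put({"actor": actor_id, "episode": episode,
                 "reward": float(episode % 2)})


def trainer(inp, total, batch_size):
    # Consumes trajectories as they arrive and groups them into update batches.
    batches, batch, seen = [], [], 0
    while seen < total:
        batch.append(inp.get())
        seen += 1
        if len(batch) == batch_size or seen == total:
            batches.append(batch)  # a real trainer would run a model update here
            batch = []
    return batches


# A bounded queue lets actors run ahead of the trainer, but only up to a limit.
traj_queue = queue.Queue(maxsize=8)
actors = [threading.Thread(target=actor, args=(i, 5, traj_queue))
          for i in range(3)]
for t in actors:
    t.start()

batches = trainer(traj_queue, total=15, batch_size=4)

for t in actors:
    t.join()
```

Because the actors never block on a model update, the order in which trajectories arrive varies between runs. That is also why such setups need the off-policy corrections the article mentions: some trajectories come from slightly older policies.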

To tackle entropy collapse, where a policy becomes nearly deterministic and stops exploring, ComputerRL introduces Entropulse. This approach alternates reinforcement learning phases with supervised fine-tuning on top-performing rollout samples, restoring policy entropy and preventing agents from settling into repetitive behaviors.
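A small sketch makes the two ingredients concrete: Shannon entropy as a measure of how spread out a policy's action distribution is, and the selection of successful rollouts as fine-tuning data. The probabilities, the rollout records, and the `select_for_sft` helper are hypothetical, and the actual selection criteria used by Entropulse may differ.

```python
import math


def entropy(probs):
    # Shannon entropy in nats; zero-probability actions contribute nothing.
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_sft(rollouts):
    # Assumed criterion: keep only rollouts that completed their task.
    return [r for r in rollouts if r["success"]]


exploring = [0.25, 0.25, 0.25, 0.25]   # policy still tries every action
collapsed = [0.97, 0.01, 0.01, 0.01]   # policy almost always picks one action

rollouts = [
    {"task": "t1", "success": True,  "actions": ["api_call", "click"]},
    {"task": "t1", "success": False, "actions": ["click", "click"]},
    {"task": "t2", "success": True,  "actions": ["type", "api_call"]},
]
sft_data = select_for_sft(rollouts)
```

The collapsed distribution has much lower entropy than the uniform one, which is the symptom the supervised fine-tuning phase is meant to counteract.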

The workflow begins with behavior cloning (BC) on trajectories generated by multiple LLMs, which yields an initial policy. Next, step-level Group Relative Policy Optimization (GRPO) applies rule-based rewards, assigning positive reward only to correct actions in successful runs. Entropulse then gathers diverse, high-quality data from previous rollouts for supervised fine-tuning, which slows premature convergence and extends the number of effective training steps.
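The step-level reward and the group-relative normalization can be sketched as follows. This is one plausible reading of the description above, not the published training code: the trajectory format, the `steps_ok` flags, and normalizing over all steps in the group are assumptions made for illustration.

```python
from statistics import mean, pstdev


def step_rewards(trajectory):
    # Rule-based reward: 1 only for a correct step inside a successful run.
    return [1.0 if trajectory["success"] and ok else 0.0
            for ok in trajectory["steps_ok"]]


def group_relative_advantages(group, eps=1e-8):
    # Normalize each step reward against all step rewards sampled for the
    # same task, so no learned value function is needed.
    rewards = [step_rewards(t) for t in group]
    flat = [r for traj in rewards for r in traj]
    mu, sigma = mean(flat), pstdev(flat)
    return [[(r - mu) / (sigma + eps) for r in traj] for traj in rewards]


# Two rollouts of the same task: one succeeded, one failed.
group = [
    {"success": True,  "steps_ok": [True, True]},
    {"success": False, "steps_ok": [True, True]},
]
advantages = group_relative_advantages(group)
```

Steps from the successful rollout receive positive advantages and steps from the failed one receive negative advantages, even though the individual actions in the failed run were marked as valid.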

Researchers tested ComputerRL using open-source models such as GLM-4-9B-0414 and Qwen2.5-14B, producing AutoGLM-OS variants. In the OSWorld benchmark, which runs interactive Ubuntu tasks, AutoGLM-OS-9B posted a 48.1% success rate, ahead of OpenAI’s CUA o3 at 42.9% and Claude 4.0 at 30.7%. On the OSWorld-Verified suite, it reached 47.3%.

Ablation tests show the strength of the API-GUI approach, which improves success rates by 134% relative to GUI-only agents, especially in office and professional scenarios. BC alone provided a 31.9% baseline, and further training with Entropulse pushed results to 45.8%. Analysis of entropy curves confirms Entropulse’s role in sustaining exploration.

Practical trials cover tasks such as generating sales summary tables in LibreOffice Calc and producing system reports via Terminal commands. An error analysis attributes 25.8% of failures to vision-related issues and 34.4% to coordination across multiple applications, outlining clear targets for refinement.

Looking ahead, ComputerRL lays groundwork for agents that adapt to changing work environments and tackle longer task sequences. Future developments may broaden training scenarios, add multimodal perception, and introduce hierarchical planning. Safety measures such as permission checks and action validation will be important for real-world deployments.

ComputerRL represents an important advance in desktop AI by combining scalable reinforcement learning with a hybrid API-GUI strategy. As open-source variants like AutoGLM-OS evolve, this framework moves closer to versatile, generalist agents for everyday computing.