
OpenCUA vs Claude Computer Use 2026

OpenCUA-72B hits 45.0% on OSWorld-Verified, matching Claude 4 Sonnet. Open-source closes the computer-use gap with Anthropic and OpenAI. Snapshot for 2026-05-15.

By Ramanath, CTO & Co-Founder at Presenc AI · Last updated: May 2026

What This Is

Open-source computer-use agents reached rough parity with the proprietary frontier in 2026. OpenCUA-72B, from the xlang-ai group, scored 45.0% on OSWorld-Verified, comparable to Claude 4 Sonnet, and set the state of the art on UI-Vision at 37.4%. This page is a head-to-head snapshot as of 2026-05-15.

Benchmark Comparison

| Model | OSWorld-Verified | ScreenSpot-Pro | UI-Vision |
| --- | --- | --- | --- |
| OpenCUA-72B | 45.0% (SOTA open) | 60.8% | 37.4% (SOTA) |
| OpenCUA-32B | Surpassed GPT-4o-based CUA | ~55% | ~32% |
| Claude 4 Sonnet (computer use) | ~46-48% | ~62% | n/a |
| OpenAI CUA (GPT-4o-based) | ~38% | ~52% | ~28% |
| OpenAI GPT-5.5 CUA | ~49% | ~64% | ~36% |
| Gemini 2.5 (computer use) | ~40% | ~57% | ~31% |
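To make the point gaps in the table explicit, the short Python sketch below subtracts OpenCUA-72B's OSWorld-Verified score from every other model's. It assumes the midpoint (47%) for Claude 4 Sonnet's "~46-48%" range, treats the "~" figures as exact, and leaves out OpenCUA-32B because no number is listed. `OSWORLD_VERIFIED` and `gaps_vs` are illustrative names, not part of any benchmark tooling.

```python
# OSWorld-Verified success rates (%) copied from the benchmark table.
# Assumptions: the "~46-48%" range is replaced by its midpoint, "~" values are
# treated as point estimates, and OpenCUA-32B is omitted (no figure listed).
OSWORLD_VERIFIED: dict[str, float] = {
    "OpenCUA-72B": 45.0,
    "Claude 4 Sonnet (computer use)": (46.0 + 48.0) / 2,
    "OpenAI CUA (GPT-4o-based)": 38.0,
    "OpenAI GPT-5.5 CUA": 49.0,
    "Gemini 2.5 (computer use)": 40.0,
}


def gaps_vs(reference: str, scores: dict[str, float]) -> dict[str, float]:
    """Return every other model's score minus the reference score, in points."""
    base = scores[reference]
    return {
        name: round(score - base, 1)
        for name, score in scores.items()
        if name != reference
    }


if __name__ == "__main__":
    gaps = gaps_vs("OpenCUA-72B", OSWORLD_VERIFIED)
    # Print from most behind to most ahead of OpenCUA-72B.
    for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
        print(f"{name:32s} {gap:+.1f} pts")
```

A positive gap means that model is ahead of OpenCUA-72B. With these inputs GPT-5.5 comes out 4 points ahead, the same ~4-point gap cited in the takeaways.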

Architectural Differences

| Dimension | OpenCUA | Claude Computer Use |
| --- | --- | --- |
| License | Open weights + open framework | Proprietary API |
| Training corpus | AgentNet (22,600+ task demonstrations) | Undisclosed |
| Reasoning approach | Chain-of-thought "inner monologue" | Implicit reasoning + tool calls |
| OS coverage | Windows, macOS, Ubuntu | Sandbox VM + bring-your-own host |
| App breadth | 200+ applications + websites | Anything visible on the display |
| Default deployment | Self-host (Hugging Face, local) | Anthropic API |
| Tool action set | computer + browser + system actions | computer_use_20251124 (with zoom) |

AgentNet Dataset (OpenCUA's training corpus)

| Attribute | Value |
| --- | --- |
| Task demonstrations | 22,600+ |
| Operating systems | Windows, macOS, Ubuntu |
| Applications + websites covered | 200+ |
| Innovation | Trajectories augmented with chain-of-thought |
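The "trajectories augmented with chain-of-thought" idea is easiest to see as a data structure. The sketch below is a hypothetical record layout, not AgentNet's actual schema: `Step`, `Trajectory`, `to_training_targets`, the field names, and the action syntax are all assumptions made for illustration. It shows the core idea only: each demonstrated action is paired with a written thought, and the training target places the thought before the action.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One demonstration step (hypothetical layout, not AgentNet's real schema)."""
    screenshot_path: str  # observation: what the agent saw
    thought: str          # chain-of-thought "inner monologue" added during augmentation
    action: str           # low-level action; the syntax here is illustrative only


@dataclass
class Trajectory:
    task: str
    os_name: str          # e.g. "Windows", "macOS", or "Ubuntu"
    app: str
    steps: list[Step] = field(default_factory=list)


def to_training_targets(traj: Trajectory, with_cot: bool = True) -> list[str]:
    """Flatten a trajectory into one target string per step.

    with_cot=True puts the reasoning before the action, so a model trained on
    these targets learns to "think" first; with_cot=False gives action-only targets.
    """
    targets = []
    for i, step in enumerate(traj.steps, start=1):
        if with_cot:
            targets.append(f"Step {i}\nThought: {step.thought}\nAction: {step.action}")
        else:
            targets.append(f"Step {i}\nAction: {step.action}")
    return targets
```

Comparing models trained on the `with_cot=True` and `with_cot=False` variants of the same demonstrations is one way to isolate how much the reasoning text contributes.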

Six Things the Data Tells You

  1. Open weights matched Claude 4 Sonnet on computer use. The gap between OpenCUA-72B (45.0%) and Claude 4 Sonnet (~46-48%) on OSWorld-Verified is within noise.
  2. Open weights still trail GPT-5.5 on computer use by ~4 points on OSWorld-Verified (45.0% vs ~49%).
  3. The chain-of-thought "inner monologue" is OpenCUA's key innovation. Open frameworks now have a published recipe for reaching frontier-level computer-use performance.
  4. AgentNet (22.6K demonstrations) is the open data asset to beat. Comparable proprietary datasets are not publicly disclosed.
  5. Enterprise on-prem deployments can ship now. OpenCUA-72B running on a single multi-GPU host is competitive with Claude 4 Sonnet for many computer-use tasks.
  6. The proprietary frontier still wins on UX and safety. Anthropic and OpenAI ship safer defaults, better tool-use rate-limiting, and clearer terms.
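The "within noise" claim in item 1 can be sanity-checked with a binomial standard error. The Python sketch below assumes OSWorld-Verified has about 369 tasks (an assumption; check the benchmark's current task count), one run per model, and independent errors between the two models. Because both models run on the same tasks, a paired comparison would give a tighter band, so treat this as a rough check only. `binomial_se` and `gap_within_noise` are hypothetical helpers.

```python
import math


def binomial_se(p: float, n: int) -> float:
    """Standard error of a success rate p measured on n independent tasks."""
    return math.sqrt(p * (1.0 - p) / n)


def gap_within_noise(p_a: float, p_b: float, n: int, z: float = 1.96) -> bool:
    """True if |p_a - p_b| lies inside a z-sigma band for the difference of two
    independent binomial proportions, each measured on n tasks.

    Treating the two runs as independent is conservative: both models see the
    same tasks, and a paired test would usually give a narrower band.
    """
    se_diff = math.sqrt(binomial_se(p_a, n) ** 2 + binomial_se(p_b, n) ** 2)
    return abs(p_a - p_b) <= z * se_diff


if __name__ == "__main__":
    n_tasks = 369  # assumed OSWorld-Verified task count
    print(f"SE at 45%: {binomial_se(0.45, n_tasks):.3f}")
    # 0.47 is the midpoint of Claude 4 Sonnet's ~46-48% range (assumption).
    print("OpenCUA-72B vs Claude 4 Sonnet within noise:",
          gap_within_noise(0.45, 0.47, n_tasks))
```

At 45% on 369 tasks the standard error is about 2.6 points, so the 95% band on a difference between two models is roughly ±7 points, and the ~2-point gap to the midpoint of Claude 4 Sonnet's range sits well inside it.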

What This Means for AI Visibility

If open-source computer-use agents reach Claude 4 Sonnet parity, they will be the surface enterprises deploy for on-prem AI workflows, including agentic commerce. Brands that want to be reachable by these agents need to test agent reachability against both proprietary stacks (Claude, ChatGPT, Gemini) and open agent stacks (OpenCUA, browser-use), because the real install base spans both.
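Testing agent reachability across several stacks comes down to running one task list through each agent and comparing outcomes. The sketch below assumes a hypothetical `AgentBackend` interface with an `attempt(task) -> bool` method; real integrations with Claude, OpenCUA, browser-use, or any other stack would need their own adapters, which are not shown. `StubBackend` and `reachability_report` are illustrative names, and the stub stands in for a real agent so the harness runs on its own.

```python
from typing import Callable, Protocol


class AgentBackend(Protocol):
    """Hypothetical interface wrapping one agent stack (proprietary API or self-hosted)."""
    name: str

    def attempt(self, task: str) -> bool: ...


class StubBackend:
    """Stand-in backend driven by a predicate, so the harness runs without a real agent."""

    def __init__(self, name: str, succeeds: Callable[[str], bool]) -> None:
        self.name = name
        self._succeeds = succeeds

    def attempt(self, task: str) -> bool:
        return self._succeeds(task)


def reachability_report(
    backends: list[AgentBackend], tasks: list[str]
) -> tuple[dict[str, float], list[str]]:
    """Return per-backend success rates and the tasks on which backends disagree."""
    if not tasks:
        raise ValueError("need at least one task")
    # results[backend_name][task] -> whether that backend completed the task
    results = {b.name: {t: b.attempt(t) for t in tasks} for b in backends}
    rates = {name: sum(r.values()) / len(tasks) for name, r in results.items()}
    # A task is divergent if at least one backend succeeded and another failed.
    divergent = [t for t in tasks if len({results[b.name][t] for b in backends}) > 1]
    return rates, divergent


if __name__ == "__main__":
    tasks = ["find pricing page", "add item to cart", "complete checkout"]
    backends = [
        StubBackend("proprietary-agent", lambda t: True),
        StubBackend("open-agent", lambda t: t != "complete checkout"),
    ]
    rates, divergent = reachability_report(backends, tasks)
    print(rates, divergent)
```

The divergent-task list is the most useful output: a task that succeeds on one stack and fails on another is exactly the kind of install-base gap described above.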

Methodology

Benchmark figures are sourced from VentureBeat's OpenCUA coverage, the OpenCUA project page, the OpenCUA arXiv paper, and the OpenCUA GitHub repository. Claude, OpenAI, and Gemini comparison figures were cross-checked against Coasty's computer-use agent comparison.

How Presenc AI Helps

Presenc AI runs agent-reachability tests against both proprietary computer-use agents (Claude, GPT-5.5, Gemini) and open-source stacks (OpenCUA, browser-use). Brands that need consistent agent reachability across the whole install base, not just the surface their own team uses internally, get a true picture of how they appear inside agentic workflows.

Frequently Asked Questions

How does OpenCUA-72B compare to Claude 4 Sonnet?
OpenCUA-72B scores 45.0% on OSWorld-Verified, comparable to Claude 4 Sonnet and within noise. It is SOTA among open-source models and beats OpenAI's earlier GPT-4o-based CUA. It still trails GPT-5.5 by ~4 points.
What is AgentNet?
AgentNet is OpenCUA's training corpus: 22,600+ computer-use task demonstrations across Windows, macOS, and Ubuntu, covering 200+ applications and websites. Trajectories include chain-of-thought reasoning, which is the key innovation enabling open-source parity.
Should I use OpenCUA or Claude Computer Use?
Choose Claude Computer Use if you want managed infrastructure, safety guardrails, and the latest tool-version updates. Choose OpenCUA if you need on-prem deployment or open licensing, or do not want to depend on Anthropic's pricing. Performance is now close enough that the choice is mostly about deployment posture, not capability.
Is OpenCUA open source?
Yes, weights and framework code are released openly under the xlang-ai/OpenCUA GitHub repository. The AgentNet dataset is also publicly released as part of the project.
