Most "AI Agents" Are Workflows With Marketing
"Agent" became the dominant marketing label for AI products in 2025-2026. The label is largely meaningless: products labelled "AI agents" range from genuinely autonomous Tier 4 systems to deterministic workflows with LLM-filled slots. This page proposes a capability matrix and applies it honestly to leading vendor products in 2026.
Key Findings
- Across a sample of 60 vendor products marketed as "AI agents", roughly 70 percent fail at least three of five agent-defining capability tests; the practical implication is that "agent" labelling is unreliable.
- The most-violated criteria are dynamic planning (52 percent fail) and meaningful error recovery (48 percent fail); the most commonly met is tool use (89 percent pass).
- True agent capability correlates strongly with operating expense and deployment complexity; workflow products that label themselves as agents underdeliver on outcomes but are easier to deploy.
- Buyer confusion is real: enterprise procurement teams in 2026 are paying agent-tier prices for workflow-tier products at meaningful rates.
- The capability matrix below provides a five-question test that distinguishes agents from workflows quickly.
The Five Capability Criteria
1. Dynamic Tool Selection
Does the system pick which tool to call based on context, or are tool calls hard-coded? Genuine agents pick tools; workflows pre-script them.
Test: present an ambiguous request that could be served by any of three tools. Does the system pick correctly across multiple ambiguous variants?
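A minimal harness sketch for this test, assuming a hypothetical `run_agent` callable that executes one prompt and returns the ordered list of tool names the run invoked; the prompts and tool names are illustrative, and you would adapt them to your product's API or trace logs:

```python
# Dynamic-tool-selection probe. `run_agent` is a hypothetical callable
# that executes one prompt and returns the ordered list of tool names
# the system invoked; adapt it to your product's trace format.

AMBIGUOUS_VARIANTS = {
    # prompt -> the tool a context-aware agent should choose
    "Find our Q3 churn number": "analytics_db",
    "What did the CEO say about churn last week?": "email_search",
    "Is churn trending up in the public dashboards?": "web_browser",
}

def tool_selection_score(run_agent) -> float:
    """Fraction of ambiguous prompts answered with the right first tool.
    A hard-coded pipeline routes everything to the same tool and scores
    near chance; a genuine agent scores high across all variants."""
    correct = 0
    for prompt, expected_tool in AMBIGUOUS_VARIANTS.items():
        tools_called = run_agent(prompt)
        if tools_called and tools_called[0] == expected_tool:
            correct += 1
    return correct / len(AMBIGUOUS_VARIANTS)
```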
2. Multi-Step Planning
Does the system plan multi-step actions toward a goal, or follow a pre-defined sequence? Genuine agents plan; workflows execute fixed pipelines.
Test: ask for an outcome reachable by multiple step orderings. Do different runs produce different valid orderings?
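A sketch of the same idea as code, under the same assumptions as above (a hypothetical `run_agent` that returns the step sequence of one run, plus an `is_valid_ordering` checker you supply):

```python
# Multi-step-planning probe: run the same goal several times and check
# whether more than one valid step ordering appears. Both callables are
# assumptions about your harness, not a vendor API.

def planning_probe(run_agent, goal: str, is_valid_ordering, n_runs: int = 5) -> bool:
    orderings = set()
    for _ in range(n_runs):
        steps = tuple(run_agent(goal))   # e.g. ("fetch", "dedupe", "summarise")
        if not is_valid_ordering(steps):
            return False                 # an invalid plan fails the test outright
        orderings.add(steps)
    # A fixed pipeline yields exactly one ordering every run; a planner
    # typically yields several distinct, all-valid orderings.
    return len(orderings) > 1
```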
3. Cross-Step Memory
Does the system carry state across steps, including state derived from intermediate tool results? Genuine agents remember; workflows often pass forward only what was specified.
Test: ask the system to recall a fact from step 1 in step 5 without explicitly passing it through the workflow.
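One way to instrument this, assuming a hypothetical stateful `session` handle where `session.send(text)` returns the reply string; the planted fact is arbitrary:

```python
# Cross-step-memory probe. The fact is planted at step 1, never
# re-supplied, and queried at step 5.

def memory_probe(session) -> bool:
    token = "ORDER-7741-ZX"   # arbitrary planted fact
    session.send(f"Step 1: note for later that the order id is {token}.")
    for i in range(2, 5):
        session.send(f"Step {i}: continue with the task.")  # filler steps
    reply = session.send("Step 5: what was the order id?")
    return token in reply     # True => state carried across steps unprompted
```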
4. Error Recovery
Does the system detect failures and adapt, or does it fail-stop? Genuine agents recover; workflows typically halt on error.
Test: cause a tool to fail (return an error). Does the system retry, switch tools, or alert correctly?
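A sketch of this probe in two parts: a wrapper that makes a real tool fail exactly once (so retries are observable), and a verdict function over the resulting call trace. The tool names and trace shape are illustrative assumptions, not any vendor's real interface:

```python
class FailOnce:
    """Wrap a tool so its first invocation raises and later ones succeed,
    making retry behaviour observable in the trace."""
    def __init__(self, inner):
        self.inner, self.calls = inner, 0
    def __call__(self, *args, **kwargs):
        self.calls += 1
        if self.calls == 1:
            raise RuntimeError("injected failure")
        return self.inner(*args, **kwargs)

def recovery_verdict(trace: list[tuple[str, bool]]) -> str:
    """trace: ordered (tool_name, succeeded) pairs from instrumentation."""
    names = [name for name, _ in trace]
    if names.count("search") >= 2:
        return "retried"           # agent behaviour
    if "fallback_search" in names:
        return "switched tools"    # agent behaviour
    return "fail-stop"             # halted at the error: workflow behaviour
```

"Alert correctly" is also a pass for this test; the verdict function above only covers the two recovery paths that show up in a tool trace.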
5. Goal Persistence
Does the system maintain a goal across long-horizon work, or operate turn-by-turn? Genuine agents persist; chatbots and workflows do not.
Test: ask for an outcome requiring 3+ turns of refinement. Does the system push forward against the goal across turns?
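A possible scoring sketch, assuming the `session` handle from the memory probe plus a `distance_to_goal` rubric scorer you supply (lower distance means closer to the goal):

```python
# Goal-persistence probe: a refinement task over 3+ turns, scored by a
# distance-to-goal rubric. Both harness pieces are assumptions.

def persistence_probe(session, goal: str, distance_to_goal, turns: int = 3) -> bool:
    session.send(f"Goal: {goal}. Produce a first draft.")
    scores = [distance_to_goal(session.last_output())]
    for _ in range(turns - 1):
        session.send("Not there yet; keep improving toward the goal.")
        scores.append(distance_to_goal(session.last_output()))
    # Non-increasing distance across turns suggests the goal is being
    # held; flat or wandering scores suggest turn-by-turn behaviour.
    return all(later <= earlier for earlier, later in zip(scores, scores[1:]))
```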
Capability Matrix: 12 Vendor Products Honestly Classified
| Product | Tool select | Planning | Memory | Recovery | Goal persist | Verdict |
|---|---|---|---|---|---|---|
| Claude Code (autonomous) | Yes | Yes | Yes | Yes | Yes | Agent (Tier 4) |
| OpenAI Codex agent | Yes | Yes | Yes | Yes | Yes | Agent (Tier 4) |
| Devin | Yes | Yes | Yes | Partial | Yes | Agent (Tier 4) |
| Operator | Yes | Yes | Yes | Partial | Yes | Agent (Tier 4) |
| Cursor Agent | Yes | Partial | Yes | Yes | Yes | Agent (Tier 3-4) |
| Salesforce Agentforce (typical config) | Yes | Partial | Partial | Partial | Yes | Tier 3 agent |
| Microsoft Copilot Studio agents | Yes | Partial | Yes | Partial | Yes | Tier 3 agent |
| Zapier Agents (typical use) | Sometimes | No | Limited | No | No | Workflow |
| Intercom Fin (default mode) | Yes | No | Session-only | Limited | Limited | Tier 1-2 |
| n8n AI agent nodes | Yes | No | Limited | No | No | Workflow with LLM |
| Most "AI SDR agents" (typical) | Limited | No | CRM-stored | No | Limited | Workflow |
| Most "AI recruiter agents" (typical) | Limited | No | ATS-stored | No | Limited | Workflow |
Why The Distinction Matters For Buyers
Workflows and agents have different operational profiles:
- Workflows: predictable, lower-risk, easier to debug, lower cost per execution, higher reliability on in-distribution tasks. Fail visibly when out-of-distribution.
- Agents: handle novel situations, higher cost per execution, harder to debug, lower per-execution reliability but higher coverage of edge cases. Fail invisibly more often.
Buying an agent for a workflow problem wastes money on capability that does not help and adds failure modes. Buying a workflow for an agent problem caps your ceiling at the deterministic flow design.
The Buyer Five-Question Test
- Show me three different runs of the same task. Are the action sequences different and all valid?
- What happens when a tool returns an error mid-task?
- Can the system reference information from earlier in the task that I did not explicitly pass forward?
- Can it complete a task that requires tools the vendor did not anticipate at design time?
- What is the end-to-end latency on a 10-step task? (Workflows are fast; agents take 30 seconds to many minutes.)
If the vendor cannot demonstrate three or more of these affirmatively, the product is a workflow regardless of marketing.
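For teams scoring vendor demos, the rule reduces to a trivial threshold check; the key names below are illustrative, assuming you record one boolean per question:

```python
def classify(answers: dict[str, bool]) -> str:
    # e.g. {"varied_valid_runs": True, "recovers_from_error": False,
    #       "recalls_unpassed_state": True, "handles_novel_tools": False,
    #       "agentic_latency_profile": True}
    return "agent" if sum(answers.values()) >= 3 else "workflow, regardless of marketing"
```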
Brand Visibility Implications
The distinction matters for brand-visibility strategy. Workflow systems recommend brands from pre-defined lists or RAG corpora; the brand-presence battle is information-architecture-driven (be in the corpus, be in the integration manifest). Agent systems recommend brands dynamically through tool calls and reasoning; the brand-presence battle is search-engine-style (be findable through the tools the agent uses, be present in training data the agent relies on). Brands should target both surfaces but with different tactics.
Methodology
Capability matrix derived from public vendor product documentation, demo videos, third-party reviews, and Presenc AI deployment instrumentation across 60+ enterprise agent / workflow customers. Vendor classifications are subjective judgements based on default-configuration product behaviour; many products can be configured to higher capability with custom development, so the table reflects typical out-of-the-box behaviour. Vendor disagreement with classifications is expected and welcome. Updated quarterly.
How Presenc AI Helps
Presenc AI's deployment-side instrumentation observes whether agent and workflow products actually exercise dynamic tool selection, planning, and recovery in production traces, distinguishing real agent behaviour from workflow patterns at the trace level. For procurement teams evaluating "agent" claims, this is the operational test that vendor demos cannot fake.