SRAL: A Framework for Evaluating Agentic AI Architectures

Sharan, Aakash

When the Description Becomes the Routing Table

2026-06-22

Salesforce put Multi-Agent Orchestration into its Summer '26 release. That matters less because Salesforce did it, and more because Salesforce sits inside a very large enterprise install base. A vast number of companies are about to put multiple agents, coordinating with each other, into production at the same time. As of Salesforce's Q1 FY27 results, Agentforce had reached $1.2 billion in ARR, up 205% year over year. Whatever breaks in multi-agent coordination is about to break at scale, and in public.

So it is worth looking closely at how the coordination actually works.

How the routing works

When a request comes in, the Atlas Reasoning Engine decides which specialist agent should handle it. Salesforce describes the mechanism plainly: the engine reviews each agent's description, instructions, and available actions to determine which specialist agent is best equipped for the job. The router reads that text and picks.

This is not a fixed decision tree, and Salesforce is right to point that out. Decision trees are brittle. But look at what replaced it. The decision about where your request goes is now a language model interpreting natural-language descriptions, and the quality of the routing depends on the quality of the wording.

The description is the routing table.

Why free text is a risky routing table

A routing table is supposed to be the most boring, most deterministic part of a system. Free text is neither. It is untyped. Two specialist agents with overlapping descriptions create an ambiguous route, and nothing flags the ambiguity until a request lands in the wrong place. There is no signature to check, no compile-time error, no test that asserts "a refund request routes to the refund agent" the way you would unit-test a function. You find out in production, from the output, after the wrong agent has already acted.
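Overlap between descriptions can at least be flagged before anything ships. What follows is a minimal sketch, not anything Salesforce provides: the agent names, the description wording, the stopword list, and the 0.4 threshold are all assumptions for illustration. Word-set (Jaccard) similarity is a crude proxy for how a language model reads two descriptions, so treat a hit as a prompt for human review, not a verdict.

```python
import re
from itertools import combinations

# Hypothetical agent descriptions; names and wording are illustrative only.
DESCRIPTIONS = {
    "refund_agent": "Handles refund requests and billing disputes for customer orders",
    "billing_agent": "Handles billing questions, invoices, and refund status for customer orders",
    "shipping_agent": "Tracks packages and resolves delivery problems",
}

STOPWORDS = {"and", "for", "the", "a", "of", "to"}


def tokens(text):
    """Return the lowercased word set of a description, minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def jaccard(a, b):
    """Share of words two descriptions have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b)


def overlapping_pairs(descriptions, threshold=0.4):
    """List agent pairs whose descriptions overlap at or above the threshold."""
    flagged = []
    for (n1, d1), (n2, d2) in combinations(sorted(descriptions.items()), 2):
        score = jaccard(tokens(d1), tokens(d2))
        if score >= threshold:
            flagged.append((n1, n2, round(score, 2)))
    return flagged


print(overlapping_pairs(DESCRIPTIONS))
# [('billing_agent', 'refund_agent', 0.5)]
```

The refund and billing descriptions share five of ten distinct words, which is exactly the kind of pair where a refund request could plausibly land on either side.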

This is the part that gets lost under the launch coverage, and it cuts both ways, so it is worth being fair about.

Salesforce clearly knows structure matters. Alongside the reasoning engine, they have described a more deterministic layer through Agent Graph and Agent Script, where graph topology, state, transitions, and lifecycle hooks become explicit. Their engineering framing points in the same direction: when behavior needs control, move it into graph topology, finite-state transitions, lifecycle hooks, and explicit logic. They also use the A2A Agent Card, a versioned JSON contract that describes an agent's capabilities for discovery and version negotiation.

So the structured machinery exists. They built contracts for discovery and determinism for execution. What they left probabilistic is the selection decision itself. That is a deliberate tradeoff for flexibility, not an oversight. It is also exactly the place to watch, because the selection decision is the one that determines whether all of that downstream determinism even runs on the right agent.

The part distributed systems already solved

We have seen this shape before, and we stopped building it on purpose.

You do not route service calls by having one process read free-text descriptions of other services and guess which one to call. You use a registry, typed interfaces, and explicit contracts, because routing is the part of a distributed system that fails quietly and expensively. Service discovery by reading README files is precisely the thing the industry spent years engineering away from. I make the general version of this argument in "The Harness Is the Control Plane": once an agent can act, you have built a distributed system, and its failures are distributed-systems failures.
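For contrast, here is roughly what the registry approach looks like when reduced to a few lines. This is a sketch under assumptions: `Route`, `RouteRegistry`, and the intent keys are hypothetical names, not part of Agentforce or any real service-discovery library. The point is only where the failure shows up. Two agents claiming the same intent fail loudly at registration, not quietly at request time.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Route:
    intent: str   # machine-checked key, not prose
    agent: str    # hypothetical agent identifier
    version: str  # version of the contract the agent implements


class RouteRegistry:
    """Maps each intent to exactly one agent and rejects ambiguous claims."""

    def __init__(self) -> None:
        self._routes: Dict[str, Route] = {}

    def register(self, route: Route) -> None:
        existing = self._routes.get(route.intent)
        if existing is not None:
            # The ambiguity surfaces here, before any request is served.
            raise ValueError(
                f"intent {route.intent!r} already owned by {existing.agent!r}"
            )
        self._routes[route.intent] = route

    def resolve(self, intent: str) -> Route:
        try:
            return self._routes[intent]
        except KeyError:
            raise LookupError(f"no agent registered for intent {intent!r}") from None


registry = RouteRegistry()
registry.register(Route("refund.request", "refund_agent", "1.0.0"))
print(registry.resolve("refund.request").agent)  # refund_agent
```

A registry like this gives up the flexibility that makes description-based routing attractive. It is shown here as the baseline the free-text approach is being measured against, not as a drop-in replacement.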

Two specific failure modes carry straight over from distributed systems into multi-agent orchestration.

The first is shared state. Orchestration implies multiple agents reading and writing shared context. A recent paper that frames multi-agent memory as a computer-architecture problem (Yu et al., arXiv 2603.10062) names memory consistency across agents as the most pressing open challenge, and reaches for the obvious precedent: cache coherence. Most teams are shipping the shared cache without the invalidation protocol that makes a cache safe. A cache without invalidation is not memory. It is a liability with low latency.
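The smallest useful guard here is not a full coherence protocol. It is a version check on write, the optimistic-concurrency pattern databases have used for decades. The sketch below assumes a hypothetical in-process `SharedContext`; a real deployment would need the same check in whatever store the agents actually share. It shows one thing only: an agent that acts on a stale read gets an error, not a silent overwrite.

```python
class StaleWriteError(Exception):
    """Raised when a writer's view of a key is older than the stored version."""


class SharedContext:
    """Hypothetical shared memory where every key carries a version number."""

    def __init__(self):
        self._data = {}  # key -> (version, value)

    def read(self, key):
        """Return (version, value); unknown keys read as version 0, value None."""
        return self._data.get(key, (0, None))

    def write(self, key, value, expected_version):
        """Write only if the caller saw the latest version (compare-and-set)."""
        current_version, _ = self.read(key)
        if expected_version != current_version:
            raise StaleWriteError(
                f"{key!r}: expected v{expected_version}, store is at v{current_version}"
            )
        new_version = current_version + 1
        self._data[key] = (new_version, value)
        return new_version


ctx = SharedContext()
seen_by_a, _ = ctx.read("case_status")  # agent A reads version 0
seen_by_b, _ = ctx.read("case_status")  # agent B reads version 0
ctx.write("case_status", "refunded", seen_by_a)  # A wins, store is now v1
try:
    ctx.write("case_status", "escalated", seen_by_b)  # B is working from stale state
except StaleWriteError as err:
    print("rejected:", err)
```

Agent B now has to re-read and decide again with current information, which is the behavior an invalidation protocol exists to force.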

The second is blast radius. In March 2026, according to TechCrunch's reporting, an internal Meta agent that had been asked to answer one engineer's question instead posted publicly, without the expected approval step, and a colleague acted on the wrong answer. For roughly two hours, people without clearance had access to sensitive data. Meta rated it SEV-1. There was no attacker. The agent simply acted across a boundary it should never have been able to cross. That is a confused-deputy failure, and it is what under-specified coordination produces once the agents carry real permissions.

What to test

The first test is not whether every specialist agent works in isolation. It is whether the same request routes to the same agent under small wording changes.
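A routing-stability check can be as plain as a table of paraphrases per expected agent. In the sketch below, `route` is a keyword stub that stands in for a call to the real orchestrator, which is an assumption made so the example runs on its own; the agent names and requests are hypothetical too. The harness is the part worth keeping: every paraphrase must land on the same agent, and every miss is reported with the wording that caused it.

```python
def route(request: str) -> str:
    """Placeholder router. In practice this would call the real orchestrator."""
    text = request.lower()
    if "refund" in text or "money back" in text:
        return "refund_agent"
    if "invoice" in text or "charge" in text:
        return "billing_agent"
    return "fallback_agent"


# Hypothetical paraphrase sets: each request should reach the agent it is filed under.
PARAPHRASES = {
    "refund_agent": [
        "I want a refund for my order",
        "Please refund order 1234",
        "Can I get my money back for this order?",
    ],
}


def unstable_routes(router, cases):
    """Return (request, expected_agent, actual_agent) for every misrouted request."""
    failures = []
    for expected, requests in cases.items():
        for request in requests:
            actual = router(request)
            if actual != expected:
                failures.append((request, expected, actual))
    return failures


print(unstable_routes(route, PARAPHRASES))  # []

# One more phrasing of the same intent, and the route changes.
extra = {"refund_agent": ["I was charged twice, give it back"]}
print(unstable_routes(route, extra))
# [('I was charged twice, give it back', 'refund_agent', 'billing_agent')]
```

Against a model-backed router the same request may also route differently across runs, so it is worth repeating each case several times instead of once.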

Test overlap between descriptions. Test negative cases. Test permission boundaries after misrouting. Treat description changes like infrastructure changes, because that is what they are.
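The permission-boundary test assumes the router has already made a mistake and asks what the wrong agent is then able to do. A minimal sketch, with a hypothetical allowlist and action names: the check runs at execution time, is deny-by-default, and does not depend on the routing decision having been correct.

```python
class PermissionDenied(Exception):
    """Raised when an agent attempts an action outside its allowlist."""


# Hypothetical per-agent allowlists; anything not listed is denied.
ALLOWED_ACTIONS = {
    "refund_agent": {"issue_refund", "read_order"},
    "faq_agent": {"read_kb"},
}


def execute(agent, action, allowed=ALLOWED_ACTIONS):
    """Run an action only if this agent is explicitly allowed to perform it."""
    if action not in allowed.get(agent, set()):
        raise PermissionDenied(f"{agent!r} may not perform {action!r}")
    return f"{agent} performed {action}"


# Simulated misroute: a refund request lands on the FAQ agent,
# which then attempts the action the request implies.
try:
    execute("faq_agent", "issue_refund")
except PermissionDenied as err:
    print("contained:", err)

print(execute("refund_agent", "issue_refund"))  # refund_agent performed issue_refund
```

The misroute still happened, but its blast radius is a refused action and a log line, not a refund issued by the wrong agent.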

If a description change can change where work goes, it deserves review, versioning, and regression tests.
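One way to make that review hard to skip is to fingerprint the descriptions and fail the build when the fingerprint moves without an approved update, the way a lockfile guards dependencies. The sketch below is an assumption about how a team might wire this up, not a feature of any product; `fingerprint` and `changed_agents` are hypothetical helpers, and the descriptions are made up.

```python
import hashlib
import json


def fingerprint(descriptions):
    """Stable SHA-256 over all descriptions, independent of dict ordering."""
    canonical = json.dumps(descriptions, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_agents(old, new):
    """Names of agents whose description was added, removed, or reworded."""
    names = set(old) | set(new)
    return sorted(name for name in names if old.get(name) != new.get(name))


approved = {
    "refund_agent": "Handles refund requests for customer orders",
    "faq_agent": "Answers general product questions",
}
proposed = {
    "refund_agent": "Handles refund requests and billing disputes for customer orders",
    "faq_agent": "Answers general product questions",
}

if fingerprint(proposed) != fingerprint(approved):
    # In CI this is where routing regression tests would be required to pass.
    print("description change detected for:", changed_agents(approved, proposed))
# description change detected for: ['refund_agent']
```

The hash only says that wording changed. Whether routing changed with it is a question for the regression suite.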

The hard part did not go away

None of this means multi-agent orchestration is a mistake, or that Salesforce built it badly. It means the hard part was never getting agents to talk to each other. The hard part is the one distributed systems have always had: deciding who handles what, keeping shared state coherent, and bounding what each actor can do when it is wrong.

A reasoning engine reading descriptions makes the first of those problems disappear from the demo. It does not make it disappear from production. It relocates it into a place most teams are not looking, the wording of a description field, and does not give them a test that would catch it.

When you route by description, the description is infrastructure. Most teams are still writing it like marketing copy.

Coordination was always the hard part. The model did not change that. It moved it somewhere you are not testing.

Sources

Salesforce, Multi-Agent Orchestration product page (how Atlas reviews each agent's description, instructions, and actions to route): https://www.salesforce.com/agentforce/multi-agent-orchestration/
Salesforce, Summer '26 release announcement: https://www.salesforce.com/news/stories/summer-2026-product-release-announcement/
Salesforce Engineering, "Inside the Brain of Agentforce: the Atlas Reasoning Engine": https://engineering.salesforce.com/inside-the-brain-of-agentforce-revealing-the-atlas-reasoning-engine/
Salesforce Engineering, "Agentforce's Agent Graph: Toward Guided Determinism with Hybrid Reasoning": https://engineering.salesforce.com/agentforces-agent-graph-toward-guided-determinism-with-hybrid-reasoning/
Salesforce, Agent2Agent (A2A) protocol and Agent Card: https://www.salesforce.com/agentforce/ai-agents/agent2agent-protocol/
Salesforce, Q1 FY27 earnings (Agentforce ARR $1.2B, +205% YoY): https://investor.salesforce.com/files/doc_financials/2027/q1/CRM-Q1-FY27-Earnings-Press-Release.pdf
Yu et al., "Multi-Agent Memory from a Computer Architecture Perspective: Visions and Challenges Ahead" (arXiv 2603.10062): https://arxiv.org/abs/2603.10062
TechCrunch, "Meta is having trouble with rogue AI agents" (March 18, 2026): https://techcrunch.com/2026/03/18/meta-is-having-trouble-with-rogue-ai-agents/

Vendor-reported product and financial figures are treated as directional. Detailed verification notes are kept separately.