The clearest signal yet that mid-market businesses do not need a frontier subscription per seat arrived May 7, 2026. Sakana AI introduced what the company calls the RL Conductor — a 7-billion-parameter open model trained via reinforcement learning to orchestrate a pool of much larger language models, deciding for each query which combination of GPT-5, Claude Sonnet 4, Gemini 2.5 Pro, or open-weight workers should handle it. According to VentureBeat's reporting by Ben Dickson, the small Conductor outperformed every individual frontier model it routed to — at a fraction of the cost and with far fewer API calls than competing multi-agent pipelines.
If that sentence sounded recursive, it is. A small open model — small enough to run on a single high-end GPU — out-routed expensive, human-designed multi-agent systems and beat every individual frontier provider on the benchmark composite. Co-author Yujin Tang told VentureBeat the bottleneck Sakana was attacking was the rigidity of human-built pipelines: “While using frameworks with hard-coded pipelines like LangChain and Mixture-of-Agents can work well for specific use cases … in production, an inherent bottleneck arises when targeting domains with large user bases with very heterogeneous demands.” The fix Sakana proposed is the fix mid-market AI procurement has been edging toward for a year: stop buying one model. Buy a routing layer that buys the right model for each request.
This piece is the next chapter after our earlier ModelRelay cost-optimization analysis, where we made the case that smart routing was already cutting AI bills 10–20× for mid-market deployments. The Sakana paper is the proof that the routing layer is now smart enough to be a small open model itself. The implication for a 50-to-250-person business in Fort Wayne, Auburn, or anywhere across Northeast Indiana is direct: a $200K-per-year AI Employee bill on a single-vendor frontier subscription collapses to a much smaller number on a router-fronted, multi-vendor stack, with capability that exceeds the single-vendor plan.
Key Takeaways
- Sakana AI's RL Conductor — a 7B open model — outperformed GPT-5, Claude Sonnet 4, and Gemini 2.5 Pro acting alone on a benchmark composite by routing each query to the best combination of models, scoring 77.27% average across tasks (93.3% on AIME25, 87.5% on GPQA-Diamond, 83.93% on LiveCodeBench).
- The Conductor used an average of 1,820 tokens per question versus 11,203 tokens for a baseline Mixture-of-Agents setup — a roughly six-fold token reduction at higher quality. That ratio is the cost line for mid-market budgets.
- The architectural lesson is the headline: the routing layer is the most valuable component in a multi-model stack. Mid-market businesses do not need a frontier-tier subscription per seat; they need a router that decides per-query which model to call.
- For a typical 50–250-employee Northeast Indiana firm, the practical translation is a Secure AI Gateway with multi-vendor access, behavior contracts written model-agnostically, and per-workload routing rules. The math collapses six-figure annual AI bills toward five figures while raising capability.
- Sakana's commercial product, Fugu, is in beta with two variants (Fugu Mini and Fugu Ultra) but the broader signal is what matters: the open-source community now has a credible path to building routers that match what a frontier vendor would charge a premium for. That commoditizes orchestration the same way GPU compute was commoditized.
What Did Sakana Actually Build, and Why Is the Small-Model-as-Router So Important?
The technical move in the RL Conductor paper, as VentureBeat described it, is structurally elegant. Sakana fine-tuned a 7-billion-parameter Qwen2.5-7B base model using reinforcement learning, with a single objective: design the right multi-step workflow for each incoming query, distribute the subtasks across a pool of seven worker LLMs (three closed-source frontier models — Gemini 2.5 Pro, Claude-Sonnet-4, GPT-5 — and four open-source workers, including DeepSeek-R1-Distill-Qwen-32B, Gemma3-27B, and Qwen3-32B), and minimize the cost of getting to a correct answer. The Conductor was constrained to workflows of up to five steps and was rewarded only for whether the final answer and output format were correct.
What emerged from that training run is the part that matters for procurement strategy. The Conductor didn't just learn to pick the best model for each query — it learned to construct customized workflows. For simple factual recall questions, it solved the problem in a single step or used a basic two-agent setup. For complex coding problems, it built workflows of up to four agents with dedicated planning, implementation, and verification phases. The Conductor frequently assigned Gemini 2.5 Pro and Claude Sonnet 4 as high-level planners on coding benchmarks and brought in GPT-5 only at the final implementation step. In the most striking finding, the article noted that the Conductor would sometimes “completely abdicate its own role, handing the entire planning process over to Gemini 2.5 Pro and allowing it to dictate the subtasks for the rest of the pool.”
The benchmark numbers describe a system that has internalized which tools to use when. The 7B Conductor scored an average 77.27% across all tasks, hitting 93.3% on the AIME25 math benchmark, 87.5% on GPQA-Diamond, and 83.93% on LiveCodeBench. Against the standard multi-agent baselines it was benchmarked alongside — MASRouter, Mixture-of-Agents (MoA), RouterDC, and Smoothie — the Conductor set new state-of-the-art numbers across the board. And it did so while burning an average of 1,820 tokens per question versus 11,203 tokens for the MoA baseline. That is roughly a six-fold cost reduction at higher quality, on a model small enough to run on a single GPU.
The structural reason this matters: it shows that the most valuable component in a multi-model AI stack is not the frontier model itself. It is the routing decision. And the routing decision, on this evidence, is now small-model-tractable.

Why Does Mid-Market AI Procurement Change When the Router Gets Cheap?
The default 2025-era assumption inside most mid-market AI proposals — including the proposals our prospective clients in Fort Wayne, Allen County, and DeKalb County have been bringing to Cloud Radix for review — was that capability lives in the frontier model. Pick the best model, pay the per-seat subscription, and the rest of the architecture is plumbing. The Sakana paper undermines that assumption directly. If a 7B open model can route between frontier providers and outperform every individual frontier model on the composite, then the per-seat frontier subscription is at best an expensive default and at worst a procurement error.
The math gets sharper when you translate the token-per-question reduction into business-workload terms. A mid-market AI Employee that handles, conservatively, 5,000 inference requests per day across research, drafting, scheduling logic, and customer-call dialogue uses something like 7–10 million tokens daily on a single-vendor frontier-only architecture. The Sakana data implies a router-fronted version of the same workload moves through closer to 1.5 million tokens of frontier billing per day — the cheaper steps live with smaller workers, and frontier models are called only at the steps where their capability matters. Even at unchanged per-token rates, that is a 5–6× cost cut on the frontier portion alone, before you factor in the open-weight workers that often replace frontier calls outright.
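The back-of-the-envelope math above can be written out. The request volume and the per-million-token rate below are illustrative assumptions, not vendor quotes; the only sourced number is the Conductor-versus-MoA token ratio from the Sakana results:

```python
# Illustrative assumptions (not vendor pricing): a mid-market AI Employee
# workload near the middle of the 7-10M tokens/day range, and a hypothetical
# blended frontier rate per million tokens.
DAILY_TOKENS_FRONTIER_ONLY = 9_000_000
FRONTIER_RATE_PER_M = 10.0  # hypothetical $/1M tokens

# Sourced ratio: 1,820 tokens/question (Conductor) vs. 11,203 (MoA baseline),
# i.e. the routed stack pushes ~16% of the tokens through billing.
reduction = 1_820 / 11_203

daily_routed = DAILY_TOKENS_FRONTIER_ONLY * reduction
annual_single = DAILY_TOKENS_FRONTIER_ONLY / 1e6 * FRONTIER_RATE_PER_M * 365
annual_routed = daily_routed / 1e6 * FRONTIER_RATE_PER_M * 365

print(f"Routed daily frontier tokens: {daily_routed / 1e6:.2f}M")  # ~1.46M
print(f"Annual frontier spend: ${annual_single:,.0f} -> ${annual_routed:,.0f}")
```

At any assumed frontier rate the routed architecture's frontier bill scales down by the same roughly six-fold factor, which is why the ratio, not the absolute rate, is the durable number.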
We covered the cost-side complement of this story in the DeepSeek-V4 procurement playbook — when frontier-class capability becomes available at one-sixth the per-token price, the buyer who already has multi-model architecture wired in captures the cut tomorrow; the buyer locked into a single vendor pays the spread. Sakana's RL Conductor work is the routing-layer half of the same story. Together they describe a single conclusion: the mid-market default in 2026 should be multi-model architecture, router-mediated, model-agnostic at the application layer.
The architectural baseline for capturing this benefit, as we covered in the AI operating layer analysis, is a Secure AI Gateway in front of an AI Employee whose behavior contract is written without reference to a specific model. That gateway is where the routing decision lives — whether the routing is rule-based, learned via reinforcement learning the way Sakana trained the Conductor, or a hybrid. The broader industry direction confirms the bet: independent benchmarking sites such as Artificial Analysis now publish cost-per-quality curves showing price-to-capability ratios falling faster than in any prior twelve-month period, and the Stanford HAI 2026 AI Index documents the same trend at the industry level. A buyer's job is not to build the Conductor themselves. The buyer's job is to make sure their AI Employee architecture has the socket a Conductor-class router can plug into without rebuilding.
What Does a Router-Fronted AI Employee Architecture Look Like in Practice?
To make this concrete for a Northeast Indiana mid-market reader, here is the architectural diagram that increasingly defines how Cloud Radix builds AI Employees in 2026, expressed as a pipeline:
- Application boundary. The application — phone receptionist, document automation, lead-followup workflow — speaks to the AI Employee through a stable behavior contract. No model name, no vendor identity, no tokenizer-specific prompt. The contract names the inputs, the expected outputs, and the policy.
- Routing layer. A small model (today, this looks like a rules engine plus an embedding-based classifier; tomorrow it looks more like Sakana's RL Conductor) inspects the request and decides which workflow to run. The workflow may be one model call, two models in sequence, or a fan-out to several workers with a final synthesis step.
- Worker pool. A diverse set of models — at least one US proprietary frontier (Opus 4.7, GPT-5.5, Gemini 2.5/3.x), at least one US-aligned hosted alternative, at least one open-weight option that can run locally or on a friendly cloud, and where appropriate one or more specialty workers tuned for a domain. The pool includes the cheapest viable worker for each request type.
- Policy and credential layer. Sits between routing and worker calls. Enforces data classifications, audit logging, prompt translation, and credential isolation independent of which worker is called. The zero-trust credential isolation pattern prevents a worker swap from changing the data-handling profile.
- Observability and feedback. Captures cost-per-request, quality scoring, and routing decisions, and feeds them back into the routing layer so the rule set or the learned router improves over time.
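A minimal sketch of the routing layer in the pipeline above, assuming a rules-first design: a pattern table maps request types to workflows over the worker pool, with a two-step fallback. Every name here (the worker tiers, `route`, the patterns) is illustrative rather than a real product API; a Conductor-class learned router would replace the pattern table with a trained policy behind the same function signature:

```python
import re
from dataclasses import dataclass

# Illustrative worker tiers; a real deployment maps these to actual model
# endpoints behind the Secure AI Gateway.
OPEN_WEIGHT = "open-weight-32b"
FRONTIER_PLANNER = "frontier-planner"
FRONTIER_CODER = "frontier-coder"

@dataclass
class Workflow:
    steps: list[str]  # ordered worker calls (the Conductor capped these at 5)

# Rules-based routing table: request pattern -> workflow shape.
RULES = [
    (re.compile(r"\b(refund|invoice|hours|pricing)\b", re.I),
     Workflow([OPEN_WEIGHT])),                        # simple recall: one cheap step
    (re.compile(r"\b(code|function|debug|script)\b", re.I),
     Workflow([FRONTIER_PLANNER, FRONTIER_CODER,      # plan -> implement -> verify
               OPEN_WEIGHT])),
]
DEFAULT = Workflow([OPEN_WEIGHT, FRONTIER_PLANNER])   # two-step fallback

def route(query: str) -> Workflow:
    """Pick a workflow for a query; a learned router swaps in behind this."""
    for pattern, workflow in RULES:
        if pattern.search(query):
            return workflow
    return DEFAULT

print(route("Can you debug this script?").steps)
# ['frontier-planner', 'frontier-coder', 'open-weight-32b']
```

The stable `route()` boundary is the "socket" described elsewhere in this piece: swapping the rules table for a learned router changes nothing upstream of it.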

The first thing this picture clarifies is that no single component needs to be world-class. The routing layer can start as a rules engine with a few dozen patterns and improve toward a learned router as data accumulates. The worker pool can start with two or three models and grow. The policy layer is the same one your existing IT controls already approve. The architecture's value is its shape, not the cleverness of any individual node.
The second thing this picture clarifies — and it is the part most mid-market RFPs miss — is that the architecture is what you should be buying, not the model. A vendor whose proposal hardcodes a frontier model into application logic is selling you 2024's stack. A vendor whose proposal ships a routing layer, a worker pool, and a policy plane is selling you the structure that captures every subsequent price-curve step the industry takes. We covered the procurement side of this conversation in detail in the AI Employee pricing guide — the right read in 2026 is to evaluate vendors on architecture portability first and on individual model selection second.

How Big Is the Cost Cut for a Northeast Indiana Mid-Market Firm?
The cost story for a typical Cloud Radix mid-market client deserves a side-by-side. The composite below describes the shape we see across DeKalb County, Allen County, Whitley County, and Noble County 50–250-employee firms — professional services, healthcare admin, manufacturing operations, and financial services — running three to five AI Employees per business across customer-facing and operational workloads. The numbers are list-price token math at May 2026 rates and are illustrative rather than specific to any one client.
| Architecture | Annual token cost (illustrative) | Capability profile | Vendor risk |
|---|---|---|---|
| Single-vendor frontier subscription (per seat) | ~$200K | High — everything routes to one frontier model | High — one provider can change pricing, ToS, or availability |
| Frontier-only + manual model selection per workload | ~$120K | High but uneven — some workloads overpay | Medium — partial portability |
| Multi-model with rules-based router | ~$45K | High and balanced — workloads matched to model | Low — clean swap path |
| Multi-model with learned router (Conductor-class) | ~$22K | Highest — workflows constructed per-query | Low — routing intelligence is local |
Two reads worth pulling out of the table for a mid-market owner. First, the absolute number on the top row is what blocked AI Employee adoption for many of the businesses we have spoken with in NE Indiana over the last twelve months. A six-figure annual AI line item is hard to defend at 50-to-250-employee scale even when the ROI math is favorable, because the variance dominates the budget conversation. The bottom rows reduce the number to something a CFO writes off without a board memo, which changes the procurement conversation from whether to how soon.
Second, the cost gap between rules-based routing and learned routing is meaningful but smaller than the gap between single-vendor and any router. The first 60–80% of the cost cut is captured by the simple architectural decision to put a routing layer in. Conductor-class learned routing — what Sakana built — roughly halves the remaining bill on top of that. For a mid-market business, the first move (rules-based router) is the higher-priority one and is available today on tooling we already build into the Secure AI Gateway.
The qualitative tradeoff worth naming is governance overhead. Multi-model routing increases the number of vendor relationships you maintain, the number of data-handling profiles you document in your AI Bill of Materials, and the number of failure modes you need to test for. The savings are large enough to fund that work several times over, but the work is real. The broader NIST AI Risk Management Framework framing applies here — alongside ISO/IEC 42001 as the documented management-system standard for AI — and the OWASP LLM Top 10 flags routing-specific risks (prompt-injection routing manipulation, supply-chain compromise of a worker model) that a mid-market deployment must include in its threat model.

Local Angle: What Changes for a 50-to-250-Person Firm in Auburn, Fort Wayne, and Allen County?
Northeast Indiana's mid-market business community looks different from the enterprise audience most multi-model AI literature is written for. The typical Cloud Radix prospective client across Auburn, Fort Wayne, DeKalb County, Allen County, Whitley County, and Noble County has revenue in the $5M–$75M range, employs 50 to 250 people, and runs an IT function of one to three people who already handle network, endpoint, security, and SaaS administration. Adding “manage seven AI vendor relationships” to that team's workload is not realistic. The Sakana finding is good news for that team specifically.
The reason: a router-fronted architecture consolidates the operational surface area even as the model count expands. The IT manager at a 120-person Auburn manufacturer or a 60-person Fort Wayne CPA practice does not interact with seven vendor relationships day to day. They interact with one routing-layer dashboard, one Secure AI Gateway policy, and one observability stream. The seven worker models live behind the gateway. When pricing shifts on one provider, the routing layer adjusts a configuration field. When a new open-weight model lands — which on the 2026 release cadence happens roughly every six weeks — adding it to the pool is a worker registration, not a procurement cycle.
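A sketch of what "a worker registration, not a procurement cycle" can look like in practice: a declarative pool the gateway reads, where a new open-weight release is one new entry. Field names, model identifiers, and costs are illustrative assumptions, not a real gateway schema:

```python
# Illustrative worker-pool registry the gateway consults at routing time.
WORKER_POOL = {
    "frontier-planner": {"vendor": "hosted-frontier", "cost_per_m": 10.0,
                         "data_profile": "no-train, US-resident"},
    "open-weight-32b":  {"vendor": "local",           "cost_per_m": 0.4,
                         "data_profile": "on-prem"},
}

def register_worker(name: str, vendor: str, cost_per_m: float, data_profile: str):
    """Add a worker to the pool; routing rules can reference it immediately."""
    WORKER_POOL[name] = {"vendor": vendor, "cost_per_m": cost_per_m,
                         "data_profile": data_profile}

# A new open-weight model lands: one registration, no application changes.
register_worker("open-weight-next", "local", 0.3, "on-prem")

cheapest = min(WORKER_POOL, key=lambda n: WORKER_POOL[n]["cost_per_m"])
print(cheapest)  # open-weight-next
```

The `data_profile` field is the hook for the policy layer described earlier: registration and governance documentation happen in the same place, so the AI Bill of Materials stays current as the pool grows.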
That operational profile is also what makes the Sakana-class router approach defensible at NE Indiana scale. The mid-market firm does not need to train its own RL Conductor. It needs to buy or build a Secure AI Gateway that has the routing socket — the place a Conductor-class router plugs in when the open-source community ships one that meets the firm's policy bar. Sakana's commercial product, Fugu, is in beta with low-latency (Fugu Mini) and high-performance (Fugu Ultra) variants; that is one path. ModelRelay-class internal routing is another. The pattern matters more than the specific implementation.
For Fort Wayne and Auburn business owners weighing AI Employee proposals right now, the practical advice is direct: do not buy a single-vendor proposal in 2026. The Sakana paper is the technical confirmation that a router-fronted architecture beats single-vendor on capability and cost simultaneously, and the open-source momentum behind that pattern is large enough that the routing layer will keep getting better whether you build it or buy it. The procurement question is whether your AI Employee architecture has the socket for it. Anything else is a 2024-vintage decision dressed up in 2026 marketing. The AI Employee ROI calculator gives you the order-of-magnitude estimate while you wait.
Cloud Radix's multi-model procurement diagnostic is a fixed-fee, one-week engagement: we audit your existing AI Employee deployment (or any proposal you have in hand from another vendor), identify the single-vendor lock-in points, and hand you a written architecture memo describing the routing layer, the worker pool, and the per-workload routing rules that match your business profile. If you want to validate the savings before committing to the architecture, we run the cost projection against your actual workload mix at current vendor pricing and at projected next-quarter pricing. Book the diagnostic — first response within one business day.
Frequently Asked Questions
Q1. Does a mid-market business actually need GPT-5.5 or Opus 4.7 for any workload?
Most do not need it as a default. Routing each query to the right model produces higher composite quality at lower cost than routing every query to the most capable single model. There are workloads — the hardest legal-research queries, complex multi-step engineering reasoning, certain healthcare analytical tasks — where a frontier-tier model materially improves the answer. A multi-model architecture does not exclude those models; it routes to them when the workload warrants and routes to cheaper workers everywhere else.
Q2. How much of Sakana's result is the small-model intelligence versus the multi-model pool?
Both contribute. The 7B Conductor's intelligence determines the workflow design. The diversity of the worker pool determines how much capability is available for the Conductor to draw on. A small router pointed at a single worker is just an expensive single-model setup. A diverse pool with no routing intelligence is the existing Mixture-of-Agents baseline that the Conductor outperformed. The combination is what produces the result.
Q3. Is Sakana's RL Conductor available to use directly in a business deployment?
The 7B research model is an exploratory blueprint and is not publicly available. Sakana has productized the framework into a commercial product called Fugu, currently in beta, with Fugu Mini and Fugu Ultra variants accessible through an OpenAI-compatible API. For mid-market businesses, the practical near-term path is more often to use a Cloud Radix-class Secure AI Gateway with rules-based routing today.
Q4. What governance risks does multi-model routing add that single-vendor does not have?
Three matter most. First, expanded data-handling surface — each worker model has its own data-residency, retention, and training-on-customer-data profile. Second, prompt-injection routing manipulation — a prompt-injection attack can target the routing decision itself. Third, supply-chain risk on open-weight workers. All three are manageable with standard NIST AI RMF and ISO/IEC 42001 controls.
Q5. How does this fit alongside existing AI tools like Microsoft Copilot, Salesforce Einstein, or Google Workspace?
Multi-model routing complements rather than replaces those tools for most mid-market deployments. Copilot, Einstein, and Google Workspace AI features are integrated into specific application surfaces. Where router-fronted AI Employees fit is the workloads those tools do not handle: cross-application orchestration, customer-facing voice, document automation across multiple systems, after-hours operational coverage.
Q6. How long does a multi-model migration take for a Fort Wayne or Auburn mid-market deployment?
For most NE Indiana mid-market deployments we have run, a clean migration sits in the four-to-eight-week range. The savings typically pay back the migration cost inside the first year of operation, often inside the first six months on heavy-volume workloads.
Q7. Is this the same as what ModelRelay does, or is the Sakana approach categorically different?
There is meaningful overlap. ModelRelay-class systems route per-request using rules and embedding-based classification. Sakana's RL Conductor is a learned router that constructs workflows — multi-step plans across the worker pool. The most likely 2026 path is rules-based routing plus occasional learned-router upgrades on specific workload classes, with the learned router becoming default as open-source Conductor-class options mature.
Sources & Further Reading
- VentureBeat: venturebeat.com/orchestration/how-sakana-trained-a-7b-model-to-orchestrate-gpt-5-claude-sonnet-4-and-gemini-2-5-pro — Ben Dickson's May 7, 2026 piece on the RL Conductor benchmark results and Fugu commercial product.
- Stanford Institute for Human-Centered AI: hai.stanford.edu/ai-index/2026-ai-index-report — 2026 AI Index Report; price-to-capability curves at the industry level.
- Artificial Analysis: artificialanalysis.ai — Independent model benchmarks and per-token pricing across the worker pool.
- National Institute of Standards and Technology: nist.gov/itl/ai-risk-management-framework — AI Risk Management Framework; baseline for multi-model governance overhead.
- OWASP: genai.owasp.org/llm-top-10/ — OWASP Top 10 for LLM Applications; routing-specific risks including prompt-injection routing manipulation.
- International Organization for Standardization: iso.org/standard/81230.html — ISO/IEC 42001 AI Management System; AI Bill of Materials and policy-layer documentation.
Reprice Your AI Employee Proposals on a Multi-Model Architecture
Fixed-fee, one-week diagnostic: an architecture memo with routing layer, worker pool, per-workload routing rules, and cost projection against your actual workload mix.



