There is a number that finally settles a year-long argument about what an “AI Employee” actually is. It is 26 minutes.
That is how long an AI agent works autonomously in a single user session — retrieving data, writing code, clicking through a browser, calling connected tools, and producing finished output — according to a new study from Harvard and Perplexity. The comparison number is what makes it land: a traditional search interaction lasts about 33 seconds. Same underlying tasks, two completely different machines. One you query for a few seconds and then go do the work yourself. The other you assign the work to and walk away.
That contrast is the cleanest way we have ever found to explain the difference between a tool you use and a worker you delegate to. A search box — and, frankly, a basic chatbot — is a query-and-retrieve device: you ask, it answers, and the multi-step work still lands on your desk. An AI Employee is an assign-and-walk-away device: you hand it an objective and it sustains the work for minutes, not seconds. For a mid-market business owner trying to decide whether “AI” is a productivity gimmick or a genuine staffing decision, that gap — call it the delegation gap — is the whole ballgame.
We have written a lot about which KPIs to track and how many dollars an agent returns. This post is about something more fundamental and, until now, harder to prove: sustained autonomous work duration is itself the buying signal. If a system only works for seconds, it is a better search box. If it works for half an hour at a stretch, it is a coworker. Here is what the research shows, what it does not, and how a Northeast Indiana company should read it before spending a dollar.
Key Takeaways
- A Harvard–Perplexity study found AI agents perform about 26 minutes of autonomous work per session versus 33 seconds for traditional search — roughly a 47x difference in sustained task engagement.
- The finding came from matched-pair analysis of around 10,000 near-identical query pairs over a 90-day window, holding the underlying task constant across a search product and an agent product.
- Time-on-task — not feature lists — is the cleanest signal that separates a search box or chatbot (query-and-retrieve) from an AI Employee (assign-and-walk-away).
- The same study reported agents cut matched-task completion time from 269 minutes to 36 minutes and shifted work toward higher-order, cross-domain tasks — but agents carry a higher fixed cost per task and a real review burden.
- Longer autonomous sessions are only an asset if they are safe to grant: scoped credentials, egress control, and a clear completion checkpoint keep “autonomous” from sliding into “unsupervised.”
- For a mid-market NE Indiana firm, the practical read is to delegate one repeatable 26-minute workflow first, measure the reclaimed hours, then expand the agent's authority only on steps it has earned.
What did the Harvard–Perplexity study actually measure?
The headline is easy to misread, so start with the method. As reported by MarkTechPost and laid out in the underlying paper, How AI Agents Reshape Knowledge Work, researchers Jeremy Yang (Harvard) and Kate Zyskowski, Noah Yonack, and Jerry Ma (Perplexity) used production data from two real products — Perplexity's Search and its agentic “Computer” — over a 90-day window from late February to late May 2026.
The clever part is the matched-pair design. The team found roughly 10,000 session pairs where a user kicked off near-identical tasks (cosine similarity above 0.99) on both products. That holds the task constant, so you are comparing how the two systems handle the same intent rather than comparing easy questions against hard ones. On those matched tasks, the agent product performed about 26 minutes of autonomous work per session against 33 seconds for search. The medians tell the same story at a smaller scale: about 9 minutes versus 14 seconds.
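Here is a minimal Python sketch of the matched-pair idea. It pairs each of a user's agent queries with a search query from the same user when their cosine similarity clears 0.99. It assumes each query has already been turned into a numeric embedding vector, and it uses a simple greedy, brute-force match. The study's actual pipeline, embedding model, and tie-breaking rules are not described here, so treat `match_pairs` and its inputs as hypothetical.

```python
import math

# Hypothetical input shape: query id -> (user_id, embedding vector)
Query = tuple[str, list[float]]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_pairs(search: dict[str, Query], agent: dict[str, Query],
                threshold: float = 0.99) -> list[tuple[str, str, float]]:
    """Greedily pair each agent query with the most similar unused search
    query from the same user, keeping only pairs strictly above threshold."""
    pairs = []
    used: set[str] = set()
    for agent_id, (user, agent_vec) in agent.items():
        best_id, best_sim = None, threshold
        for search_id, (search_user, search_vec) in search.items():
            # Only compare against unused queries from the same user.
            if search_id in used or search_user != user:
                continue
            sim = cosine_similarity(agent_vec, search_vec)
            if sim > best_sim:
                best_id, best_sim = search_id, sim
        if best_id is not None:
            used.add(best_id)
            pairs.append((agent_id, best_id, best_sim))
    return pairs
```

A brute-force loop is fine for illustration. At the scale of millions of sessions, you would expect an approximate nearest-neighbour index instead.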
Two methodological caveats deserve stating up front, because honesty about limitations is the only way these numbers stay useful. First, this is production data from one vendor's two products — an existence proof of what agentic systems can sustain, not a universal benchmark for every tool on the market. Second, the agent product launched only days before the data window opened, so adoption curves are early and quality was inferred from next-turn dissatisfaction rather than direct user ratings. The study itself notes a 55% lower per-query dissatisfaction rate on the agent, which is encouraging, but inferred satisfaction is not the same as a controlled quality audit. Read the 47x as a directional truth about category difference, not a spec sheet.

What is the delegation gap — and why does it change the buying decision?
For three years, most businesses have evaluated AI the way they evaluate a search engine or a help-desk widget: How good is the answer? How fast does it come back? That framing quietly assumes a human will take the answer and do the actual work. Thirty-three seconds is the signature of that world — long enough to retrieve, too short to execute.
The delegation gap is what opens up when a system can hold an objective for minutes instead of seconds. In those 26 minutes the agent is decomposing the task, gathering its own inputs, making context-dependent decisions, recovering from steps that do not work, and chaining tools together — the orchestration a person used to do by hand between search queries. This is the quantified version of an argument we have been making about proactive AI agents and the end of the chatbot era: the 33-second search interaction is the chatbot era, measured. The agent's half-hour session is what replaces it.
This is also why feature checklists mislead buyers. Two products can both claim “AI-powered research,” “tool use,” and “automation,” yet one collapses the moment a step fails while the other grinds through a 26-minute workflow to a finished result. What you are actually buying is sustained autonomous work: the ability to stay on a multi-step task without a human shepherding every turn. That is the thing to probe in any vendor demo. The question is not “can it answer this” but “how far can it carry this before it needs me.”
That reframing matters most for mid-market teams, because they are the ones who cannot afford to babysit software. A large enterprise can staff a team to supervise a flaky tool. A 15-person firm cannot. For that firm, time-on-task is not a vanity metric. It is the line between a worker that earns its seat and a subscription that quietly becomes shelfware.

How should you map work to the delegation gap? A Work-Session Value Matrix
The practical question is not “is the agent impressive” but “which of my workflows actually benefit from 26 minutes of autonomy.” Not all of them do. A quick lookup is still better served by a search box; the agent's overhead only pays off when the task has real multi-step depth. The matrix below is how we scope candidate workflows with clients — comparing what a person spends, what search shaves off, what an AI Employee can absorb autonomously, and, crucially, where a human checkpoint belongs.
| Task type | Time a human spends | What search saves | What an AI Employee absorbs autonomously | Where the human checkpoint belongs |
|---|---|---|---|---|
| Quick fact lookup | 1–2 min | Most of it (seconds) | Little — overhead outweighs benefit | None needed; not an agent task |
| New-client intake & research brief | 45–90 min | A few minutes of searching | A full ~26-min research-and-draft session | Review brief before it reaches the client |
| Competitive / market scan | 2–4 hrs | Scattered minutes | Multi-step gathering, synthesis, draft summary | Verify sources and framing |
| Lead qualification & follow-up draft | 30–60 min | Minimal | Enrichment, scoring, drafted outreach | Approve any message before it sends |
| Recurring report assembly | 1–3 hrs | A little | Pull data, reconcile, assemble, format | Sign off on figures before distribution |
| Multi-system data reconciliation | 2–5 hrs | Almost none | Cross-tool retrieval and matching | Approve any write-back that changes records |
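The scoping logic behind the matrix can be written down as a small decision rule. This Python sketch is illustrative only. The `Workflow` fields and the 30-minute and three-step thresholds are assumptions chosen for the example, not numbers from the study, and a real assessment would weigh each workflow with the people who actually do it.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    human_minutes: int   # typical time a person spends on it today
    steps: int           # distinct gather / reconcile / draft steps
    repeatable: bool     # happens on a regular cadence
    verifiable: bool     # a person can check the output quickly
    irreversible: bool   # sends, distributes, or writes back to records


def triage(wf: Workflow, min_minutes: int = 30, min_steps: int = 3) -> str:
    """Hypothetical rule: short or single-step work stays with search;
    delegated work always keeps a human checkpoint."""
    if wf.steps < min_steps or wf.human_minutes < min_minutes:
        return "search box"  # agent overhead outweighs the benefit
    if not (wf.repeatable and wf.verifiable):
        return "keep with a human"  # too novel or too hard to check
    if wf.irreversible:
        return "delegate, approve before it ships"
    return "delegate, review output"
```

Every delegated outcome still routes through a person, which mirrors the right-hand column of the matrix. The only question the rule answers is whether that person reviews a finished draft or approves an action before it takes effect.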
The pattern is the same one the study found at scale: agents earn their keep on composite, multi-step tasks, not one-shot questions. They are a poor trade for a 90-second lookup and an excellent trade for the 90-minute slog. When you are deciding what to delegate, that right-hand column is the discipline — every row keeps a person on the judgment and on anything irreversible.
Notice that this matrix measures minutes absorbed, not quality or dollars. Those are the next two questions, and they have their own tools. For the quality side, see the AI Employee KPIs that actually matter; for the money side, you can turn those minutes into dollars with a monthly value audit. Time-on-task tells you whether a workflow is delegable at all. The other two tell you whether it is worth delegating.

Does longer autonomous work actually mean better work?
A fair skeptic's response: a machine that “works” for 26 minutes could just be 26 minutes of busy-looking noise. The study tried to answer that directly, and the scope findings are the more interesting half of the paper.
On matched tasks, the agent reduced completion time from 269 minutes to 36 minutes — an estimated 87% drop in time and 94% in cost compared with a human armed only with search. It also changed the kind of work people attempted. Agent queries more often crossed occupational boundaries (59% versus 50% for search), pulled from more knowledge domains per query (2.40 versus 1.74), and skewed toward higher-order cognition — 76% of agent queries reached the upper, “create and evaluate” tiers of Bloom's taxonomy against 55% for search, with create-level work nearly doubling. The team measured task breadth against the U.S. Department of Labor's O*NET occupational task statements and found agents touched about 60% more distinct task types — and that roughly 23% of agent queries were work the same users essentially never sent to a search box at all.
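The headline ratios are easy to sanity-check from the rounded figures reported above. This short Python snippet does only that arithmetic. Because the inputs are rounded, the results are approximate. The 94% cost figure cannot be recomputed here because the per-task cost inputs are not given.

```python
# Headline figures as reported (rounded), converted to seconds where needed.
AGENT_MEAN_S = 26 * 60    # ~26 minutes per agent session
SEARCH_MEAN_S = 33        # ~33 seconds per search interaction
AGENT_MEDIAN_S = 9 * 60   # ~9 minutes (median)
SEARCH_MEDIAN_S = 14      # ~14 seconds (median)
HUMAN_MINUTES, AGENT_MINUTES = 269, 36  # matched-task completion time

mean_ratio = AGENT_MEAN_S / SEARCH_MEAN_S
median_ratio = AGENT_MEDIAN_S / SEARCH_MEDIAN_S
time_reduction = (HUMAN_MINUTES - AGENT_MINUTES) / HUMAN_MINUTES

print(f"mean sessions:   {mean_ratio:.1f}x longer")
print(f"median sessions: {median_ratio:.1f}x longer")
print(f"completion time: {time_reduction:.0%} shorter")
```

It prints about 47.3x for mean sessions, 38.6x for medians, and an 87% shorter completion time. Those results line up with the study's reported 47x and 87%.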
In plain terms: the longer the system can work unattended, the bigger and harder the jobs people are willing to hand it. That tracks a broader trend in independent research. METR's study on AI's ability to complete long tasks measured the length of tasks that frontier models can finish at a 50% success rate, with length defined by how long those tasks take skilled humans. It found that this length has been doubling roughly every seven months for six years, putting leading models near a one-hour time horizon. The two measures are not identical: METR scores task length in human time, while the Harvard–Perplexity figure is agent session time. Even so, production usage data and controlled benchmarking point the same direction. Autonomous work duration is rising, and it is the axis that matters.
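A doubling rate compounds quickly, which is why the METR trend matters. Here is a minimal sketch of the arithmetic. It assumes a constant seven-month doubling period, although the real trend is a fitted estimate with uncertainty, so this is an illustration rather than a forecast.

```python
import math

DOUBLING_MONTHS = 7  # METR's reported doubling period (approximate)


def growth_factor(months: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """How many times longer the 50%-success task horizon gets after `months`."""
    return 2 ** (months / doubling_months)


def months_to_multiply(factor: float, doubling_months: float = DOUBLING_MONTHS) -> float:
    """Months needed for the horizon to grow by `factor` at the same pace."""
    return doubling_months * math.log2(factor)


print(f"{growth_factor(6 * 12):,.0f}x over six years")
print(f"{months_to_multiply(10):.1f} months for a tenfold increase")
```

At that pace, six years works out to roughly a 1,250-fold increase in task horizon, and a tenfold increase takes a little over 23 months.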
Now the honest counterweight. The same study shows agents carry a higher fixed cost per task — there is real delegation-and-review overhead — and the raw model spend per task runs higher than a cheap search query, even as total time-and-cost drops sharply. Longer sessions also mean more steps where an agent can drift on an unusual case. The takeaway is not “delegate everything.” It is “delegate the work that clears the breakeven — multi-step, repeatable, verifiable — and keep a human on the rest.”
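One way to make “clears the breakeven” concrete is to weigh the staff time a task frees up against two costs: the agent's fixed per-task cost and the time spent reviewing its output. The function names and every input below are hypothetical placeholders for illustration. Plug in your own loaded hourly rate, review time, and per-task agent cost.

```python
def net_value_per_task(human_minutes: float, review_minutes: float,
                       hourly_rate: float, agent_cost: float) -> float:
    """Dollar value of delegating one task: staff time freed (after review)
    minus the agent's fixed per-task cost."""
    minutes_saved = human_minutes - review_minutes
    return minutes_saved / 60 * hourly_rate - agent_cost


def clears_breakeven(human_minutes: float, review_minutes: float,
                     hourly_rate: float, agent_cost: float) -> bool:
    """True when delegating the task is worth more than it costs."""
    return net_value_per_task(human_minutes, review_minutes,
                              hourly_rate, agent_cost) > 0


# Hypothetical inputs: a 2-minute lookup vs. an hour-long intake brief.
print(clears_breakeven(2, 1, hourly_rate=40, agent_cost=1.50))
print(clears_breakeven(60, 10, hourly_rate=40, agent_cost=5.00))
```

With these made-up inputs, a two-minute lookup loses money: it frees about 67 cents of staff time against a $1.50 run. An hour-long intake brief with a ten-minute review nets roughly $28 per task. The review term is what keeps the estimate honest. An agent that saves an hour but needs fifty minutes of checking is not a delegation win.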

If autonomy is the value, what makes a 26-minute session safe to grant?
Here is the part the productivity headlines skip. A system that works for 26 minutes without you is, by definition, a system taking actions you are not watching in real time. The value and the risk are the same property. “Autonomous” is an asset; “unsupervised” is a liability — and the only thing standing between them is governance.
The market data says most companies have not crossed that line safely yet. McKinsey's analysis of the agentic AI advantage found that while nearly two-thirds of organizations are experimenting with AI agents, fewer than 10% have scaled them into reliable, value-producing production use. Its 2025 State of AI survey tells the same story from the results side: a majority are experimenting, but only a minority can yet attribute meaningful financial impact. The bottleneck is rarely the model's raw ability. It is trust, data discipline, and the absence of guardrails — the exact problem we mapped in the 85/5 AI agent trust gap.
The way you make a long session safe to grant is to put a control layer between the agent and everything it can touch. That is what our Secure AI Gateway is for, and it does three jobs that map directly onto the risks of autonomy:
- Scoped credentials. The agent gets the narrowest possible access for the task — read-only where reading is enough, a single connector instead of the whole stack — so a 26-minute session can only ever act inside a fenced lane.
- Egress control. What data the agent can send, and where, is constrained and logged. Autonomy never becomes an open door to exfiltrate customer records or paste sensitive data into an unapproved destination.
- A completion / checkpoint boundary. The session ends at a defined hand-off — a drafted output, a queued action — where a human approves anything irreversible before it ships. The agent does the 26 minutes of work; a person signs off on the 30 seconds that matter.
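The three controls above can be pictured as one policy check that every agent action passes through before it touches a system. This Python sketch is a hypothetical illustration of the pattern, not the Secure AI Gateway's actual implementation or API. The `GatewayPolicy` and `Action` names, the connector labels, and the verdict strings are all invented for the example.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class Action:
    connector: str            # hypothetical connector label, e.g. "crm"
    operation: str            # "read" or "write"
    destination: str = ""     # URL, if the action sends data out
    irreversible: bool = False


@dataclass
class GatewayPolicy:
    read_scopes: set[str]       # scoped credentials: readable connectors
    write_scopes: set[str]      # scoped credentials: writable connectors
    egress_allowlist: set[str]  # hostnames data may be sent to
    audit_log: list[str] = field(default_factory=list)

    def decide(self, action: Action) -> str:
        # 1. Scoped credentials: out-of-scope access is denied outright.
        scopes = self.write_scopes if action.operation == "write" else self.read_scopes
        if action.connector not in scopes:
            return self._log(action, "deny: out of scope")
        # 2. Egress control: outbound data only goes to allowlisted hosts.
        if action.destination:
            host = urlparse(action.destination).hostname or ""
            if host not in self.egress_allowlist:
                return self._log(action, "deny: egress blocked")
        # 3. Checkpoint: irreversible actions wait for a human.
        if action.irreversible:
            return self._log(action, "hold: needs human approval")
        return self._log(action, "allow")

    def _log(self, action: Action, verdict: str) -> str:
        """Record every decision so the session can be audited later."""
        self.audit_log.append(f"{action.connector}:{action.operation} -> {verdict}")
        return verdict
```

The ordering is the design choice. Scope is checked first and egress second, and only an action that passes both can reach the approval hold. That way a human is never asked to approve something the agent should not have been able to attempt.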
That is the design that lets a small business enjoy the upside the study measured without taking on the downside the headlines ignore. Sustained autonomy and tight oversight are not opposites; the gateway is what holds them together.

What does 26 minutes of autonomous work look like in Northeast Indiana?
Translate the abstraction onto a real Allen County or DeKalb County business and it stops being a research curiosity. Picture a 15-person professional-services firm in Fort Wayne — a small accounting practice, a title company, a regional insurance agency. Every new client triggers the same grind: pull together background, check public records, reconcile a few systems, and draft an intake brief. Today a staffer loses the better part of an afternoon to it, mostly in 33-second search bursts strung together by hand.
That is a textbook 26-minute delegation. An AI Employee runs the research-and-draft session per new client — gathering the inputs, reconciling the records, producing a structured brief — while your people spend their afternoon on the work clients actually pay for: judgment, relationships, and the conversations a machine cannot have. The firm is not hiring its way out of the busywork (a hard ask in this labor market); it is routing the busywork to a worker that runs nights and weekends and escalates the genuinely hard cases to a human.
The economics are sharpest precisely for mid-market firms in regions like ours, where you cannot easily add headcount and every reclaimed hour counts double. The move is not a big-bang rollout. It is to pick one painful, repeatable workflow, prove the hours it returns, and expand from there — the same disciplined approach we lay out in our practical AI adoption playbook for NE Indiana owners. Local, measurable, reversible — that is how a Fort Wayne business turns a Harvard statistic into payroll it does not have to spend.
Put the delegation gap to work
The Harvard–Perplexity study did not announce a new feature. It measured a category line: 33 seconds is a tool, 26 minutes is a coworker. For a mid-market team, the strategy that follows is simple — find the multi-step workflows where time-on-task is real, delegate one, govern it tightly, and measure what comes back.
Cloud Radix builds and deploys AI Employees for Northeast Indiana businesses, with our Secure AI Gateway — scoped credentials, egress control, and human checkpoints — wrapped around them from day one, so a long autonomous session is something you can actually grant. If you want to map which of your workflows clear the delegation breakeven and would benefit from an AI Employee that works for minutes instead of seconds, talk to Cloud Radix. We will start with one workflow, prove the hours, and build from there.
Frequently Asked Questions
Q1. What did the Harvard–Perplexity study find about AI agents versus search?
Using production data from Perplexity's Search and agentic Computer products, researchers from Harvard and Perplexity found that on matched, near-identical tasks an AI agent performed about 26 minutes of autonomous work per session versus 33 seconds for a traditional search interaction — roughly a 47x difference. The study also reported agents cut matched-task completion time from 269 minutes to 36 minutes and shifted work toward higher-order, cross-domain tasks.
Q2. Why does “time on task” matter more than features when buying AI?
Sustained autonomous work duration is the clearest signal of whether a system is a search box you query or a worker you delegate to. Two products can list identical features, yet one stalls after a single step while the other carries a multi-step workflow for half an hour. For mid-market teams that cannot afford to supervise flaky software, how far a system can carry a task before it needs a human is the difference between a coworker and shelfware.
Q3. Does a longer autonomous session mean the AI does better work?
The study suggests yes for the right tasks: agents took on broader, more cognitively demanding, cross-domain work and cut completion time and cost sharply on matched tasks. But there are real trade-offs — agents carry a higher fixed cost per task, more raw model spend, and more steps where they can drift on unusual cases. The right move is to delegate multi-step, repeatable, verifiable work and keep a human on the rest.
Q4. How do you keep a 26-minute autonomous AI session from becoming a security risk?
You put a control layer between the agent and your systems. A Secure AI Gateway grants scoped credentials (the narrowest access needed), enforces egress control (limiting what data can leave and where), and ends each session at a checkpoint where a human approves anything irreversible. That lets you capture the productivity of autonomy without letting “autonomous” become “unsupervised.”
Q5. Is an AI Employee a good fit for a small business in Fort Wayne or Northeast Indiana?
It can be, especially for repeatable multi-step workflows like client intake, research briefs, lead follow-up, or report assembly. The practical approach is to delegate one painful workflow first, measure the staff hours it returns, govern it with human checkpoints, and expand only on steps it has proven. In a tight regional labor market, reclaiming hours from an AI Employee is one of the few levers a local firm fully controls.
Q6. How is an AI Employee different from ChatGPT or a chatbot?
A chatbot or search box is a query-and-retrieve tool: you ask, it answers in seconds, and the multi-step work stays on your desk. An AI Employee is built to take an objective and sustain autonomous work on it — decomposing the task, gathering inputs, recovering from failed steps, and chaining tools — for minutes at a stretch, then handing finished output to a human for approval. The Harvard–Perplexity 26-minutes-versus-33-seconds gap is that difference, measured.
Sources & Further Reading
- MarkTechPost: marktechpost.com — A New Study from Harvard and Perplexity Finds AI Agents Perform 26 Minutes of Autonomous Work Per Session vs 33 Seconds for Search
- arXiv (Harvard / Perplexity): arxiv.org/abs/2606.07489 — How AI Agents Reshape Knowledge Work: Autonomy, Efficiency, and Scope
- METR: metr.org — Measuring AI Ability to Complete Long Tasks
- McKinsey & Company: mckinsey.com — Seizing the agentic AI advantage
- McKinsey & Company: mckinsey.com — The state of AI in 2025: Agents, innovation, and transformation
- U.S. Department of Labor: onetonline.org — O*NET OnLine occupational task statements
Find the Workflow Worth 26 Minutes of Autonomy
We will map which of your workflows clear the delegation breakeven, deploy an AI Employee on one of them, and prove the reclaimed hours — with a Secure AI Gateway wrapped around it from day one.



