A customer dials your Fort Wayne service business at 7:40 p.m. They speak Spanish. Your front desk closed at five, the after-hours line rolls to voicemail, and the greeting is in English. They hang up and call the next company on the list. You never knew the call happened — and in a market where word-of-mouth inside tight-knit communities decides who gets the next ten jobs, you didn't just lose one customer. You lost their cousin, their neighbor, and their church group too.
For years, the technology answer to that problem was clumsy: transcribe the call, translate the transcript, and read it later. Useful for records, useless for a live conversation. That gap is now closing. A new class of real-time speech-translation models has crossed the accuracy-and-latency threshold where an AI Employee can actually hold a bilingual phone conversation — listen in one language and respond in another with conversational lag, not awkward dead air. This post breaks down what changed, what's genuinely production-ready today (and what isn't yet), and how a Northeast Indiana service business should think about deploying it without putting customer data at risk.
Key Takeaways
- Real-time speech-to-speech translation models now run a Spanish↔English call in roughly three seconds of round-trip lag — close enough for a live conversation, not just an after-the-fact transcript.
- The technical shift is collapsing transcribe-then-translate into a single pass, which is what cut the lag that used to make live translation unusable.
- Spanish is fully supported by these models today; Burmese and other lower-resource languages are the direction of travel, not a finished product. Be honest with customers about which is which.
- Allen County's Hispanic community is now the county's largest minority, and Fort Wayne hosts one of the nation's largest Burmese populations — the demand is local and documented.
- A multilingual AI Employee should hand staff a clean English summary of every call, not just a recording.
- Any system touching health or financial details needs a control layer — a Secure AI Gateway — to keep regulated data scoped and contained.

What actually changed in real-time AI translation?
The old approach chained three separate models together: speech-to-text, then text translation, then text-to-speech. Every hop added delay, and the delays stacked into the multi-second pauses that make a translated phone call feel broken. According to MarkTechPost's reporting on Gradium's new release, the newer models collapse that pipeline — transcription and translation happen in a single pass, dropping the cascade from three models to two.
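A back-of-the-envelope sketch makes the stacking effect concrete. The per-stage delays below are made-up placeholders, not figures reported by MarkTechPost or Gradium, and the model treats the stages as strictly sequential, ignoring the streaming overlap real systems use. The only point is structural: in a cascade every hop adds its delay to the total, so removing a hop removes its share of the wait.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    seconds: float  # hypothetical per-stage delay, for illustration only


def round_trip(stages: list[Stage]) -> float:
    """In a strictly sequential cascade, each stage waits on the previous one, so delays add."""
    return sum(s.seconds for s in stages)


# Placeholder numbers: NOT benchmarks from the article or from any vendor.
cascade = [
    Stage("speech-to-text", 1.2),
    Stage("text translation", 0.9),
    Stage("text-to-speech", 1.1),
]
fused = [
    Stage("speech-to-translated-text (single pass)", 1.4),
    Stage("text-to-speech", 1.1),
]

if __name__ == "__main__":
    print(f"cascade: {len(cascade)} models, {round_trip(cascade):.1f}s")
    print(f"fused:   {len(fused)} models, {round_trip(fused):.1f}s")
```

With these placeholder numbers the three-model cascade sums to 3.2 seconds and the two-model path to 2.5; swapping in your own measurements shows where the time actually goes.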
The numbers are what make this a deployable shift rather than a demo. Gradium's s2s-translate model posts an average round-trip latency of about 3.0 seconds, compared with roughly 3.6 seconds for gpt-realtime-translate, while a comparable Gemini live-translate model lands near 2.9 seconds. On accuracy, Gradium reports leading BLEU scores against both competitors and comparable error rates on the MetricX metric. Three seconds isn't instant — a human interpreter is faster — but it's inside the range where a caller will wait for the response instead of assuming the line went dead.
There's a candid limit worth stating up front, because it changes who can use this today. Gradium's current models cover five languages: English, French, German, Spanish, and Portuguese — 20 directional pairs in total. Spanish↔English is squarely supported. Burmese is not. That matters enormously in Fort Wayne, and we'll come back to it. The honest read is that real-time Spanish service is here now, and lower-resource languages are where the architecture is heading — not a box you can check today.
A few engineering details make these models practical for a phone line rather than a lab demo. They run over a single duplex WebSocket connection — one stream in, one stream out — which is what keeps the round trip tight enough to feel conversational. They also let you choose the output voice from a catalogue and even clone a voice, so the AI Employee can sound consistent with your brand across English and Spanish rather than switching to a jarringly different synthetic voice mid-call. Small things, but they're the difference between a caller staying on the line and a caller deciding they've reached a robot and hanging up.
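The value of a duplex connection is that sending the caller's audio and receiving translated audio happen at the same time instead of taking turns. The sketch below simulates that pattern with two `asyncio.Queue` objects standing in for the two directions of a WebSocket. The fake translator, the string "frames", and every function name here are hypothetical; this does not reflect Gradium's or any other vendor's actual API.

```python
import asyncio


async def fake_translator(uplink: asyncio.Queue, downlink: asyncio.Queue) -> None:
    """Hypothetical stand-in for the remote model: reads frames, emits translated frames."""
    while True:
        frame = await uplink.get()
        if frame is None:  # end-of-stream marker from the caller side
            await downlink.put(None)
            return
        await asyncio.sleep(0.01)  # simulated processing delay
        await downlink.put(f"translated:{frame}")


async def send_audio(uplink: asyncio.Queue, frames: list[str]) -> None:
    """Stream caller frames up without waiting for any responses."""
    for frame in frames:
        await uplink.put(frame)
        await asyncio.sleep(0.005)  # simulated real-time pacing of speech
    await uplink.put(None)


async def receive_audio(downlink: asyncio.Queue) -> list[str]:
    """Collect translated frames as soon as each one arrives."""
    received = []
    while (frame := await downlink.get()) is not None:
        received.append(frame)
    return received


async def duplex_call(frames: list[str]) -> list[str]:
    uplink: asyncio.Queue = asyncio.Queue()
    downlink: asyncio.Queue = asyncio.Queue()
    # Sending, translating, and receiving all run concurrently on "one connection".
    _, _, received = await asyncio.gather(
        send_audio(uplink, frames),
        fake_translator(uplink, downlink),
        receive_audio(downlink),
    )
    return received


if __name__ == "__main__":
    print(asyncio.run(duplex_call(["hola", "necesito", "una cita"])))
```

A real client would replace the queues with a WebSocket library's send and receive calls, but the shape stays the same: two independent loops sharing one connection.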
This capability also rides on top of a voice stack that has been getting both smarter and cheaper. We've written before about how reasoning now runs inside real-time voice, letting an agent think through a call instead of reading a script, and how the price of enterprise voice AI collapsed over the past year. Real-time translation is the next layer stacked on that foundation — language is simply the axis that was missing.
Why does a missed non-English call cost more than you think?
Treat this as a market sizing exercise, not a feel-good one. The callers exist, the demand is documented, and most local competitors are quietly losing this business.
Start with Spanish, because that's the language the technology fully serves today. Per WANE 15's reporting on 2024 Census estimates, the Hispanic community is now the largest minority group in Allen County, with roughly 38,000 residents — and it's growing faster than other groups, driven by younger age cohorts. Not every Hispanic resident is a Spanish-first caller, but a meaningful share of intake, scheduling, and billing calls in this market come from households where the conversation goes better in Spanish.
Then there's Fort Wayne's Burmese community, one of the largest in the United States. Pew Research Center reports that about 25,000 of the nation's Burmese-alone population live in Indiana — roughly 12% of the U.S. total — concentrated heavily in the Fort Wayne metro. Pew also found that only about 43% of Burmese ages five and older speak English proficiently, and just 35% among immigrants specifically. That is a large, local population for whom an English-only phone tree is a wall.
The cost of that wall is not abstract. A qualitative study available through the National Library of Medicine's PubMed Central, looking at immigrants with limited English proficiency, found that language barriers directly impede the ability to book and attend appointments: clients struggle to follow a receptionist, miss follow-ups, and fail to renew prescriptions, sometimes resurfacing months later with uncontrolled conditions. For a healthcare front desk, a dental office, or a legal intake line, every one of those moments is a call that didn't convert and a relationship that didn't form. This isn't only a healthcare problem; home services, property management, and financial services lose the same calls. Healthcare just makes the stakes vivid.

How does a multilingual AI Employee handle a live bilingual call?
Picture the workflow end to end, because the translation model is only one part of it. A multilingual AI Employee answers the after-hours line, greets the caller, and detects the language. For a Spanish-speaking caller, it listens in Spanish and responds in Spanish in real time, gathering what you'd want any receptionist to capture: name, callback number, reason for the call, urgency. It can answer common questions from your knowledge base, book an appointment slot, or escalate a genuine emergency.
The part that earns its keep is what happens after the call. The AI Employee logs the conversation and hands your staff a clean English summary — who called, what they need, what was promised — so a bilingual interpreter isn't required just to triage the morning queue. This is the same operating pattern behind cross-channel customer-service AI: meet the customer in their channel and their language, then normalize everything into one place your team actually works from.
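The intake fields and the morning handoff can be pictured as one structured record per call. This is a hypothetical data shape, not a product schema; it assumes the translation step has already produced English text for each field, and it renders the kind of plain-English summary staff would read.

```python
from dataclasses import dataclass, field


@dataclass
class CallIntake:
    """Hypothetical per-call record; assumes translation to English happened upstream."""
    caller_language: str  # language detected on the call, e.g. "es"
    name: str
    callback_number: str
    reason: str  # English rendering of why they called
    urgency: str  # hypothetical scale: "routine", "soon", "emergency"
    promises: list[str] = field(default_factory=list)  # commitments made on the call


def english_summary(call: CallIntake) -> str:
    """Render the short English summary staff read when triaging the morning queue."""
    lines = [
        f"Caller: {call.name} ({call.callback_number}), spoke {call.caller_language}",
        f"Needs: {call.reason}",
        f"Urgency: {call.urgency}",
    ]
    if call.promises:
        lines.append("Promised: " + "; ".join(call.promises))
    return "\n".join(lines)


if __name__ == "__main__":
    # Fictional example call (555-01xx numbers are reserved for fiction).
    call = CallIntake(
        caller_language="es",
        name="Ana Ruiz",
        callback_number="260-555-0142",
        reason="Furnace is not heating",
        urgency="soon",
        promises=["Technician will call back by 9 a.m."],
    )
    print(english_summary(call))
```

Keeping "what was promised" as its own field matters: it is the line most likely to be dropped in a free-text note, and the one a caller will hold you to.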
For a small business, this is also a realistic on-ramp rather than a moonshot. As we've argued in our post on the small-business back-office on-ramp, the highest-ROI place to start with AI is the repetitive front-office and back-office work, and an after-hours multilingual line is exactly that kind of contained, measurable first deployment. You're not replacing your team. You're catching the calls that currently hit voicemail and die.
A few deployment honesty notes, because over-promising here burns trust fast:
- Lead with Spanish. It's production-ready. Make it excellent before you advertise anything else.
- Be explicit about Burmese and other lower-resource languages. Real-time speech translation for them is still maturing. A defensible interim pattern is a transcribe-and-route flow with a human interpreter in the loop, clearly labeled as such — not a synthetic real-time claim you can't back up.
- Keep a human escalation path. A three-second translation lag is fine for scheduling; it is not a substitute for a trained interpreter in a clinical or legal emergency, and you should design the handoff deliberately.
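Those three rules translate naturally into an explicit routing policy, which keeps the honesty notes from depending on anyone's memory at 9 p.m. This is a hypothetical sketch: the language codes enabled for live handling (English plus Spanish, per "lead with Spanish"), the call categories, and the route names are all assumptions for illustration.

```python
from enum import Enum


class Route(Enum):
    LIVE_AI = "AI Employee handles the call live"
    TRANSCRIBE_AND_ROUTE = "transcribe and route to a human interpreter (disclosed to caller)"
    HUMAN_ESCALATION = "immediate handoff to a trained human"


# Assumption: English plus Spanish are the only languages enabled for live handling.
LIVE_LANGUAGES = {"en", "es"}
# Assumption: hypothetical call categories where a model must not be the interpreter.
HIGH_STAKES = {"clinical_emergency", "legal_emergency"}


def route_call(language: str, category: str) -> Route:
    """Choose a handling path. High-stakes calls escalate regardless of language."""
    if category in HIGH_STAKES:
        return Route.HUMAN_ESCALATION
    if language in LIVE_LANGUAGES:
        return Route.LIVE_AI
    # Burmese ("my") and any other language not yet enabled get the human-in-the-loop flow.
    return Route.TRANSCRIBE_AND_ROUTE


if __name__ == "__main__":
    for lang, cat in [("es", "scheduling"), ("my", "scheduling"), ("es", "clinical_emergency")]:
        print(lang, cat, "->", route_call(lang, cat).value)
```

Checking the high-stakes category first is the deliberate design choice: a fluent Spanish call about a clinical emergency should still reach a person.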
What about PHI and PII? The Secure AI Gateway question
The moment a translated call includes a date of birth, a diagnosis, a Social Security number, or case details, you've moved from “nice customer-service upgrade” to “regulated data flowing through a vendor's models.” That's not a reason to avoid the technology. It's a reason to put a control point in front of it.
For healthcare providers and other HIPAA covered entities, the framework treats protected health information as something you can disclose to a vendor only under specific safeguards and a business associate agreement: the vendor becomes a business associate, and you remain accountable for where that data goes. A translation pipeline that ships raw call audio containing PHI to a third-party model, with no scoping or redaction, is a problem waiting to surface in an audit.
The pattern we recommend is to scrub PII and PHI before it reaches a model and to route everything through a Secure AI Gateway that controls which data fields are allowed to leave your environment, logs every exchange, and enforces retention rules. The translation can still happen; what changes is that sensitive fields are masked, scoped, and contained rather than sprayed across an external API. In our experience, this is the difference between a system a compliance officer will sign off on and one they'll shut down.
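A minimal sketch of the masking and scoping step, assuming call content passes through the gateway as text fields before any external model call. The regex patterns catch only a few obviously formatted U.S. identifiers (SSNs, slash-style dates, phone numbers) and are illustrative, not a real PHI detector; the field allowlist and audit entry format are hypothetical, and retention enforcement is not shown.

```python
import re
from datetime import datetime, timezone

# Illustrative patterns only; real PHI/PII detection needs far broader coverage.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

# Hypothetical allowlist: the only fields permitted to leave your environment.
ALLOWED_FIELDS = {"caller_language", "reason", "urgency"}


def mask_text(text: str) -> tuple[str, dict[str, int]]:
    """Replace pattern matches with placeholders and count how many were masked."""
    counts = {}
    for label, pattern in PATTERNS.items():
        text, n = pattern.subn(f"[{label}]", text)
        counts[label] = n
    return text, counts


def gateway_outbound(record: dict[str, str], audit_log: list[dict]) -> dict[str, str]:
    """Drop non-allowlisted fields, mask the rest, and log what left the environment."""
    outbound = {}
    masked_total = 0
    for key, value in record.items():
        if key not in ALLOWED_FIELDS:
            continue  # e.g. name or account number never reaches the model
        masked, counts = mask_text(value)
        outbound[key] = masked
        masked_total += sum(counts.values())
    audit_log.append({
        "at": datetime.now(timezone.utc).isoformat(),
        "fields_sent": sorted(outbound),
        "fields_dropped": sorted(set(record) - ALLOWED_FIELDS),
        "values_masked": masked_total,
    })
    return outbound


if __name__ == "__main__":
    log: list[dict] = []
    sent = gateway_outbound(
        {"name": "Ana Ruiz", "reason": "DOB 4/12/1980, SSN 123-45-6789", "urgency": "soon"},
        log,
    )
    print(sent)
    print(log[-1])
```

The allowlist does most of the work here: a field that is never sent cannot leak, and masking is the backstop for identifiers that slip into free text.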
One more legal note that's easy to miss: inbound calls a customer places to you are one thing, but outbound automated or prerecorded calls are governed by the TCPA. The FCC's guidance on automated calls centers on prior express consent — if you plan to have an AI Employee place outbound calls or send texts, get and document consent first, in the customer's language. Translation does not exempt you from consent rules.
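One way to make that rule hard to skip is to gate every outbound automated action on a documented consent record. This is a hypothetical sketch: the record fields, including the language the consent request was made in, are assumptions, and the code only checks that an active record exists and covers the channel, not that it satisfies any particular legal standard.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ConsentRecord:
    phone: str
    channels: frozenset[str]  # hypothetical field, e.g. {"call", "text"}
    language: str  # language the consent request was made in, e.g. "es"
    captured_at: datetime
    revoked: bool = False


class ConsentError(Exception):
    """Raised when an outbound action has no documented consent behind it."""


def require_consent(consents: dict[str, ConsentRecord], phone: str, channel: str) -> ConsentRecord:
    """Refuse an outbound automated call or text unless an active consent record covers it."""
    record = consents.get(phone)
    if record is None or record.revoked:
        raise ConsentError(f"no active consent on file for {phone}")
    if channel not in record.channels:
        raise ConsentError(f"consent for {phone} does not cover {channel}")
    return record


if __name__ == "__main__":
    on_file = {
        "260-555-0199": ConsentRecord("260-555-0199", frozenset({"call"}), "es", datetime(2025, 1, 1)),
    }
    print(require_consent(on_file, "260-555-0199", "call"))
    try:
        require_consent(on_file, "260-555-0199", "text")
    except ConsentError as err:
        print("blocked:", err)
```

Raising an exception rather than returning a flag means an outbound job cannot quietly proceed when the check fails.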

Fort Wayne and Allen County: a multilingual market hiding in plain sight
Most Northeast Indiana service businesses are competing for the same English-speaking customers with the same English-only phone systems. The multilingual segment of this market is large, growing, and underserved — and it rewards the businesses that show up in the caller's language with exactly the kind of loyalty that's hard to buy with advertising.
| Caller segment | Local scale | Real-time AI translation status |
|---|---|---|
| Spanish-first callers | Hispanic community is Allen County's largest minority, ~38,000 residents and growing | Production-ready today — Spanish↔English is fully supported at ~3s lag |
| Burmese-first callers | ~25,000 Burmese-alone in Indiana (~12% of U.S. total), concentrated in Fort Wayne; ~43% speak English proficiently | Not yet real-time; use a transcribe-and-route flow with a human interpreter, clearly disclosed |
| Other lower-resource languages | Smaller but present across the metro | Direction of travel, not a finished product — verify before you advertise it |
Consider where this lands first. A DeKalb County dental practice or a Fort Wayne healthcare front desk can stop losing Spanish-speaking patients at the scheduling step. A property management company can field maintenance calls from Spanish-first tenants at 9 p.m. without a callback the next day. A legal intake line can capture the details of a Spanish-speaking caller's situation accurately the first time, instead of through a relative pressed into interpreting. And for the Burmese community, even an honest “we have an interpreter pathway and we'll call you back within the hour” — delivered warmly and reliably — beats the silence those callers usually get.
There's a competitive timing element here too. The economics of voice AI have already fallen far enough that a multilingual line is no longer an enterprise-only luxury; the bottleneck for a small Fort Wayne business has shifted from cost to whether anyone bothers to set it up. The firms that move first in a given vertical — the first bilingual dental scheduler in their part of town, the first Spanish-capable after-hours HVAC line in DeKalb County — get to define the reputation before competitors notice the lane is open. In communities where families pass referrals by word of mouth, that early reputation compounds in a way paid advertising rarely matches.
The point isn't that AI replaces the human relationships that make these communities tight. It's that the front door — the phone line at 7:40 p.m. — finally opens in more than one language. That's a local advantage a national competitor parachuting into Fort Wayne won't have, because they won't know the market is here.

Getting started without overreaching
If this fits your business, the right first move is small and measurable: stand up a Spanish-capable after-hours line for one location, instrument it so you can count the calls it catches that used to hit voicemail, and review the English summaries with your team for a few weeks. Expand only once it's clearly working. Cloud Radix builds AI Employees for Fort Wayne and Northeast Indiana businesses, with the Secure AI Gateway baked in so regulated call data stays scoped from day one. If you want to talk through whether a multilingual line makes sense for your front desk, get in touch and we'll map it to your actual call volume — not a hypothetical one.
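Instrumenting the pilot can be as simple as tagging each after-hours call with how it ended. A minimal sketch, assuming a hypothetical call-log format; the outcome labels are placeholders for whatever your phone system actually records.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical outcome labels counted as "caught" rather than lost.
CAUGHT = {"ai_handled", "escalated"}


@dataclass(frozen=True)
class CallEvent:
    language: str  # detected language code, e.g. "es"
    outcome: str  # hypothetical labels: "ai_handled", "escalated", "abandoned"


def pilot_report(events: list[CallEvent]) -> dict[str, object]:
    """Summarize how many after-hours calls the line caught instead of losing."""
    outcomes = Counter(e.outcome for e in events)
    caught = sum(outcomes[o] for o in CAUGHT)
    total = len(events)
    return {
        "total_calls": total,
        "caught": caught,
        "catch_rate": caught / total if total else 0.0,
        "caught_by_language": Counter(e.language for e in events if e.outcome in CAUGHT),
    }


if __name__ == "__main__":
    week = [
        CallEvent("es", "ai_handled"),
        CallEvent("es", "abandoned"),
        CallEvent("en", "escalated"),
        CallEvent("es", "ai_handled"),
    ]
    print(pilot_report(week))
```

Comparing the "caught" count against how many after-hours voicemails the same location logged before the pilot gives the before-and-after number worth reviewing with your team.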

Frequently Asked Questions
Q1. Can an AI phone agent really translate a live call in real time?
For supported languages like Spanish, yes — current speech-to-speech models run a round trip in roughly three seconds, which is workable for a live conversation. It's not as fast as a human interpreter, and the quality depends heavily on the language pair. Spanish↔English is well supported today; many lower-resource languages are not yet at that level.
Q2. Does this work for Burmese, given Fort Wayne's large Burmese community?
Not yet at real-time, production quality with the models we'd stake a deployment on. Burmese is a lower-resource language, and the new real-time translation models that perform well on Spanish don't currently cover it. The honest approach today is a transcribe-and-route workflow with a human interpreter in the loop, clearly disclosed — not a synthetic claim of live Burmese translation.
Q3. Is it legal and compliant to use AI to handle customer calls with personal data?
It can be, with the right controls. Inbound calls a customer places to you carry fewer restrictions than outbound automated calls, which are governed by the TCPA and require documented prior express consent. For any health information, HIPAA requires that vendors handling protected data do so under proper safeguards, which is why we route call data through a Secure AI Gateway that scopes and masks sensitive fields.
Q4. Will a multilingual AI Employee replace my front-desk staff?
No — it's designed to catch the calls your team can't, especially after hours and in languages no one on staff speaks. It hands your staff a clean English summary of each call so they can follow up efficiently. The goal is to stop losing customers at the phone line, not to remove the human relationships that close the business.
Q5. How accurate is the translation, really?
On the Spanish↔English pairs these models handle well, accuracy is high enough for scheduling, intake, and routine service questions. It is not a substitute for a certified interpreter in a clinical, legal, or high-stakes financial conversation, where a mistranslation carries real consequences. Design a human escalation path for those moments rather than relying on the model alone.
Q6. What's the smallest way to try this for my Fort Wayne business?
Start with a single Spanish-capable after-hours line for one location, measure how many previously-missed calls it captures, and review the summaries with your team before expanding. A contained pilot tells you whether the multilingual demand in your specific call volume justifies a wider rollout — without a large upfront commitment.
Sources & Further Reading
- MarkTechPost: marktechpost.com/2026/06/24/gradium-launches-stt-translate-and-s2s-translate — Gradium launches stt-translate and s2s-translate real-time speech translation models, beating gpt-realtime-translate on accuracy and latency.
- Pew Research Center: pewresearch.org/race-and-ethnicity/fact-sheet/asian-americans-burmese-in-the-u-s — Burmese in the U.S. fact sheet.
- WANE 15: wane.com/top-stories/hispanic-population-now-largest-minority-in-allen-county — Hispanic population now largest minority in Allen County, 2024 estimates show.
- National Library of Medicine (PMC): pmc.ncbi.nlm.nih.gov/articles/PMC8314461 — Impacts of English language proficiency on healthcare access, use, and outcomes among immigrants: a qualitative study.
- Federal Communications Commission: fcc.gov/consumers/guides/stop-unwanted-robocalls-and-texts — Stop Unwanted Robocalls and Texts (TCPA consumer guidance).
- U.S. Department of Health & Human Services: hhs.gov/hipaa/for-professionals/privacy/laws-regulations — HIPAA for Professionals: Laws & Regulations.
Open Your Front Door in More Than One Language
We'll map a Spanish-capable after-hours line to your actual call volume — a contained, measurable first deployment with the Secure AI Gateway baked in so regulated call data stays scoped from day one.
Schedule a Free Consultation
No contracts. No pressure. Just an honest conversation about what would help your business.



