How Jason Wang Built CaterAI: Voice AI That Takes Restaurant Orders

From “Can You Translate This?” to a Voice AI That Takes Restaurant Orders

At a Startup Folsom session, Jason Wang shared a founder story a lot of people can relate to: a real-world problem, a real customer, and a solution that started as a hack and turned into a business. His startup, CaterAI, is building voice agents that answer restaurant calls, take orders, and handle reservations—especially when humans are too slammed to pick up.

It began with a friend—Tony—who runs a Chinese restaurant in Davis. Tony would call Jason constantly: “Can you help me translate? The customer is confused about our dishes.” Jason helped at first, but after weeks of being pulled into the middle of customer calls, the awkward truth hit:

Jason wasn’t an employee… but he’d become part of the restaurant’s workflow anyway.

That’s when he asked the question that launched CaterAI:

What if the phone could talk to ChatGPT—without me in the middle?

The problem: restaurants miss calls when it matters most

Jason’s insight wasn’t “AI is cool.” It was “busy restaurants miss money.”

In his research, he found:

  • ~50% of calls get missed during rush hour

  • ~60% of calls are orders + reservations

  • That can mean $30K–$200K in lost revenue per location per year

  • Across ~800,000 restaurants in the U.S., that becomes a massive pool of dropped revenue (he framed it as a $25B opportunity)

Even if the exact numbers vary by restaurant, the lived experience is universal: call a restaurant at 6pm and you’re basically rolling dice.
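As a rough sanity check on the figures above, the per-location range can be multiplied out across the quoted restaurant count. This is only back-of-the-envelope arithmetic using the numbers from the talk; how Jason arrived at $25B was not stated, though it sits close to the low end of the range.

```python
# Figures quoted in the talk; treat them as rough estimates.
US_RESTAURANTS = 800_000
LOSS_LOW, LOSS_HIGH = 30_000, 200_000  # lost revenue per location per year (USD)


def total_lost_revenue(locations: int, loss_per_location: int) -> int:
    """Naive upper bound: assumes every location loses the same amount."""
    return locations * loss_per_location


low = total_lost_revenue(US_RESTAURANTS, LOSS_LOW)
high = total_lost_revenue(US_RESTAURANTS, LOSS_HIGH)
print(f"low: ${low / 1e9:.0f}B, high: ${high / 1e9:.0f}B")
# low: $24B, high: $160B
```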

Challenge #1: “I can’t code… now what?”

Jason Wang was direct: he’s not technical (and at the time didn’t even “vibe code”). But he did have two unfair advantages:

  1. A real customer (Tony)

  2. A technical connector (Henry—someone he’d worked with on POS/kiosk projects)

So he did the most founder thing possible: he called Henry and pitched the idea.

Henry’s first reaction: “You’re crazy.”
Which is startup code for: “This might be worth doing.”

Henry brought in two more technical builders, and by January 2025, the team started building the infrastructure behind CaterAI.

Challenge #2: latency and hallucinations (aka “voice AI is brutal”)

Text chat can tolerate a pause. Phone calls can’t.

Jason described early prototypes where the AI would “think” for 5–10 seconds. Customers would say “hello?”… which triggered the AI to think again… and suddenly you’re stuck in a loop of silence and re-processing. That’s not customer service. That’s customer eviction.
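One way to picture the loop, and a possible way out of it, is a tiny turn manager that refuses to restart processing when the caller only says a filler like "hello?" while a response is still pending. The talk did not describe how CaterAI solved this; the class name, the filler list, and the "reassure" action are all assumptions made for illustration.

```python
# Hypothetical filler phrases a waiting caller might say.
FILLERS = {"hello", "are you there", "you there"}


class TurnManager:
    """Tracks whether a reply is already being generated for the caller."""

    def __init__(self) -> None:
        self.pending = None  # the utterance currently being processed, if any

    def on_utterance(self, text: str) -> str:
        normalized = text.lower().strip(" ?!.,")
        if self.pending is not None and normalized in FILLERS:
            # Keep the in-flight request; just tell the caller we are still here.
            return "reassure"
        self.pending = text
        return "process"

    def on_response_ready(self) -> None:
        self.pending = None


tm = TurnManager()
print(tm.on_utterance("I'd like a BBQ chicken plate"))  # process
print(tm.on_utterance("Hello?"))                        # reassure
```

The point of the sketch is only that a filler during a pending turn should not discard the work already in progress.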

Then came the bigger risk: hallucinations.

Restaurants don’t get to shrug and say “AI is experimental” when the kitchen receives the wrong order. Voice adds even more chaos:

  • background noise

  • kids yelling random words

  • accents

  • menu names that don’t exist in “standard English”

Jason’s example landed: someone says “Hawaiian…” and a kid in the back says “cat,” and suddenly the model might interpret that as “Hawaiian barbecue cat.” Funny at a meetup. Not funny in a POS ticket.

How CaterAI made it work: multi-model + guardrails + “hot words”

CaterAI didn’t treat “the model” as magic. Jason described building a system that adapts to the restaurant:

  • some locations need Chinese support

  • some need Spanish

  • some have short menus

  • some have 400-item menus that read like poetry written by chaos

So CaterAI uses a mix of different models depending on the situation, then wraps them in protective layers:

1) Menu grounding (RAG)

Instead of hoping the model “remembers” a giant menu, CaterAI uses retrieval so responses stay tied to what the restaurant actually sells.
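A minimal sketch of the retrieval idea, assuming a tiny made-up menu and plain keyword overlap in place of whatever retrieval CaterAI actually uses (real systems typically rely on embeddings). The menu entries, the stopword list, and the function names are hypothetical.

```python
import re

# Hypothetical menu: item name -> short description.
MENU = {
    "BBQ Chicken Plate": "grilled chicken with rice and macaroni salad",
    "Spam Musubi": "grilled spam on rice wrapped in seaweed",
    "Kalua Pork Plate": "slow roasted pork with cabbage and rice",
}
STOPWORDS = {"a", "and", "can", "do", "get", "have", "i", "in", "on", "the", "with", "you"}


def tokens(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def retrieve(query: str, k: int = 2) -> list:
    """Return up to k menu items sharing the most keywords with the query."""
    q = tokens(query)
    scored = []
    for name, description in MENU.items():
        overlap = len(q & (tokens(name) | tokens(description)))
        if overlap:
            scored.append((overlap, name))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


def build_prompt(query: str) -> str:
    """Put only the retrieved items in front of the model."""
    lines = [f"- {name}: {MENU[name]}" for name in retrieve(query)]
    context = "\n".join(lines) or "- (no matching items)"
    return f"Answer using only these menu items:\n{context}\n\nCustomer: {query}"


print(retrieve("do you have pork with cabbage"))  # ['Kalua Pork Plate']
```

Because the prompt contains only retrieved items, the model has less room to describe dishes the restaurant does not sell.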

2) Validators / “guardian” checks

Before an order is sent, another AI reviews the conversation + order for consistency. The “smartest” model may be slower, but it’s used as the final gatekeeper.
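Jason described the gatekeeper as another AI model. The sketch below is not that; it is a deterministic, rule-based stand-in that shows the same "check before sending" shape: every item must exist on the menu, quantities must be positive, and the quoted total must match. The prices, names, and error messages are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical menu prices in cents (integers avoid float rounding).
MENU_PRICES_CENTS = {"Spam Musubi": 350, "BBQ Chicken Plate": 1200}


@dataclass
class OrderLine:
    item: str
    quantity: int


def validate_order(lines: list, quoted_total_cents: int) -> list:
    """Return a list of problems; an empty list means the order may be sent."""
    problems = []
    if not lines:
        problems.append("order is empty")
    expected = 0
    for line in lines:
        if line.item not in MENU_PRICES_CENTS:
            problems.append(f"unknown item: {line.item}")
            continue
        if line.quantity <= 0:
            problems.append(f"invalid quantity for {line.item}: {line.quantity}")
            continue
        expected += MENU_PRICES_CENTS[line.item] * line.quantity
    if not problems and expected != quoted_total_cents:
        problems.append(f"total mismatch: quoted {quoted_total_cents}, expected {expected}")
    return problems


print(validate_order([OrderLine("Spam Musubi", 2)], 700))            # []
print(validate_order([OrderLine("Hawaiian Barbecue Cat", 1)], 1200))
# ['unknown item: Hawaiian Barbecue Cat']
```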

3) “Hot words” dictionaries

Voice transcription mangles food names. Jason gave a great example: “spam musubi” being transcribed as something like “salmon soupi.”

So CaterAI maps common mis-hearings to the correct menu item to prevent the agent from confidently replying, “Sorry, we don’t have salmon soup.”
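The mapping can be pictured as a per-restaurant lookup applied to the transcript before the agent sees it. This is a minimal sketch: only the "salmon soupi" example comes from the talk, while the second entry, the function name, and the lowercase normalization are assumptions.

```python
import re

# Per-restaurant dictionary: what the transcriber heard -> what the customer meant.
HOT_WORDS = {
    "salmon soupi": "spam musubi",  # example from the talk
    "salmon soup": "spam musubi",   # hypothetical variant; only safe if the
                                    # restaurant does not actually sell salmon soup
}


def apply_hot_words(transcript: str) -> str:
    """Replace known mis-hearings with the intended menu item."""
    result = transcript.lower()
    # Longest phrases first so a longer mis-hearing is never partially replaced.
    for heard in sorted(HOT_WORDS, key=len, reverse=True):
        pattern = r"\b" + re.escape(heard) + r"\b"
        result = re.sub(pattern, HOT_WORDS[heard], result)
    return result


print(apply_hot_words("Can I get two salmon soupi please"))
# can i get two spam musubi please
```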

The demo: a voice agent that actually places the order

Jason played a demo of a call to L&L Hawaiian Barbecue where the AI:

  • greets the customer naturally

  • confirms item choices

  • clarifies ambiguous requests (half-and-half meats)

  • checks availability + pricing

  • confirms the full order and total

  • places it and gives pickup timing

Operationally, the restaurant receives the order as unpaid, and the customer pays at pickup—keeping payment behavior aligned with how many restaurants already operate.

The results: $200/month for ~40x ROI

Jason shared early live results:

  • about $250/day in additional captured sales

  • roughly $8,000/month

  • the restaurant pays $200/month

  • estimated ROI: ~40x

Even if you cut that in half, the value prop is still obvious: capture calls you’re currently missing.
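The arithmetic behind those numbers is simple enough to check. The sketch below treats "ROI" the way the talk did, as captured sales divided by the subscription fee, which is a revenue multiple rather than a profit figure; the 30-day month is an assumption.

```python
MONTHLY_FEE = 200         # USD, from the talk
DAILY_EXTRA_SALES = 250   # USD, from the talk


def roi_multiple(monthly_sales: float, monthly_fee: float) -> float:
    """Captured sales per dollar of subscription fee."""
    return monthly_sales / monthly_fee


print(roi_multiple(8_000, MONTHLY_FEE))                   # 40.0 (talk's rounded figure)
print(roi_multiple(DAILY_EXTRA_SALES * 30, MONTHLY_FEE))  # 37.5 (30-day month)
print(roi_multiple(8_000 / 2, MONTHLY_FEE))               # 20.0 (results cut in half)
```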

Challenge #3: growth (and the special hell of POS integrations)

Once CaterAI worked, the pain shifted to scaling.

Jason called out three growth constraints:

1) Revenue + expansion

CaterAI is live at 10 locations across several restaurant categories.

2) POS integrations (his words: “a pain in the ass”)

Integrations aren’t “build it once.” They’re forever, and legacy vendors often move slowly or keep APIs closed.

  • Square was called out as the easiest due to open APIs.

  • Others like Toast and Clover are harder.

  • One workaround: CaterAI can provide its own POS when needed.

Jason’s blunt founder advice:
If you can avoid deep integration dependencies early, do it.
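One common way to limit that dependency is to hide each POS behind a small interface, with a simple built-in fallback for restaurants whose vendor has no usable API. The sketch below is a generic illustration, not CaterAI's architecture; the interface, class names, and order ID format are all hypothetical, and no real vendor API is shown.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class Order:
    items: list
    total_cents: int
    paid: bool = False  # per the demo, orders arrive unpaid and are paid at pickup


class PosAdapter(Protocol):
    """The only surface the voice agent depends on."""

    def submit_order(self, order: Order) -> str: ...


@dataclass
class InMemoryPos:
    """Stand-in for a built-in POS used when no vendor integration exists."""

    orders: list = field(default_factory=list)

    def submit_order(self, order: Order) -> str:
        self.orders.append(order)
        return f"local-{len(self.orders)}"


def place_order(order: Order, pos: PosAdapter) -> str:
    # The agent never needs to know which vendor sits behind the adapter.
    return pos.submit_order(order)


pos = InMemoryPos()
print(place_order(Order(items=["Spam Musubi"], total_cents=350), pos))  # local-1
```

Each vendor integration would then be one more class that satisfies the same interface, so a slow or closed API blocks only that adapter rather than the whole product.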

3) Sales help that opens doors

Advice is cheap. A warm intro to the right operator at the right chain is priceless.

The bigger insight: success depends on how customers actually speak

Jason’s most important lesson wasn’t technical—it was human.

Customers don’t speak in exact menu names. They say things like:

  • “teriyaki chicken” (even if it’s not on the menu)

  • “that combo plate”

  • “the usual”

CaterAI’s onboarding involves learning real call behavior and building a translation layer between customer language and restaurant language. They’re also working on a “self-learning” layer that detects repeated patterns and asks the owner clarifying questions like:

“Customers keep asking for X, but it’s not on your menu—what is it?”

That’s the difference between a chatbot demo and a system a restaurant can trust.
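The pattern-detection step can be sketched as counting requests that match neither the menu nor a known alias, then turning the frequent ones into questions for the owner. The talk gave no implementation details, so the menu, the alias table, the threshold of three, and the function name are all assumptions.

```python
from collections import Counter

# Hypothetical menu and customer-language aliases for one restaurant.
MENU_ITEMS = {"bbq chicken plate", "spam musubi"}
ALIASES = {"that combo plate": "bbq chicken plate"}  # learned from earlier answers


def questions_for_owner(requests: list, min_count: int = 3) -> list:
    """Ask the owner about phrases customers repeat that map to nothing known."""
    unmatched = Counter()
    for request in requests:
        phrase = request.strip().lower()
        if phrase not in MENU_ITEMS and phrase not in ALIASES:
            unmatched[phrase] += 1
    return [
        f"Customers keep asking for {phrase!r}, but it's not on your menu. What is it?"
        for phrase, count in unmatched.most_common()
        if count >= min_count
    ]


calls = ["teriyaki chicken"] * 3 + ["spam musubi"] * 5 + ["the usual"]
print(questions_for_owner(calls))
# ["Customers keep asking for 'teriyaki chicken', but it's not on your menu. What is it?"]
```

Once the owner answers, the phrase would be added to the alias table so the same question is not asked again.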


About Jason Wang

Jason Wang is the founder of CaterAI and XIAOCUN. He currently serves as the president of CaterAI and CEO of XIAOCUN. He is an alumnus of UC Davis.
