AI Agents in Production: What Actually Works

Tony Le · Mar 5, 2025 · 10 min read

Everyone's building AI agents. Very few are shipping them in production with real users. After deploying agents across seven platforms — from recipe evolution to sports analytics to content automation — we've learned what actually works and what's still hype.

This isn't a tutorial on LangChain or a comparison of GPT vs Claude. This is a field report from the trenches of production AI.

The Three Archetypes That Work

After building agents for wildly different domains, we've identified three agent archetypes that consistently deliver value in production:

1. The Analyst: Takes raw data, applies domain-specific reasoning, and outputs structured insights. Examples: GammaLens computes dealer hedging levels from raw options flow, and BetEdge scans sportsbook odds for arbitrage opportunities. These agents work because the input and output are well-defined and the reasoning is verifiable.

2. The Creator: Generates content within defined constraints. Examples: Let It Simmer's Saffron agent creates recipe variations based on community feedback, and Spreadr generates social media content calendars. These agents work when you give them a clear creative brief and a feedback loop.

3. The Operator: Automates multi-step workflows that previously required human coordination. Example: Domara's lease generation pipeline (collect data → analyze → draft → format → send for signature). These agents work when each step is independently verifiable.
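To make "each step is independently verifiable" concrete, here is a minimal sketch of an Operator-style pipeline in Python. It is not Domara's actual code: the `Step` type, the `run_pipeline` helper, and the lease fields are hypothetical, and the step functions are plain stand-ins for what would be model or service calls.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    name: str
    run: Callable[[dict], dict]     # takes the pipeline state, returns the updated state
    verify: Callable[[dict], bool]  # independent check on this step's result


class StepFailed(Exception):
    """Raised with the name of the step whose check did not pass."""


def run_pipeline(steps: list[Step], state: dict) -> dict:
    for step in steps:
        state = step.run(state)
        if not step.verify(state):
            # Stop at the first failed check instead of passing bad data downstream.
            raise StepFailed(step.name)
    return state


# Hypothetical stand-ins for real data collection and drafting calls.
def collect(state: dict) -> dict:
    return {**state, "tenant": "A. Tenant", "rent": 1800}


def draft(state: dict) -> dict:
    text = f"Lease for {state['tenant']} at ${state['rent']}/month"
    return {**state, "draft": text}


steps = [
    Step("collect", collect, lambda s: s.get("rent", 0) > 0 and bool(s.get("tenant"))),
    Step("draft", draft, lambda s: s["tenant"] in s.get("draft", "")),
]

result = run_pipeline(steps, {})
print(result["draft"])
```

Because every step carries its own check, a failure names the step that produced it, which is what makes a long workflow debuggable.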

Architecture Patterns We Use

Forget the demos where someone types a prompt and an agent does magic. Production AI agents need structure.

Our standard agent architecture:

• Structured input/output: Every agent call has a typed schema for both input and output. No free-form prompts in production code.
• Chain of verification: For any agent output that affects business logic, we run a second pass to verify that the output meets its constraints.
• Graceful degradation: If the AI fails, the user still has a path forward. Never block a user flow on an AI call.
• Cost guardrails: Token budgets per request, per user, and per day, so one runaway prompt can't blow up your API bill.
• Observability: Every agent call is logged with input, output, latency, token count, and model version. You can't improve what you can't measure.
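As an illustration of how several of these pieces can fit together, the sketch below wraps one agent call with a typed input and output, a verification pass, a fallback, and a log line. Everything here is an assumption made for the example: `RecipeRequest`, `RecipeSuggestion`, `suggest`, and the `call_model` parameter are hypothetical names, and `call_model` stands in for whichever LLM client you use. The log line is reduced to status and latency; a real one would also carry token count and model version.

```python
import json
import logging
import time
from dataclasses import asdict, dataclass
from typing import Callable

logger = logging.getLogger("agent")


@dataclass(frozen=True)
class RecipeRequest:          # typed input (hypothetical schema)
    ingredients: tuple[str, ...]
    max_minutes: int


@dataclass(frozen=True)
class RecipeSuggestion:       # typed output (hypothetical schema)
    title: str
    minutes: int


def parse_output(raw: str) -> RecipeSuggestion:
    """Parse model text into the typed output; raises on malformed data."""
    data = json.loads(raw)
    return RecipeSuggestion(title=str(data["title"]), minutes=int(data["minutes"]))


def verify(req: RecipeRequest, out: RecipeSuggestion) -> bool:
    """Second pass: check the output against the request's constraints."""
    return bool(out.title.strip()) and 0 < out.minutes <= req.max_minutes


def suggest(
    req: RecipeRequest,
    call_model: Callable[[str], str],   # assumed: takes a prompt, returns raw text
    fallback: RecipeSuggestion,
) -> RecipeSuggestion:
    start = time.perf_counter()
    try:
        out = parse_output(call_model(json.dumps(asdict(req))))
        ok = verify(req, out)
    except Exception:
        # Broad on purpose: a failed AI call must never block the user flow.
        logger.exception("agent call failed")
        out, ok = None, False
    latency_ms = (time.perf_counter() - start) * 1000
    logger.info("agent_call ok=%s latency_ms=%.1f", ok, latency_ms)
    return out if ok else fallback
```

The caller always gets a `RecipeSuggestion` back, whether the model answered well, answered badly, or did not answer at all.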

The Multi-Agent Pattern: Let It Simmer Case Study

Let It Simmer is our most complex agent deployment. Three AI agents collaborate on every recipe:

• Saffron (Creative): Generates recipe variations based on community feedback, seasonal ingredients, and culinary trends.
• Basil (Analytical): Evaluates recipes for food science accuracy: temperature, timing, chemical reactions, nutrition.
• Pepper (Curatorial): Scores recipes for quality, novelty, and community fit. Decides what makes it to the front page.

The key insight: these agents don't share a conversation. Each operates independently with structured handoffs. Saffron generates → Basil validates → Pepper curates. If Basil flags an issue, it goes back to Saffron with specific feedback, not a vague 'try again.'
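A rough sketch of that handoff loop, under stated assumptions: the `Recipe` and `Review` types, the `run_handoff` function, the retry limit, and the 260 °C threshold are all invented for illustration, and the three agent functions are stubs where the real system would make model calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Recipe:
    title: str
    oven_temp_c: int


@dataclass
class Review:
    approved: bool
    issues: list[str] = field(default_factory=list)  # specific, actionable feedback


def run_handoff(
    generate: Callable[[list[str]], Recipe],
    validate: Callable[[Recipe], Review],
    curate: Callable[[Recipe], float],
    max_rounds: int = 3,
) -> Optional[float]:
    """Generate, validate, curate. Only structured data crosses agent boundaries."""
    feedback: list[str] = []
    for _ in range(max_rounds):
        recipe = generate(feedback)
        review = validate(recipe)
        if review.approved:
            return curate(recipe)
        feedback = review.issues  # send the concrete issues back, not "try again"
    return None  # give up after max_rounds instead of looping forever


# Stub agents standing in for Saffron, Basil, and Pepper.
def saffron(feedback: list[str]) -> Recipe:
    temp = 180 if feedback else 400  # "fixes" the temperature once it gets feedback
    return Recipe("Roast carrots", temp)


def basil(recipe: Recipe) -> Review:
    if recipe.oven_temp_c > 260:
        return Review(False, [f"oven_temp_c={recipe.oven_temp_c} exceeds 260"])
    return Review(True)


def pepper(recipe: Recipe) -> float:
    return 0.8


score = run_handoff(saffron, basil, pepper)
print(score)
```

No agent sees another's conversation history here. Each receives only the typed object it needs, which keeps failures local and prompts short.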

What Doesn't Work (Yet)

Being honest about limitations is how you build trust with users and avoid shipping unreliable features:

• Autonomous multi-step agents without checkpoints: Giving an AI agent a complex goal and letting it run 10+ steps unsupervised still produces unreliable results. Every step needs a checkpoint.
• AI replacing domain expertise entirely: AI agents amplify domain experts. They don't replace them. Our best agents work alongside human knowledge, not instead of it.
• One-size-fits-all prompts: Every domain needs custom prompt engineering. A prompt that works for recipe generation completely fails for financial analysis.
• Real-time agents at scale without caching: LLM calls are slow and expensive. Any real-time feature needs an intelligent caching layer to be viable.
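For the caching point, the simplest useful version is a time-limited cache keyed on a normalized prompt. The sketch below is an assumption-laden example, not the layer described above: `AgentCache` and `call_model` are hypothetical names, and it only matches prompts that are identical after lowercasing and whitespace cleanup. Matching genuinely similar inputs would need something more, such as embedding similarity.

```python
import hashlib
import time
from typing import Callable


class AgentCache:
    """In-memory TTL cache keyed on a normalized prompt."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: dict[str, tuple[float, str]] = {}

    @staticmethod
    def key(prompt: str) -> str:
        # Lowercase and collapse whitespace so trivial variations share one entry.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get_or_call(self, prompt: str, call_model: Callable[[str], str]) -> str:
        k = self.key(prompt)
        now = self._clock()
        hit = self._store.get(k)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]            # fresh entry: skip the model call
        value = call_model(prompt)   # miss or expired: call and store
        self._store[k] = (now, value)
        return value
```

With a one-hour TTL, `cache.get_or_call("Chicken dinner ideas", call_model)` and `cache.get_or_call("chicken  dinner ideas", call_model)` would share a single model call.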

Cost Reality Check

Let's talk about what nobody in AI demos talks about — cost. Running AI agents in production with real users means real API bills.

Our cost management strategy:

• Model tiering: Use the cheapest model that meets the quality bar for each task. Not every call needs Claude Opus.
• Smart caching: Cache agent outputs for similar inputs. A recipe suggestion for 'chicken dinner ideas' doesn't need a fresh API call every time.
• Batch processing: Where real-time isn't required, batch agent calls during off-peak hours at lower cost.
• Token budgets: Hard limits per request and per user prevent cost explosions.
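Model tiering and token budgets both reduce to a small amount of bookkeeping. The following is a minimal sketch with made-up values: the tier names, the task-to-tier mapping, the limits, and the `TokenBudget` class are hypothetical, and a production version would keep its counters in shared storage rather than process memory.

```python
from collections import defaultdict

# Hypothetical tiers, cheapest first.
TIERS = ["small", "medium", "large"]

# Hypothetical mapping: the cheapest tier that met the quality bar for each task.
MIN_TIER_FOR_TASK = {
    "tag_recipe": "small",
    "draft_recipe": "medium",
    "food_science_review": "large",
}


def pick_model(task: str) -> str:
    # Unknown tasks default to the strongest tier until they have been evaluated.
    return MIN_TIER_FOR_TASK.get(task, TIERS[-1])


class BudgetExceeded(Exception):
    pass


class TokenBudget:
    """Hard limits per request and per user per day."""

    def __init__(self, per_request: int, per_user_daily: int):
        self.per_request = per_request
        self.per_user_daily = per_user_daily
        self._used: dict[tuple[str, str], int] = defaultdict(int)  # (user_id, day)

    def charge(self, user_id: str, day: str, tokens: int) -> None:
        if tokens > self.per_request:
            raise BudgetExceeded("request limit")
        key = (user_id, day)
        if self._used[key] + tokens > self.per_user_daily:
            raise BudgetExceeded("daily limit")
        self._used[key] += tokens  # only record usage that was allowed
```

Calling `charge` before the model call, with an estimated token count, means a runaway prompt is rejected before it costs anything.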

The Path Forward

AI agents are real. They work in production. But they work best when you treat them as a new kind of infrastructure — not magic, but a powerful tool that needs the same engineering discipline as any other production system.

The teams that win will be the ones that ship AI features with the same rigor they apply to their database architecture. Structured inputs. Typed outputs. Observability. Graceful degradation. That's the boring truth about production AI — and it's exactly why it works.

The best AI features are the ones users don't even realize are AI. They just work.
