

“Set your heart upon your work, but never on its reward.”
Bhagavad Gita

The conversation around AI agents has shifted dramatically. A year ago, "agent" meant a chatbot that could call a couple of tools in a loop. Today, it refers to systems that orchestrate multi-step workflows, manage state across sessions, and handle failure gracefully — all running in production at scale.
Having spent the past year building agent platforms at Adya, I've watched this transition unfold firsthand. Here's what actually changed, and what most people still get wrong.
The biggest blocker for production agents was never intelligence — it was reliability. Early agent loops would hallucinate tool calls, get stuck in infinite retries, or silently produce wrong results. The models were capable enough, but the surrounding infrastructure wasn't.
What changed:
The result: agent failure rates dropped from "hope it works" to measurable single-digit percentages that you can actually debug.
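One reliability primitive behind that drop is a bounded retry with a single fallback attempt, so a flaky step either recovers or fails loudly with the original error attached. A minimal sketch in TypeScript; the helper names are illustrative, not from any particular agent SDK:

```typescript
// Bounded retry with one fallback attempt. Failures are recorded, never
// swallowed silently.
type StepFn<T> = () => Promise<T>;

async function withRetry<T>(
  primary: StepFn<T>,
  fallback: StepFn<T>,
  maxAttempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await primary();
    } catch (err) {
      lastError = err; // keep the error so the failure stays debuggable
    }
  }
  // Retries exhausted: try the fallback once, then surface the original error.
  try {
    return await fallback();
  } catch {
    throw lastError;
  }
}
```

The point is that every failure path ends in either a result or a thrown, inspectable error, which is what makes failure rates measurable in the first place.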
Not every agent pattern from 2024 made it. The ones that stuck share a common trait: they embrace constraints rather than fighting them.
The dream of a fully autonomous agent that figures out everything on its own is still mostly a dream. What works in production are workflow agents — systems where the high-level plan is defined by the developer, but individual steps are handled by LLMs.
```typescript
const workflow = createWorkflow({
  steps: [
    { name: "classify", agent: classifierAgent },
    { name: "extract", agent: extractionAgent },
    { name: "validate", agent: validationAgent },
    { name: "act", agent: actionAgent, conditional: true },
  ],
  onStepFailure: "retry-with-fallback",
});
```

This isn't less powerful than autonomous agents — it's more reliable. The LLM handles what it's good at (understanding context, extracting information, making judgment calls), while the developer handles what they're good at (defining process, enforcing constraints, handling edge cases).
Every production agent system I've seen includes a human approval step for high-stakes actions. The pattern that works:
This isn't a limitation — it's a feature. The agents that try to do everything autonomously are the ones that cause incidents.
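A minimal sketch of such an approval gate: low-stakes actions run straight through, high-stakes ones pause until a human signs off. The types and names here are hypothetical, not tied to any framework:

```typescript
// Approval gate sketch: high-stakes actions require an explicit human
// decision before they execute.
type Action = { name: string; highStakes: boolean; run: () => string };
type Decision = "approved" | "rejected";

function executeWithApproval(
  action: Action,
  requestApproval: (a: Action) => Decision,
): string {
  // Low-stakes actions execute immediately; high-stakes ones need sign-off.
  if (action.highStakes && requestApproval(action) !== "approved") {
    return `blocked: ${action.name} requires human approval`;
  }
  return action.run();
}
```

In practice `requestApproval` would be an async call into a queue or UI rather than a synchronous callback, but the shape of the gate is the same.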
A production agent system in 2026 typically includes:
The tooling for all of this has matured significantly. You don't need to build most of it from scratch anymore.
The engineering challenges that remain aren't about making agents smarter — they're about making them manageable:
Evaluation is still the bottleneck. Building an agent is fast. Knowing whether it works well is slow. Creating good evaluation datasets requires domain expertise, and there's no shortcut.
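A bare-bones harness makes the asymmetry visible: the code is trivial, while the labeled cases are the expensive, expert-driven part. Everything here is illustrative:

```typescript
// Minimal eval harness: run an agent over labeled cases, report pass rate,
// and keep the failing inputs around for inspection.
type EvalCase = { input: string; expected: string };
type EvalReport = { passed: number; total: number; failures: string[] };

function evaluate(
  agent: (input: string) => string,
  cases: EvalCase[],
): EvalReport {
  const failures: string[] = [];
  let passed = 0;
  for (const c of cases) {
    if (agent(c.input) === c.expected) passed += 1;
    else failures.push(c.input); // failing inputs are the debugging gold
  }
  return { passed, total: cases.length, failures };
}
```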
Cost management matters. A poorly designed agent can burn through API credits fast. The agents that work in production are the ones with tight token budgets, aggressive caching, and smart model routing.
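One way to sketch that routing logic: a crude length-based token estimate, a hard budget check, and a size threshold for picking a model. The model names, the ~4 characters-per-token heuristic, and the thresholds are all invented for illustration:

```typescript
// Cost-aware routing sketch: enforce a token budget, then pick a model
// by estimated request size.
type Route = { model: string; estimatedTokens: number };

function routeRequest(input: string, budgetRemaining: number): Route | null {
  const estimatedTokens = Math.ceil(input.length / 4); // rough ~4 chars/token
  if (estimatedTokens > budgetRemaining) return null;  // hard budget stop
  const model = estimatedTokens < 500 ? "small-fast" : "large-capable";
  return { model, estimatedTokens };
}
```

Real systems would use a proper tokenizer and per-task routing signals, but the principle (budget first, then the cheapest capable model) carries over.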
Debugging multi-step failures. When an agent makes a wrong decision on step 3 of 7, tracing back to the root cause still requires careful analysis. Better tooling helps, but it's inherently complex.
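A sketch of per-step tracing that makes that root-cause analysis possible: each step records its input and its output (or error), so a bad decision at step 3 of 7 leaves a trail. The trace shape here is my own assumption, not a standard:

```typescript
// Per-step tracing sketch: run steps in order, recording input and output
// (or the error) for each, and stop at the first failure.
type Step = { name: string; fn: (input: string) => string };
type StepTrace = { step: string; input: string; output?: string; error?: string };

function runTraced(
  steps: Step[],
  initial: string,
): { result?: string; trace: StepTrace[] } {
  const trace: StepTrace[] = [];
  let current = initial;
  for (const { name, fn } of steps) {
    const record: StepTrace = { step: name, input: current };
    try {
      current = fn(current);
      record.output = current;
    } catch (err) {
      record.error = String(err); // stop at the first failure, keep the trail
      trace.push(record);
      return { trace };
    }
    trace.push(record);
  }
  return { result: current, trace };
}
```

Production systems typically emit these records as structured spans to a tracing backend rather than returning them in-process, but the record-per-step discipline is the same.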
The shift from "AI agents as demos" to "AI agents as infrastructure" is real. But like every infrastructure transition, the exciting part isn't the technology — it's the engineering discipline required to make it work reliably.
