Pillar guide

Self-improving agents

Self-improving agents update their harness — prompts, tools, workflows, memory, evals — from production signal, in a latent space outside the model weights and portable across any model.

In one paragraph

The shortest accurate definition

A self-improving agent is an AI system that uses signal from its own production behavior to update its harness — prompts, tools, workflows, context, memory, evals — in a latent space outside the model weights, so improvements apply to any model and adapt to each user.

The phrase self-improving agent gets used for two very different things: a runtime loop where one agent retries itself on a single task (AutoGPT-style), and an operational loop where an agent improves across runs by learning from production. This guide is about the second kind, the one that compounds in shipped products. The thesis is concrete: most agent improvement happens at the harness layer, not in the weights. The bottleneck is not the update step. It is the signal step, plus the question of where the resulting learnings live.

Pattern

Improvement = signal + harness update (model usually unchanged)

Every self-improving agent is a pipeline of two stages: signal generation, then a harness update. Most teams over-invest in changing the model and under-invest in the signal. The model rarely needs to change.

  • Signal: production conversations turned into intent clusters, behavioral failure exemplars, and frustration trajectories with agent counterfactuals, each attributed to a layer of the harness.
  • Update: prompt edits, tool changes, retrieval index updates, workflow rewiring, memory state changes, eval-set refreshes. Each lives outside the model weights and applies to whichever model the harness mounts.
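
The two stages above can be sketched as a pair of plain records. This is a minimal illustration, not a reference schema: the names `Layer`, `Signal`, `HarnessUpdate`, and `propose_update`, along with the sample cluster and conversation id, are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Layer(Enum):
    """Harness layers an update can target (assumed set, taken from the list above)."""
    PROMPT = "prompt"
    TOOL = "tool"
    WORKFLOW = "workflow"
    RETRIEVAL = "retrieval"
    MEMORY = "memory"
    EVAL = "eval"


@dataclass(frozen=True)
class Signal:
    intent_cluster: str    # which intent cluster the failure belongs to
    failure_exemplar: str  # id of a conversation that shows the failure
    layer: Layer           # harness layer the failure is attributed to


@dataclass(frozen=True)
class HarnessUpdate:
    layer: Layer
    description: str
    source_cluster: str    # kept so the change can be traced back to its signal


def propose_update(signal: Signal, description: str) -> HarnessUpdate:
    """Turn one attributed signal into one update on the same layer."""
    return HarnessUpdate(signal.layer, description, signal.intent_cluster)


# Hypothetical example: a refund failure attributed to the tool layer.
signal = Signal("billing/refund/partial", "conv-0193", Layer.TOOL)
update = propose_update(signal, "Require `amount` in the refund tool schema")
print(update.layer.value, "<-", update.source_cluster)
```

Nothing in either record refers to a model, which is the point of the pattern: the update is fully described by a layer, a change, and the signal that motivated it.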

Pattern

Why outside the model weights

The case for keeping learnings outside the model is practical, not philosophical. Five concrete properties:

  • Portable across models: swap GPT for Claude, or either for an open-weight model, and the harness-layer learnings keep applying.
  • Per-user adaptive: the harness can carry per-user learnings without retraining a shared model for each user.
  • Inspectable: a human can read prompts, tool schemas, workflow steps, retrieval contents, and memory state. Weight diffs are opaque.
  • Reversible: a regression is rolled back by reverting a harness state change. Promoting and demoting model checkpoints is heavier.
  • Continuous: harness changes ship in minutes, while fine-tuning cycles take days to quarters.
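
Three of these properties (portable, inspectable, reversible) follow from treating the harness as versioned plain data. The sketch below assumes a harness is just a dictionary with `system_prompt` and `tools` keys; `HarnessStore`, `build_request`, and the model names are hypothetical, and the request shape is not any vendor's API.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class HarnessStore:
    """Append-only history of harness states; each state is readable plain data."""
    history: list = field(default_factory=list)

    def commit(self, state: dict) -> int:
        self.history.append(copy.deepcopy(state))
        return len(self.history) - 1  # version number of the new state

    def current(self) -> dict:
        return copy.deepcopy(self.history[-1])

    def rollback(self, version: int) -> dict:
        """Revert by re-committing an earlier version, so the history stays auditable."""
        self.commit(self.history[version])
        return self.current()


def build_request(harness: dict, model: str, user_message: str) -> dict:
    """Mount the same harness state on whichever model is named."""
    return {"model": model, "system": harness["system_prompt"],
            "tools": harness["tools"], "input": user_message}


store = HarnessStore()
v0 = store.commit({"system_prompt": "Be concise.", "tools": ["search"]})
v1 = store.commit({"system_prompt": "Be concise. Confirm refund amounts.",
                   "tools": ["search", "refund"]})

# Portable: two different (hypothetical) models receive identical harness content.
req_a = build_request(store.current(), "model-a", "Where is my refund?")
req_b = build_request(store.current(), "model-b", "Where is my refund?")

# Reversible: rolling back is one more commit, not a checkpoint swap.
restored = store.rollback(v0)
print(restored["tools"], len(store.history))
```

Rolling back by appending rather than truncating is a design choice in this sketch: it keeps a record that the regression happened, which helps when deciding whether to retry the change later.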

Pattern

Five production patterns that work

These five patterns recur across teams that ship measurable improvements quarter over quarter without retraining their model.

  • Nightly prompt updates from clustered failure exemplars: the cheapest, fastest loop and where most teams start.
  • Tool schema tightening and routing changes from detected tool call failures and schema drift.
  • Retrieval index expansion from intent clusters with poor coverage — surfaces what the corpus is missing.
  • Workflow restructuring from agent path analysis that surfaces loops or premature handoffs.
  • Per-user harness state: learnings about a specific user carried as harness context, not encoded in shared model weights.
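
The last pattern, per-user harness state, can be illustrated as an overlay merged onto the shared harness at request time. This is a sketch under assumed names: `mount_harness`, the `memory` key, and the sample users are hypothetical.

```python
def mount_harness(shared: dict, user_overlays: dict, user_id: str) -> dict:
    """Combine the shared harness with one user's overlay without mutating either."""
    overlay = user_overlays.get(user_id, {})
    # Memory is additive: the user's learnings extend the shared ones.
    memory = shared.get("memory", []) + overlay.get("memory", [])
    # Other overlay keys, if present, override the shared value for this user only.
    return {**shared, **overlay, "memory": memory}


shared = {
    "system_prompt": "You are a support agent.",
    "memory": ["Refunds take 5 business days."],
}
user_overlays = {
    "user-42": {"memory": ["Prefers replies as bullet points."]},
}

for_known_user = mount_harness(shared, user_overlays, "user-42")
for_new_user = mount_harness(shared, user_overlays, "user-7")
print(len(for_known_user["memory"]), len(for_new_user["memory"]))
```

A user with no overlay simply gets the shared harness, so per-user learnings never leak into what other users see.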

Pattern

What goes wrong

Self-improving agents are easy to start and hard to keep honest. These are the failure patterns to guard against from day one.

  • Reaching for fine-tuning first when a harness edit would have worked.
  • Optimizing for the wrong signal: chasing thumbs-up rates or session length while real task completion regresses.
  • Closed-loop selection bias: only learning from users who complain, ignoring the silent majority that quietly abandons.
  • Operational catastrophic forgetting: a prompt edit fixes cluster A and regresses cluster B because the eval set does not cover B.
  • Update velocity outrunning measurement: shipping harness changes faster than you can measure their impact.
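
One guard against operational catastrophic forgetting is a per-cluster regression gate that runs before a harness change ships. The sketch below assumes eval pass rates are already available per intent cluster; `regressed_clusters`, the `tolerance` value, and the sample numbers are illustrative, not recommended thresholds.

```python
def regressed_clusters(baseline: dict, candidate: dict, tolerance: float = 0.02) -> list:
    """Return clusters whose pass rate dropped by more than `tolerance`.

    A cluster missing from the candidate run is flagged too: if the eval set
    does not cover it, the change cannot be shown to be safe for it.
    """
    flagged = []
    for cluster, base_rate in baseline.items():
        cand_rate = candidate.get(cluster)
        if cand_rate is None or base_rate - cand_rate > tolerance:
            flagged.append(cluster)
    return sorted(flagged)


# Hypothetical pass rates: the edit helps "refund", hurts "shipping",
# and the candidate eval run never exercised "login".
baseline = {"refund": 0.90, "shipping": 0.85, "login": 0.95}
candidate = {"refund": 0.96, "shipping": 0.70}

blocked_by = regressed_clusters(baseline, candidate)
print(blocked_by)  # → ['login', 'shipping']
```

Treating an uncovered cluster as a failure is deliberately strict. It turns the "eval set does not cover B" case from a silent gap into a visible blocker.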

Pattern

A minimal harness-layer self-improvement loop

If a team is starting from zero, this is the shape of the loop that has the highest hit rate — and it never touches the model weights.

  • Instrument conversations and ingest them into a production-analytics layer.
  • Cluster conversations into a 3-level intent taxonomy; refresh weekly; watch for emergent intents.
  • Detect behavioral failures and frustration trajectories; route each to a specific layer of the harness.
  • Ship a single prompt, tool, workflow, or retrieval edit informed by the top exemplar; tag the change.
  • Watch intent-cluster failure rate and frustration share for the following week; revert the harness state if anything material regresses.
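
The measurement half of this loop (pick the worst cluster, tag one change, watch, revert on regression) can be sketched in a few lines. It assumes conversations are already labeled as `(cluster, failed)` pairs; the function names, the `max_increase` threshold, and the sample data are hypothetical, and frustration share is left out for brevity.

```python
from collections import Counter


def failure_rates(records: list) -> dict:
    """Per-cluster failure rate from (cluster, failed) records."""
    totals, failures = Counter(), Counter()
    for cluster, failed in records:
        totals[cluster] += 1
        failures[cluster] += int(failed)
    return {cluster: failures[cluster] / totals[cluster] for cluster in totals}


def should_revert(before: dict, after: dict, max_increase: float = 0.05) -> bool:
    """Revert if any previously seen cluster's failure rate rose by more than `max_increase`."""
    return any(after.get(cluster, 0.0) - rate > max_increase
               for cluster, rate in before.items())


# Week before the change: "refund" is the worst cluster.
week_before = [("refund", True), ("refund", True), ("refund", False),
               ("refund", False), ("login", False), ("login", False)]
before = failure_rates(week_before)
target = max(before, key=before.get)

# Ship exactly one edit and tag it so its impact can be traced.
change = {"tag": "change-001", "layer": "prompt", "target_cluster": target}

# Week after: "refund" improved, but "login" regressed.
week_after = [("refund", True), ("refund", False), ("refund", False),
              ("refund", False), ("login", True), ("login", False)]
after = failure_rates(week_after)

decision = "revert" if should_revert(before, after) else "keep"
print(change["tag"], decision)  # → change-001 revert
```

The example is built so that the targeted cluster improves while another one regresses, which is why the check covers every cluster rather than only the one the edit aimed at.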

Frequently asked

Questions

Is a self-improving agent the same as AutoGPT?

No. AutoGPT loops a single agent to complete a single task. A self-improving agent uses production outcomes from many tasks to update its own future behavior. The unit of improvement is the agent across runs, not the run itself.

Do self-improving agents require fine-tuning?

Not for most teams. Prompt edits, tool wiring, retrieval changes, workflow rewiring, and per-user harness state produce the majority of measurable improvement. Fine-tuning is best reserved for stable patterns where harness edits have been exhausted, and even then it should be evaluated against the alternative of a sharper harness.

Why keep memory and learnings outside the model weights?

Portability, inspectability, reversibility, per-user adaptation, and update velocity. Learnings baked into weights are model-specific, shared across all users, opaque, hard to roll back, and slow to ship. Learnings carried in the harness are none of those things.

What's the biggest risk?

Optimizing for the wrong signal. Most failed self-improvement loops chase thumbs-up rates or session length and silently regress real task completion. Production analytics that surface behavioral failures and frustration root causes, attributed to a layer of the harness, are the guardrail.

How does Moda fit into a self-improvement loop?

Moda is self-improvement for AI agents on the harness layer. We turn production conversations into intent clusters, emergent intents, behavioral failures, and frustration trajectories with agent counterfactuals — each routed to a specific layer of the harness (prompt, tool, workflow, context, memory, eval, model). The improvements live in a latent space outside the model weights and apply across any model.

See self-improving agents on your traffic.

Moda turns production conversations into the signal these loops need: intent clusters, behavioral failure exemplars, frustration root causes.