How I Give Claude Code Agents Real Autonomy (And Stop 3am Disasters)

Running Claude Code agents unsupervised is only scary if you haven’t drawn the lines.

I’ve had agents push to the wrong branch, draft content I didn’t want published, and retry failed jobs in loops that burned rate limits at 2am. All of those were solvable — not by adding more rules, but by categorizing what an agent is allowed to do without asking.

Here’s the four-class autonomy model I run today. I wrote it as a policy document (autonomy-policy-v1.md) in my workspace. My agents read it. My cron jobs operate inside it.

The Core Principle

Maximize autonomous execution. Do not grant unlimited autonomous authority.

That distinction matters. I want my agents doing as much as possible on their own — diagnosing issues, fixing broken cron jobs, updating documentation, retrying failed tasks. But I don’t want them publishing content under my name, deploying to production, or spending money without asking.

The difference between “execute autonomously” and “have unlimited authority” is the entire game. Get it wrong and you either babysit every action (defeating the purpose) or wake up to a mess.

The Four Action Classes

Class 1 — Fully Autonomous

These are actions the agent should do without asking:

  • Read, inspect, search, verify, diagnose
  • Edit local docs, prompts, skills, memory files
  • Fix cron prompts, file paths, routing mistakes
  • Retry failed jobs after applying a safe fix
  • Improve instructions after recurring mistakes
  • Run local validation steps
  • Draft content for approval

The key word is “local.” Everything in Class 1 is reversible and stays inside the workspace. If an agent fixes a broken file path in a cron job at 3am, that’s a win. No notification needed.

Class 2 — Autonomous With Notification

The agent does these on its own, then tells me if it matters:

  • Repair a broken cron job or skill
  • Change internal routing or role documents
  • Disable a broken recurring job to stop repeated damage
  • Clean up stale config or dead files
  • Create PRs through the coding agent
  • Recover from model or rate-limit fallback issues

Class 2 is where most of the real autonomous value lives. The agent handles the problem, then sends a short message: “Fixed the blog cron — file path was wrong after yesterday’s refactor.” I read it when I read it. No approval needed, but I stay informed.

Class 3 — Approval Required

The agent prepares the best recommendation, then asks:

  • Public posting under my brand
  • Sending outbound messages, emails, or DMs
  • Publishing content
  • Production deploys
  • Purchases, subscriptions, or ad spend
  • Credential rotation or account-security changes
  • Major architecture changes

This is where trust boundaries get real. My agent drafts blog posts and X threads, but it doesn’t post them. It sends me an approval packet with the content, the reasoning, and two options: Approve or Deny.

Binary approval is important. Open-ended “what should I do?” questions waste time. The agent should have an opinion and present it as a recommendation.

Class 4 — Forbidden

The agent must not do these without explicit direct instruction:

  • Bypass safety or permission systems
  • Exfiltrate private data
  • Loosen security boundaries for convenience
  • Impersonate me in sensitive contexts
  • Make irreversible external changes when uncertainty is high
  • Modify its own gateway policy

Class 4 exists because some actions aren’t just “ask first” — they’re “never, unless I specifically tell you to.” The distinction between Class 3 and Class 4 is whether the agent should even be thinking about it proactively.
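
For readers who think in code, here is a minimal Python sketch of how the four classes could be turned into a gate. It is not the policy file itself (that is a prose document); the names ActionClass and gate, and the returned strings, are hypothetical and chosen only for illustration.

```python
from enum import Enum


class ActionClass(Enum):
    AUTONOMOUS = 1         # Class 1: do it, no notification
    NOTIFY = 2             # Class 2: do it, then report
    APPROVAL_REQUIRED = 3  # Class 3: prepare a recommendation, then ask
    FORBIDDEN = 4          # Class 4: never without explicit direct instruction


def gate(action_class: ActionClass, explicitly_instructed: bool = False) -> str:
    """Return the next step for an action of the given class."""
    if action_class is ActionClass.AUTONOMOUS:
        return "execute"
    if action_class is ActionClass.NOTIFY:
        return "execute_then_notify"
    if action_class is ActionClass.APPROVAL_REQUIRED:
        return "request_approval"
    # Class 4 only proceeds when the owner has explicitly asked for it.
    return "execute" if explicitly_instructed else "refuse"
```

The one asymmetry worth noticing is the last line: Class 3 always routes through an approval request, while Class 4 is refused unless the instruction came first.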

The Self-Repair Loop

The autonomy model only works if agents can fix their own problems. Here’s the loop I built into the policy:

  1. Detect the failure
  2. Classify it: known fix, new local issue, auth problem, approval needed, or external provider issue
  3. Apply the safest local fix available
  4. Verify the outcome with a real check
  5. Record the incident in daily memory
  6. Update the durable rule so the same class of failure is less likely
  7. Retry if safe
  8. Escalate only if still blocked or approval is required

Step 6 is the one that matters most. A repeated mistake is a system failure. The fix isn’t “remember better next time” — it’s changing the prompt, the skill, the cron definition, or the documentation so the failure class disappears.

I call this structural learning. If my agent makes the same mistake twice, the problem isn’t the agent. The problem is that I didn’t change the structure after the first time.
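
The eight steps can be sketched as a single function. This is an illustration under assumptions, not the loop as it exists in the policy: the callables (classify, apply_fix, verify, update_rule, retry) are hypothetical hooks the caller supplies, and the in-memory log stands in for the daily memory file.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RepairResult:
    resolved: bool
    escalated: bool
    log: List[str] = field(default_factory=list)


def self_repair(
    failure: str,
    classify: Callable[[str], str],
    apply_fix: Callable[[str], None],
    verify: Callable[[], bool],
    update_rule: Callable[[str], None],
    retry: Callable[[], bool],
) -> RepairResult:
    log = [f"detected: {failure}"]        # 1. detect the failure
    kind = classify(failure)              # 2. classify it
    log.append(f"classified: {kind}")
    if kind == "approval needed":         # 8. escalate when approval is required
        log.append("escalated")
        return RepairResult(False, True, log)
    apply_fix(failure)                    # 3. apply the safest local fix
    fixed = verify()                      # 4. verify with a real check
    log.append(f"verified: {fixed}")      # 5. record the incident
    update_rule(failure)                  # 6. update the durable rule
    resolved = fixed and retry()          # 7. retry only if the fix held
    if not resolved:
        log.append("escalated")           # 8. escalate when still blocked
    return RepairResult(resolved, not resolved, log)
```

Note that update_rule runs before the retry, matching the order in the list: the structural change is made whether or not the retry succeeds.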

Agent Boundaries In Practice

I run two agents. The main operator handles triage, diagnosis, routing, documentation, drafting, and workflow repair. A separate coding agent handles implementation — code changes, tests, builds, branches, and PRs.

The operator can modify workspace docs, prompts, memory, cron definitions, and skills autonomously. It delegates coding work to the implementation agent. It must ask before external publishing, destructive changes, or anything touching money.

The coding agent can implement code changes, run tests, use branches and worktrees, and open PRs autonomously. It escalates to the operator for business logic ambiguity, architecture changes, security-sensitive changes, and anything requiring user-facing communication.

This separation means the operator never touches code directly, and the coding agent never publishes content. Each stays in its lane.
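
One way to picture the two lanes is as a small lookup table. The sketch below is a simplification: the task labels are taken from the paragraphs above, but the dictionary layout, the decide function, and its return values are hypothetical.

```python
# What each agent may do on its own.
AUTONOMOUS = {
    "operator": {"workspace docs", "prompts", "memory", "cron definitions", "skills"},
    "coding_agent": {"code changes", "tests", "branches and worktrees", "PRs"},
}

# What each agent must hand upward: the operator asks the human,
# the coding agent escalates to the operator.
ESCALATE = {
    "operator": {"external publishing", "destructive changes", "money"},
    "coding_agent": {
        "business logic ambiguity",
        "architecture changes",
        "security-sensitive changes",
        "user-facing communication",
    },
}


def decide(agent: str, task: str) -> str:
    """Return 'do', 'escalate', 'delegate', or 'out_of_lane'."""
    if task in AUTONOMOUS[agent]:
        return "do"
    if task in ESCALATE[agent]:
        return "escalate"
    # The operator hands implementation work to the coding agent.
    if agent == "operator" and task in AUTONOMOUS["coding_agent"]:
        return "delegate"
    return "out_of_lane"
```

For example, decide("operator", "code changes") returns "delegate", while decide("coding_agent", "external publishing") returns "out_of_lane".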

Making Approval Requests Useful

Bad approval request: “Should I post this to X?”

Good approval request:

  • What: Draft thread about Claude Code agent autonomy
  • Why: Follows the subagents post from last week, builds the operator arc
  • Risk: Low — opinion content, no claims to verify
  • Recommendation: Post as-is
  • Options: Approve / Deny

The agent should do the thinking. I should only need to make the decision.
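
The packet has a fixed shape, so it is easy to model as a small data structure. A minimal sketch, assuming a hypothetical ApprovalRequest class; the field names mirror the bullets above, and the binary options are the default.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ApprovalRequest:
    what: str
    why: str
    risk: str
    recommendation: str
    options: Tuple[str, ...] = ("Approve", "Deny")  # binary by default

    def render(self) -> str:
        """Format the packet as the short message the owner reads."""
        return "\n".join([
            f"What: {self.what}",
            f"Why: {self.why}",
            f"Risk: {self.risk}",
            f"Recommendation: {self.recommendation}",
            "Options: " + " / ".join(self.options),
        ])


request = ApprovalRequest(
    what="Draft thread about Claude Code agent autonomy",
    why="Follows the subagents post from last week, builds the operator arc",
    risk="Low: opinion content, no claims to verify",
    recommendation="Post as-is",
)
print(request.render())
```

Making recommendation a required field is the point: a request cannot be constructed without the agent stating an opinion.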

Why This Works Better Than Rules Lists

I tried the rules-list approach first. “Don’t post without approval. Don’t deploy without testing. Don’t spend money.” It doesn’t scale. Every new situation needs a new rule, and agents interpret edge cases differently every time.

The four-class model works because it’s a framework, not a checklist. When my agent encounters a new situation, it classifies it: Is this local and reversible (Class 1)? Does it change internal state (Class 2)? Does it affect the outside world (Class 3)? Is it a hard boundary (Class 4)?

The classification handles edge cases that no rules list could anticipate.
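
Those four questions can be written as a tiny decision function. This is a sketch under assumptions: the Action type and its boolean flags are hypothetical, and deciding how each flag gets set for a real action is still the agent's judgment call. The checks run from most restrictive to least, so an action that matches several questions lands in the strictest class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    description: str
    hard_boundary: bool = False           # Class 4 question
    affects_outside_world: bool = False   # Class 3 question
    changes_internal_state: bool = False  # Class 2 question


def classify(action: Action) -> int:
    """Return the class number (1 to 4) for an action."""
    if action.hard_boundary:
        return 4
    if action.affects_outside_world:
        return 3
    if action.changes_internal_state:
        return 2
    return 1  # local and reversible
```

An action no rule anticipated, such as Action("reply to a customer email", affects_outside_world=True), still classifies cleanly as Class 3.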

FAQ

Do agents actually follow autonomy policies?

In my experience, yes, provided the policy is in a file they read at startup. I keep autonomy-policy-v1.md in my workspace root. Both agents load it as part of their bootstrap context. The policy is written as instructions, not suggestions.

How do you handle the agent making a wrong classification?

It happens. When an agent treats a Class 3 action as Class 2 (does something externally without asking), I add a rule covering that case to the hard rules in HEARTBEAT.md, a file that accumulates failure-driven rules over time. That is the structural learning loop doing its job: the misclassification becomes a durable rule instead of a one-off correction.
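
Adding a rule can be as simple as appending a line. A minimal sketch, assuming HEARTBEAT.md stores one markdown bullet per rule; that layout and the add_hard_rule helper are assumptions for illustration, since the file's real format is not shown here.

```python
from pathlib import Path


def add_hard_rule(heartbeat: Path, rule: str) -> bool:
    """Append a rule as a bullet; return False if it is already recorded."""
    line = f"- {rule.strip()}"
    text = heartbeat.read_text(encoding="utf-8") if heartbeat.exists() else ""
    if line in text.splitlines():
        return False  # avoid piling up duplicates of the same rule
    # Start on a fresh line if the file does not end with a newline.
    separator = "" if text == "" or text.endswith("\n") else "\n"
    with heartbeat.open("a", encoding="utf-8") as fh:
        fh.write(separator + line + "\n")
    return True
```

The duplicate check keeps the file from growing every time the same failure is reported twice.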

Does this slow agents down?

The opposite. Agents that know exactly what they’re allowed to do move faster than agents that hedge on everything. Clear boundaries enable speed.


I’m documenting the full build process in my Build & Automate community.

