Run a Claude Code Agent in Production Without Losing Your Mind

I’ve been running a Claude Code agent named Koda as a 24/7 daemon on my MacBook Pro for about two weeks. It posts to X and Bluesky, drafts blog posts, syncs Skool members to Airtable, analyses YouTube performance, and publishes videos through a Discord approval gate. It runs under pm2, wakes up on cron, and self-heals when something breaks.

This post is the honest version of what a Claude Code agent in production looks like. Not the demo. The real thing — the failures, the operational gotchas, the boring stuff nobody writes tutorials about.

What “production” actually means for an agent

Most Claude Code tutorials stop at “I ran claude in my terminal and it wrote a script.” That’s a toy. Production for an autonomous agent means:

  • It runs without you being at the keyboard.
  • It has persistent memory across sessions.
  • It recovers from failures without manual intervention.
  • It talks to real APIs with real credentials.
  • It does measurable work on a schedule.

Koda hits all five. Right now it has 21 scheduled tasks, 11 MCP servers, 43 helper scripts, and 18 skills. Everything lives under a single directory: ~/.koda/.

The architecture: thin runtime, fat filesystem

The runtime is small. A pm2 process wraps the Claude Agent SDK, loads the soul file on boot, and listens for cron triggers and Discord messages. Everything interesting happens in the filesystem.

~/.koda/
├── soul.md          # identity + hard rules, loaded every session
├── user.md          # who the agent works for
├── learnings.md     # what has worked, what has failed (capped at 100 lines)
├── goals.md         # measurable targets the agent checks each loop
├── tasks.json       # 21 scheduled tasks with cron expressions
├── skills/          # 18 markdown recipes for multi-step tasks
├── scripts/         # 43 Python/Node helpers the agent calls
├── mcp-servers.json # 11 MCP servers registered
├── data/
│   ├── drafts/      # every generated artifact lands here first
│   ├── daily-logs/  # what it did each day
│   └── analytics/   # pulled metrics for context
└── .env             # API keys, never read by LLM output

The file-first design matters. When Claude’s context window fills up and the session restarts, the agent reloads state from disk on the next trigger. No vector DB, no Redis, no orchestration layer. Markdown files and cron.
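The reload step is simple enough to sketch. A minimal version, assuming the file names from the tree above; the assembly order (identity first, goals last) is my guess, not Koda's actual one:

```python
from pathlib import Path

KODA = Path.home() / ".koda"

def load_state(root: Path = KODA) -> str:
    """Rebuild session context from disk on every trigger.

    File names follow the ~/.koda/ tree; the concatenation
    order is an assumption for illustration.
    """
    parts = []
    for name in ("soul.md", "user.md", "learnings.md", "goals.md"):
        f = root / name
        if f.exists():
            parts.append(f.read_text())
    return "\n\n".join(parts)
```

Because state is rebuilt from scratch on every trigger, a crashed or context-exhausted session costs nothing: the next wake-up reads the same files and carries on.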

What actually breaks in production

Here’s what killed uptime in the first two weeks:

1. Context exhaustion during long tasks. Publishing a blog post pulled analytics, scanned past drafts, read learnings, then tried to write 1,500 words — and hit the token limit mid-draft. Fix: split the pipeline into discrete steps that save intermediate artifacts to drafts/ so the next turn picks up where the previous one left off.
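A checkpointed step runner is only a few lines. This sketch assumes hypothetical step and file names under the drafts/ directory from the tree above, not Koda's real pipeline:

```python
from pathlib import Path

DRAFTS = Path("data/drafts")

def run_step(slug: str, step: str, produce) -> str:
    """Run one pipeline step, skipping it when its artifact is already
    on disk, so a restarted session resumes instead of starting over.

    Step and file naming are illustrative assumptions.
    """
    DRAFTS.mkdir(parents=True, exist_ok=True)
    out = DRAFTS / f"{slug}.{step}.md"
    if out.exists():            # a previous turn already finished this step
        return out.read_text()
    result = produce()
    out.write_text(result)      # checkpoint before moving to the next step
    return result
```

Each turn can then run "analytics", "outline", "draft" as separate calls, and a mid-draft restart skips straight past the steps whose artifacts already exist.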

2. Silent config inheritance. Early on, Koda kept reading paths like ~/.claude/workspace/LEARNINGS.md instead of its own files. The cause wasn’t hallucination — it was settingSources: ['user', 'project'] in the Agent SDK config, which silently pulls ~/.claude/CLAUDE.md (my personal Claude Code CLI instructions) into the agent’s system prompt. The agent was dutifully following my rules instead of its own. Fix: set settingSources: [] in the SDK options and add explicit counter-rules in soul.md that override any inherited paths.
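The shape of the fix, sketched as the options payload a hypothetical runtime wrapper might assemble. The key spellings follow the TypeScript SDK option named above; the wrapper function itself is my invention:

```python
def agent_options(system_prompt: str) -> dict:
    """Options payload handed to the Agent SDK session (hypothetical wrapper).

    The empty settingSources list is the actual fix: it stops the SDK
    from folding ~/.claude/CLAUDE.md (or a project CLAUDE.md) into the
    agent's system prompt, so soul.md is the only identity it sees.
    """
    return {
        "systemPrompt": system_prompt,
        "settingSources": [],   # inherit nothing from the CLI config
        "maxTurns": 25,
    }
```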

3. Sub-task turn-cap crashes. Scheduled tasks run as sub-agents with a maxTurns cap. The default is 15. Any task that reads data, analyses it, drafts content, and saves a file will burn through 15 turns and crash on turn 16. Four tasks crashed with the same error on a single morning before I noticed. Fix: set maxTurns explicitly per task — 25 is a sensible default for anything that touches the filesystem.
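A tasks.json entry with the cap made explicit might look like this. The schema is illustrative; only the maxTurns lesson comes from the incident above:

```python
import json

# One tasks.json entry; the shape is an assumption, not Koda's schema.
RAW = """
{
  "name": "morning-analytics",
  "cron": "0 7 * * *",
  "prompt": "Pull yesterday's YouTube metrics and summarise them.",
  "maxTurns": 25
}
"""

def turn_cap(task: dict) -> int:
    # 25 covers read -> analyse -> draft -> save; the SDK default
    # of 15 is what the crashed tasks were silently running with.
    return task.get("maxTurns", 25)
```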

4. Expired cookies. Playwright scripts for scraping Skool and X broke every 2-3 weeks when session cookies rotated. Fix: a refresh-cookies skill the agent runs itself when a script exits with an auth error. The agent detects the failure, runs the cookie refresh, and retries the original task.
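The detect-refresh-retry pattern, sketched around subprocess calls. The exit-code convention (13 meaning "auth error") is my assumption; the real scripts only need some distinguishable auth signal:

```python
import subprocess

AUTH_ERROR = 13  # hypothetical exit code the scrapers use for auth failures

def run_with_cookie_refresh(cmd: list[str], refresh: list[str]) -> int:
    """Run a scraper; on an auth failure, refresh cookies once and retry."""
    result = subprocess.run(cmd)
    if result.returncode == AUTH_ERROR:   # session cookies have rotated
        subprocess.run(refresh, check=True)
        result = subprocess.run(cmd)      # one retry, then surface the error
    return result.returncode
```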

The self-healing loop

The most useful thing I built is a self-heal skill. When any task errors out, the agent runs a diagnostic:

1. Check pm2 status and restart count.
2. Tail the last 50 lines of the error log.
3. Classify the failure: config, auth, transient, crash loop.
4. Apply the matching fix (refresh cookies, restart, propose a code change).
5. Verify health after 30 seconds.
6. Report to Discord.

That’s a markdown file. No fancy framework. The skill file is the “code” — the agent reads it, follows it step by step, and uses its normal tools (Bash, Read, Edit) to execute each step. When I need to change the recovery logic, I edit the markdown file. No redeploy.

This is the key insight about Claude Code agents in production: behaviour lives in markdown, not code. Scripts handle deterministic work (API calls, file I/O, auth). Skills handle decision-making. The runtime is a dumb loop.
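A dumb loop really is all the runtime needs. A sketch, with every callable standing in for a piece not shown here (the cron matcher, the sub-agent runner, the Discord poller):

```python
import time

def tick(due_tasks, run_task, poll_messages, handle_message):
    """One pass of the daemon loop; all four callables are stand-ins."""
    for task in due_tasks():          # cron expressions decide what's due
        run_task(task)                # sub-agent follows a skill file
    for msg in poll_messages():       # Discord approvals and commands
        handle_message(msg)

def run_forever(**hooks):
    while True:
        tick(**hooks)
        time.sleep(60)                # wake, check, go back to sleep
```

Everything that makes the agent interesting lives in what `run_task` reads from disk, not in this loop.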

Cost reality

Running Koda costs me zero variable dollars. It runs on the Claude Max subscription I already pay for as a human Claude Code user — the same subscription that lets me run claude in my terminal for my day job. The agent uses the same quota. No per-token billing, no usage-based pricing surprises.

The only real variable cost is the occasional Gemini API call for thumbnail image generation — pennies per week, because most of my image generation uses free tools.

The caveat: Claude Max has rate limits. I hit mine most days around 10am local time, and it resets a few hours later. When it hits, multiple scheduled tasks can fail simultaneously until the window clears. The agent auto-retries after five minutes, which usually clears it, but it’s worth knowing if you’re running anything time-sensitive during the limit window.
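That retry behaviour is a small wrapper. A sketch, with an attempt cap added as my own safety margin so a long limit window can't spin forever:

```python
import time

def retry_after_rate_limit(run, is_rate_limited, wait=300, max_tries=6):
    """Re-run a task that failed on the Max-plan rate limit.

    Five-minute spacing matches the behaviour described above; the
    attempt cap and the run/is_rate_limited interface are assumptions.
    """
    for _ in range(max_tries):
        err = run()                   # returns None on success
        if err is None:
            return True
        if not is_rate_limited(err):
            raise err                 # real failures should surface
        time.sleep(wait)
    return False
```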

In short: if you already pay for Claude Max, running a Claude Code agent costs nothing extra. If you don’t, the subscription is still cheaper than the per-token API bill you’d run up with the same workload.

Lessons from running one 24/7

  • Write skills before you write integrations. If you can’t describe the task as a numbered list in markdown, the agent won’t do it reliably.
  • Save every draft immediately. Never let an artifact live only in the chat buffer. Context will roll over and the work will vanish.
  • Approval gates for destructive or public actions. Posting to X is fine autonomously. Publishing a YouTube video always goes through a Discord approval reaction.
  • Negation lists in the system prompt. “Never do X” instructions are more effective than “always do Y”. The agent will find creative ways to violate positive rules.
  • Give it a persistent identity. A soul.md file with voice, values, and hard constraints is cheaper and more reliable than fine-tuning.
  • Cap the learnings file. I keep learnings.md at 100 lines. When it gets full, old rules get consolidated or dropped. A rulebook the agent actually reads beats a history log it skips.
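The cap itself is trivial to enforce; the consolidation is the judgement call the agent makes. A naive sketch that just drops the oldest lines, which is a simplification of what actually happens:

```python
def cap_learnings(text: str, limit: int = 100) -> str:
    """Hold learnings.md at the line cap by keeping the newest rules.

    Koda consolidates old rules rather than blindly dropping them;
    oldest-first dropping is the stand-in behaviour here.
    """
    lines = text.splitlines()
    if len(lines) <= limit:
        return text
    return "\n".join(lines[-limit:])
```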

FAQ

How is this different from just running Claude Code interactively?

Interactive Claude Code starts fresh every session. An autonomous agent persists state, runs on schedule, recovers from failures, and acts without you watching. Same underlying model, completely different operational shape.

Do I need the Claude Agent SDK to build this?

You don’t strictly need it — you could shell out to the claude CLI and pipe files. The SDK gives you proper session handling, tool permissions, and MCP server support out of the box. Worth it once you go past one scheduled task.

What happens when Claude is down or rate-limited?

pm2 catches the crash, restarts the process, and the next cron tick picks up where the last one left off. Scheduled tasks that hit the rate limit auto-retry after five minutes. The agent itself detects repeated failures and sends an alert to Discord instead of crash-looping.

Is this safer than a traditional automation stack like n8n?

No — it’s more capable and less predictable. Use it for work where creative judgement matters (drafting, analysis, recovery). Use n8n or a plain cron job for deterministic pipelines.


I’m documenting the full build — skills, scripts, the self-healing loop, the 21-task schedule — in my Build & Automate community. If you want the step-by-step modules with real production code, that’s where it lives.


This post was published using Notipo — my Notion-to-WordPress sync tool. Write in Notion, publish to WordPress automatically.
