How I Built a Self-Healing AI Content Agent with Claude Agent SDK
I got tired of the content grind. Write a post, format it for X, reformat for Bluesky, render a video, upload to YouTube, cross-post to Instagram — repeat daily. It was eating 2-3 hours every day.
So I built an AI agent that does all of it. It runs 24/7 on a Mac mini, publishes to five platforms, and fixes itself when things break. Claude Code runs on my existing Max subscription — the only variable cost is Gemini for image generation, which is pennies per image.
This isn’t a chatbot. It’s a persistent agent that runs on a schedule, does real work, sends me the results on Discord, and goes back to monitoring. I approve everything from my phone.
Here’s the full architecture, with real code from the production system.
What the Agent Actually Does
Every day, without me touching anything:
- Scans X for viral tweets in the AI/automation space and drafts quote tweets with tactical breakdowns
- Generates YouTube Shorts — writes scripts, creates voiceover with Edge TTS, renders video with Remotion, generates thumbnails, uploads
- Publishes to X, Bluesky, Instagram, and YouTube — each formatted for the platform
- Tracks analytics — pulls YouTube stats, Google Search Console data, and X engagement metrics
- Sends everything to Discord — I tap ✅ or ❌ from my phone
The approval step is important. This isn’t a “set it and forget it” system. Every piece of content gets human review before it goes live. The agent handles the 95% of work that’s execution — I handle the 5% that’s judgment.
The Stack
Here’s what’s running:
| Component | What It Does | Cost |
|---|---|---|
| Mac mini M4 Pro | Hardware — runs everything locally | $500 one-time |
| Claude Agent SDK | The AI brain — TypeScript, persistent session, typed tools | Included in Max subscription |
| Discord.js | Control plane — approvals, alerts, commands | Free |
| Edge TTS | Voiceover generation (Microsoft neural voices) | Free |
| Remotion | Video rendering (React-based) | Free (open source) |
| Whisper.cpp | Word-level caption sync on Apple Silicon | Free |
| MCP Servers | Connects YouTube, X, Gmail, Airtable, GSC APIs | Free |
| pm2 | Process manager — auto-restart, logging, monitoring | Free |
Total recurring cost: near zero beyond the Claude Code Max subscription I already pay for. Gemini image generation is the only variable API cost — a few cents per image. Compare that to a content manager ($3-5K/month) or even a virtual assistant ($500-1K/month).
Architecture: One Persistent Agent Session
The first version was a Python daemon that spawned a separate `claude -p` process for each task. It worked, but each task started with zero context. The agent couldn't remember what it posted yesterday or what performed well last week.
The current architecture is fundamentally different: one persistent TypeScript agent session that never dies. Everything — Discord messages, scheduled tasks, webhook events — feeds into the same agent loop. The agent maintains full context across all interactions.
```typescript
// index.ts — the entry point
async function main() {
  const agent = await startAgent()      // persistent Agent SDK session
  const bot = await startDiscord(agent) // Discord.js — feeds messages into agent
  startScheduler(agent)                 // node-cron — feeds tasks into agent
  startWebhooks(agent)                  // GitHub webhook listener
  await bot.sendMessage('general', 'Koda online. Ready for tasks.')
}

main()
```
The Agent Core
Built on Anthropic’s Claude Agent SDK. The key feature: streaming input mode. The agent stays alive and accepts new messages at any time — from Discord, from the scheduler, or from webhook events.
```typescript
// agent.ts — persistent session with message queue
const agent = new Agent({
  model: 'claude-sonnet-4-20250514',
  tools: [...mcpTools, ...agentTools],
  systemPrompt: loadFile('SOUL.md'),
})

// Message queue — Discord, scheduler, webhooks all push here
const messageQueue: Message[] = []

export async function sendMessage(content: string, source: string) {
  messageQueue.push({ content, source, timestamp: Date.now() })
  await processQueue()
}
```
Session persistence means the agent recovers from restarts. On crash, pm2 restarts the process. The agent loads its session ID from disk and resumes with full context history.
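The persistence mechanics can be sketched as a tiny pair of helpers. This is illustrative, not the SDK's API: the file location and function names are mine, and the real system would store the path outside a temp directory.

```typescript
import * as fs from 'node:fs'
import * as path from 'node:path'
import * as os from 'node:os'

// Hypothetical location for the persisted session ID (temp dir so the
// sketch runs anywhere; the real agent would use its data directory).
const SESSION_FILE = path.join(os.tmpdir(), 'koda-session.json')

// Save the session ID whenever the SDK hands us one.
export function saveSession(sessionId: string): void {
  fs.writeFileSync(SESSION_FILE, JSON.stringify({ sessionId, savedAt: Date.now() }))
}

// On boot, return the previous session ID if one exists, so the
// agent can resume instead of starting cold.
export function loadSession(): string | null {
  if (!fs.existsSync(SESSION_FILE)) return null
  return JSON.parse(fs.readFileSync(SESSION_FILE, 'utf8')).sessionId
}
```

On startup, the agent checks `loadSession()` first and only creates a fresh session when nothing is on disk.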
The Scheduler
17 scheduled tasks running on node-cron. Each task is a prompt that gets fed into the same agent session — so the agent has full context when executing.
```typescript
// scheduler.ts
const tasks: ScheduledTask[] = [
  {
    name: 'youtube_analytics',
    cron: '0 7 * * *', // 7 AM daily
    prompt: 'Pull YouTube analytics for the last 7 days. Compare to previous period.',
    type: 'silent', // runs without approval
  },
  {
    name: 'viral_scan',
    cron: '0 10 * * *', // 10 AM daily
    prompt: 'Scan X for viral tweets in AI/automation. Draft quote tweets.',
    type: 'approval', // sends to Discord for approval
  },
  {
    name: 'social_post',
    cron: '0 12 * * *', // Noon daily
    prompt: 'Draft a post for X following brand-voice-skill.md.',
    type: 'approval',
  },
  {
    name: 'goal_check',
    cron: '15 8 * * *', // 8:15 AM daily
    prompt: 'Check GOALS.md. If any goal is behind, propose actions.',
    type: 'silent',
  },
]
```
Tasks are deduplicated — if `youtube_analytics` already ran today, it gets skipped on re-runs. Results are tracked in `.task-results/YYYY-MM-DD.json`.
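The dedup check is simple enough to sketch. Function names are mine, and this version writes to a temp directory instead of `.task-results/` so it runs anywhere:

```typescript
import * as fs from 'node:fs'
import * as path from 'node:path'
import * as os from 'node:os'

// Results directory; the real system uses .task-results/, this sketch
// uses a temp dir so it is self-contained.
const RESULTS_DIR = path.join(os.tmpdir(), 'task-results')

function todayFile(): string {
  const day = new Date().toISOString().slice(0, 10) // YYYY-MM-DD
  return path.join(RESULTS_DIR, `${day}.json`)
}

// Returns true if the named task already ran today.
export function alreadyRan(task: string): boolean {
  if (!fs.existsSync(todayFile())) return false
  const done: string[] = JSON.parse(fs.readFileSync(todayFile(), 'utf8'))
  return done.includes(task)
}

// Record a completed task so re-runs skip it.
export function markDone(task: string): void {
  fs.mkdirSync(RESULTS_DIR, { recursive: true })
  const done: string[] = fs.existsSync(todayFile())
    ? JSON.parse(fs.readFileSync(todayFile(), 'utf8'))
    : []
  if (!done.includes(task)) done.push(task)
  fs.writeFileSync(todayFile(), JSON.stringify(done))
}
```

The scheduler calls `alreadyRan()` before feeding a prompt into the agent, and `markDone()` after the task completes.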
Self-Healing
When a task fails, the agent doesn’t just log the error. It gets the full error output in context and tries to fix it — because it’s the same persistent session, it already knows the codebase and recent changes.
Up to 2 heal attempts per task. If it still fails, I get a Discord alert with the error details.
The difference from the old Python daemon: the old system spawned a fresh Claude instance to heal, which had zero context about what went wrong. The new system heals within the same session — the agent already knows what it was trying to do, what tools it called, and what the error means.
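The heal loop itself is a small wrapper. This is a sketch with my own names; in the real system the `heal` callback is a prompt fed into the same agent session with the error in context:

```typescript
// A task returns its result or throws; the healer gets the error and
// can attempt a fix before the retry. Both signatures are illustrative.
type Task<T> = () => Promise<T>
type Healer = (err: unknown, attempt: number) => Promise<void>

const MAX_HEAL_ATTEMPTS = 2

// Run a task; on failure, ask the (same-session) agent to heal, then retry.
export async function runWithHealing<T>(task: Task<T>, heal: Healer): Promise<T> {
  let lastError: unknown
  for (let attempt = 0; attempt <= MAX_HEAL_ATTEMPTS; attempt++) {
    try {
      return await task()
    } catch (err) {
      lastError = err
      if (attempt < MAX_HEAL_ATTEMPTS) await heal(err, attempt + 1)
    }
  }
  // Out of heal attempts: the real system sends a Discord alert here.
  throw lastError
}
```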
Risk Classification
Not every action should require approval. Checking analytics is safe. Posting a tweet needs a human eye.
The agent classifies every tool call by risk level:
```typescript
// YOLO risk classifier
const riskLevels = {
  HIGH: ['post_tweet', 'publish_video', 'delete_tweet', 'gmail_send'],
  MEDIUM: ['generate_image', 'skool_airtable_sync', 'create_record'],
  LOW: ['youtube_analytics', 'gsc_search_analytics', 'gmail_search'],
}
```
HIGH-risk actions get sent to Discord for approval before executing. LOW-risk actions run silently. MEDIUM adapts based on whether I’m active in Discord or idle.
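The gating logic can be sketched as two functions over that table. The function names and the default-to-HIGH choice for unknown tools are my own, but defaulting unknown tools to the strictest level is the safe direction:

```typescript
type Risk = 'HIGH' | 'MEDIUM' | 'LOW'

const riskLevels: Record<Risk, string[]> = {
  HIGH: ['post_tweet', 'publish_video', 'delete_tweet', 'gmail_send'],
  MEDIUM: ['generate_image', 'skool_airtable_sync', 'create_record'],
  LOW: ['youtube_analytics', 'gsc_search_analytics', 'gmail_search'],
}

// Classify a tool; unknown tools default to HIGH so nothing
// dangerous slips through unreviewed.
export function classify(tool: string): Risk {
  for (const level of ['HIGH', 'MEDIUM', 'LOW'] as Risk[]) {
    if (riskLevels[level].includes(tool)) return level
  }
  return 'HIGH'
}

// MEDIUM adapts: require approval only when the operator is idle.
export function needsApproval(tool: string, operatorActive: boolean): boolean {
  const risk = classify(tool)
  if (risk === 'HIGH') return true
  if (risk === 'MEDIUM') return !operatorActive
  return false
}
```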
Process Management: pm2
The agent runs under pm2 — a Node.js process manager that handles auto-restart, logging, and monitoring.
```bash
# Start the agent in daemon mode
npm run daemon   # runs: pm2 start ecosystem.config.js

# Check status
pm2 status

# View logs
pm2 logs koda

# Restart
pm2 restart koda
```
pm2 restarts the agent automatically on crash (max 10 restarts). Logs go to data/logs/koda-*.log. The agent sends a startup message to Discord every time it boots, so I know when restarts happen.
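A minimal `ecosystem.config.js` for this setup might look like the following. The field names are standard pm2 options; the values (script path, restart delay) are illustrative, not the production config:

```javascript
// ecosystem.config.js — a minimal sketch; values are illustrative
module.exports = {
  apps: [{
    name: 'koda',
    script: 'dist/index.js',
    autorestart: true,
    max_restarts: 10,    // give up after 10 crash loops
    restart_delay: 5000, // wait 5s between restarts
    out_file: 'data/logs/koda-out.log',
    error_file: 'data/logs/koda-error.log',
  }],
}
```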
Discord as Control Plane
This was a better choice than building a web dashboard. Discord is always on my phone, supports rich embeds, reactions, and threads — and it’s free.
The Discord bot (discord.js) routes messages bidirectionally:
- Me → Agent: I type in the Discord channel, the bot feeds it to the agent session
- Agent → Me: The agent sends results, approvals, alerts back to Discord
- Reactions: ✅ to approve, ❌ to reject content before publishing
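The approval flow reduces to a promise that resolves when a reaction lands. This sketch is the pure logic only; in the real bot, `onReaction` would be called from discord.js's `messageReactionAdd` event handler, and all names here are mine:

```typescript
// Pending approvals keyed by Discord message ID.
type Decision = 'approve' | 'reject'

const pending = new Map<string, (d: Decision) => void>()

// Register content awaiting review; resolves when a reaction arrives.
export function awaitApproval(messageId: string): Promise<Decision> {
  return new Promise(resolve => pending.set(messageId, resolve))
}

// Called from the reaction event handler.
export function onReaction(messageId: string, emoji: string): void {
  const resolve = pending.get(messageId)
  if (!resolve) return // not an approval message
  const decision: Decision | null =
    emoji === '✅' ? 'approve' : emoji === '❌' ? 'reject' : null
  if (decision === null) return // ignore unrelated reactions
  pending.delete(messageId)
  resolve(decision)
}
```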
The agent sends structured messages for different events:
Content approval:
```
🎬 New YouTube Short ready for review

Title: A Teaspoon of Neutron Star Weighs 6 Billion Tons
Duration: 58 seconds
Platforms: YouTube, Instagram

React ✅ to approve or ❌ to reject
```
Analytics digest:
```
📊 YouTube Analytics — Last 7 Days

Views: 1,247
Watch time: 42.3 hours
Subscribers: +8
Top video: Neutron Star (959 views)
Shorts feed: 93% of traffic
```
Self-healing alert:
```
🔧 Self-healed: viral_scan

Error: X API rate limit exceeded
Fix: Added exponential backoff (2s, 4s, 8s)
Status: Task completed on retry #1
```
I can also send commands directly in the Discord channel — “post this to X”, “check YouTube stats”, “draft a blog post about X.” The agent picks it up and responds in the same thread.
The Video Pipeline
This is where the automation really shines. Going from a script to a published YouTube Short takes about 3 minutes of compute time and zero human effort (besides the approval tap):
- Script → Claude writes the narration based on a topic
- Images → Gemini generates scene-specific images matching each narration segment
- Voiceover → Edge TTS (`en-US-AndrewNeural`) generates natural-sounding audio
- Captions → Whisper.cpp creates word-level timestamp sync on Apple Silicon Metal
- Render → Remotion composites everything into a vertical 1080×1920 video
- Thumbnail → Gemini generates a background, Python overlays title text with glow effects
- Preview → Compressed version sent to Discord for approval
- Publish → Uploads to YouTube and Instagram simultaneously
```bash
# The full pipeline in one command
python orchestrate_video.py tutorials/neutron-star.json

# Or step by step
python generate_voiceover.py tutorials/neutron-star.json --update-durations
npx remotion render TechTutorial out/neutron-star.mp4 --props=/tmp/props.json --gl=angle
python publish.py out/neutron-star.mp4 --title "Title" --platforms youtube,instagram
```
Total cost per video: a few cents in Gemini API fees for images. Edge TTS, Whisper, and Remotion are all free.
The Memory System
An agent that forgets everything between sessions is useless for content work. It needs to know what topics performed well, what voice to use, what mistakes to avoid.
I built a 6-layer memory system:
Layer 1: Bootstrap files — loaded every session. Identity (SOUL.md), skills (SKILL.md), user context (USER.md), operational rules (CLAUDE.md). These are the agent’s “personality.”
Layer 2: Observations — the agent records patterns as it works using the observe() tool. “Space Shorts get 10x more views than nature curiosities.” “Negation lists outperform feature lists on X.” Tagged by type: rule, preference, fact, habit, event.
Layer 3: Dream cycle — a nightly job that consolidates observations. Deduplicates similar entries, applies importance decay (rules last 365 days, events expire in 14 days), and promotes recurring patterns to the curated learnings file.
Layer 4: Daily logs — what happened today. Actions, outcomes, decisions, errors. Written continuously throughout the session, not batched at the end.
Layer 5: Curated learnings — the distilled “brain.” Under 100 lines of hard-won knowledge. “Shorts must be under 60 seconds.” “Images must exactly match narration.” These feed directly into content decisions.
Layer 6: Search — before making any decision, the agent searches across all layers for relevant past context.
The dream cycle is the key innovation. Without it, observations pile up forever and the agent drowns in noise. With it, only patterns that appear 3+ times get promoted to long-term memory. Everything else decays naturally.
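The consolidation step can be sketched as a pure function. The 365-day rule TTL, 14-day event TTL, and 3+ promotion threshold come from the description above; the TTLs for the other observation types are my own illustrative values:

```typescript
// An observation with a type-dependent time-to-live.
type ObsType = 'rule' | 'preference' | 'fact' | 'habit' | 'event'

interface Observation {
  text: string
  type: ObsType
  seenCount: number // how many times this pattern has recurred
  ageDays: number   // days since first recorded
}

// TTLs: rule=365 and event=14 are from the post; the rest are assumed.
const TTL_DAYS: Record<ObsType, number> = {
  rule: 365, preference: 180, fact: 90, habit: 90, event: 14,
}

// Nightly consolidation: drop expired observations, promote patterns
// seen 3+ times to curated learnings, keep the rest for future cycles.
export function dreamCycle(obs: Observation[]): { kept: Observation[]; promoted: string[] } {
  const alive = obs.filter(o => o.ageDays <= TTL_DAYS[o.type])
  const promoted = alive.filter(o => o.seenCount >= 3).map(o => o.text)
  const kept = alive.filter(o => o.seenCount < 3)
  return { kept, promoted }
}
```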
MCP Servers: Connecting Everything
Model Context Protocol (MCP) is how the agent talks to external services. Each API gets its own MCP server that exposes typed tools:
- YouTube MCP — upload videos, get analytics, manage playlists, read comments
- X MCP — post tweets, get engagement metrics, delete posts
- Bluesky MCP — post, repost, like, get timeline
- Gmail MCP — search emails, send, create drafts, manage calendar
- Airtable MCP — read/write tables for CRM and content tracking
- Google Search Console MCP — search analytics, indexing status, submit sitemaps
- n8n MCP — workflow management and data tables
- Context7 MCP — documentation lookup for any library
The agent calls these tools naturally in conversation:
```
Agent: "Let me check yesterday's YouTube performance."
  → Calls youtube_analytics_overview(start_date="2026-04-04")
  → "Views were up 23% — the magnetar Short is picking up.
     847 views in the first 24 hours."
```
No custom integration code. No webhook plumbing. The MCP server handles auth, rate limiting, and response formatting. The agent just calls the tool and gets structured data back.
vs. Hiring a Content Manager
| | AI Agent | Content Manager | Virtual Assistant |
|---|---|---|---|
| Monthly cost | ~$0 (beyond existing subscription) | $3,000-5,000 | $500-1,000 |
| Availability | 24/7 | Business hours | Part-time |
| Platforms | 5 simultaneously | 2-3 usually | 1-2 |
| Video production | Automated | Outsourced ($$$) | Manual |
| Ramp-up time | Already knows your voice | 2-4 weeks | 1-2 weeks |
| Scaling | Same cost at 10x volume | Linear cost increase | Linear |
| Judgment calls | Needs human approval | Independent | Needs guidance |
The agent wins on execution speed and cost. A human wins on creative judgment and strategy. The sweet spot: agent handles 95% of execution, human handles 5% of decisions.
vs. n8n / Make / Zapier
I used n8n for a year before building this. Here’s why I switched:
- n8n handles data flow between APIs. It’s great for “when X happens, do Y.” But content creation isn’t a linear flow — it requires judgment, context, and iteration.
- An AI agent can look at analytics, decide what content to create, write it, generate assets, format it for each platform, and adapt based on what worked last time. Try doing that in a node-based workflow.
- The tradeoff: n8n is more reliable for simple automations. The agent is more capable but needs monitoring (hence pm2 and the self-healing system).
If your workflow is “trigger → transform → send,” use n8n. If your workflow involves creative decisions, use an agent.
Getting Started
If you want to build something similar:
- Start with the Agent SDK. Don’t wrap Claude Code CLI in a shell script like I did in v1. The Claude Agent SDK gives you typed tools, streaming input, and session persistence out of the box.
- Add Discord early. It’s your control plane. Every action should send a message, every publish should require approval. Discord.js makes this straightforward.
- Use pm2 for process management. Auto-restart on crash, log rotation, and monitoring — all built in. Don’t build a custom watchdog.
- Build a risk classifier. Not everything needs approval. Analytics and reads are safe. Posts and deletes need a human. Classify your tools and only gate the dangerous ones.
- Use the memory system. Even a simple LEARNINGS.md file that the agent reads at startup makes a massive difference in content quality over time.
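That last point can be as small as a few lines. A minimal sketch of the "read LEARNINGS.md at startup" idea, with an illustrative file path and function name:

```typescript
import * as fs from 'node:fs'
import * as path from 'node:path'
import * as os from 'node:os'

// Illustrative path; the real file would live in the agent's repo.
const LEARNINGS = path.join(os.tmpdir(), 'LEARNINGS.md')

// Prepend curated learnings to the system prompt so every session
// starts with the distilled knowledge, not a blank slate.
export function buildSystemPrompt(base: string): string {
  if (!fs.existsSync(LEARNINGS)) return base
  return `${base}\n\n## Learnings\n${fs.readFileSync(LEARNINGS, 'utf8')}`
}
```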
I’m documenting the full build process — agent setup, MCP server configuration, video pipeline, memory system — in my Build & Automate community. If you want step-by-step modules with real production code, that’s where it’s happening.
This post was published using Notipo — my Notion to WordPress sync tool. Write in Notion, publish to WordPress automatically.