
Build a Voice-Enabled AI Agent in n8n

When I started playing with AI Agents inside n8n, I wanted to go beyond just asking questions. I wanted a real assistant—something I could talk to (literally), that could respond, take action, and help automate everyday tasks like sending emails or booking meetings.

The result was a self-hosted AI Agent that works entirely through Telegram. It understands both text and voice messages, and responds in kind. It’s also integrated with Gmail, Google Calendar, and Airtable.


This article walks you through how I built it—from trigger to tools to TTS response.

What It Does

Here’s what my AI Agent can do:

  • Accept text or voice input via Telegram.
  • Transcribe voice using OpenAI Whisper.
  • Use GPT-4o-mini via the AI Agent node to reason over the prompt.
  • Automatically use tools like Gmail, Google Calendar, or Airtable.
  • Reply either as a message or an audio file using Kokoro TTS.

Overview of the Stack

  • n8n (self-hosted)
  • AI Agent node (n8n’s built-in agent integration)
  • GPT-4o-mini via OpenAI
  • Airtable for contact lookup
  • Gmail and Google Calendar
  • Kokoro TTS (self-hosted)
  • Telegram Bot as frontend

1. Telegram as the Frontend

The workflow starts with a Telegram Trigger node. It picks up both text messages and voice recordings.
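Downstream of the trigger, the workflow has to tell the two kinds of message apart. The Python sketch below shows that check outside n8n. It assumes the raw Telegram Bot API `Update` shape, where a text message carries `message.text` and a voice note carries `message.voice.file_id`. The function name `classify_update` is hypothetical; inside n8n you would normally do this with an IF or Switch node.

```python
from typing import Any, Optional


def classify_update(update: dict[str, Any]) -> tuple[str, Optional[str]]:
    """Return ("text", text), ("voice", file_id), or ("other", None) for a Telegram update."""
    message = update.get("message") or {}
    if "voice" in message:
        # Voice notes arrive as a file reference, not as audio bytes.
        return "voice", message["voice"]["file_id"]
    if "text" in message:
        return "text", message["text"]
    # Photos, stickers, edited messages, etc. are not handled by this sketch.
    return "other", None


example = {"update_id": 1, "message": {"chat": {"id": 42}, "voice": {"file_id": "AwAC-example", "duration": 3}}}
print(classify_update(example))  # ('voice', 'AwAC-example')
```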

2. Handling Voice Input

If the message is a voice file, I use Telegram’s file API to download it and pass it to OpenAI Whisper (via the Transcribe Audio node). This gives us a clean text version of what the user said.
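n8n’s Telegram node handles the download for you, but the two Bot API steps underneath are easy to see in a short Python sketch: `getFile` resolves a `file_id` to a `file_path`, and a second URL serves the bytes. The token, file ID, and file path are placeholders, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

TELEGRAM_API = "https://api.telegram.org"


def get_file_url(bot_token: str, file_id: str) -> str:
    """URL for the Bot API getFile method, which resolves a file_id to a file_path."""
    return f"{TELEGRAM_API}/bot{bot_token}/getFile?{urlencode({'file_id': file_id})}"


def download_url(bot_token: str, file_path: str) -> str:
    """URL that serves the actual file bytes once getFile has returned file_path."""
    return f"{TELEGRAM_API}/file/bot{bot_token}/{file_path}"


# Placeholder values; a real getFile response supplies file_path.
print(get_file_url("123:ABC", "AwAC-example"))
print(download_url("123:ABC", "voice/file_0.oga"))
```

The downloaded bytes are what the Transcribe Audio node sends to Whisper. With the OpenAI Python SDK, the equivalent step would be a call to `client.audio.transcriptions.create(model="whisper-1", file=...)`.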

3. Configuring the AI Agent

The AI Agent node connects GPT-4o-mini with a memory buffer and system instructions. It uses the user’s Telegram chat ID as a custom session key to keep conversations contextual.
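To make the session-key idea concrete, here is a minimal Python sketch of a windowed conversation buffer keyed by chat ID. It approximates what a buffer-style memory does. It is not n8n’s internal implementation, and the class name and default window size are assumptions.

```python
from collections import defaultdict, deque


class WindowMemory:
    """Keep the last `window` messages per session key (e.g. a Telegram chat ID)."""

    def __init__(self, window: int = 10) -> None:
        self.window = window
        # Each session gets its own bounded deque; the oldest messages fall off the front.
        self._sessions = defaultdict(lambda: deque(maxlen=self.window))

    def add(self, session_key: str, role: str, content: str) -> None:
        self._sessions[session_key].append({"role": role, "content": content})

    def history(self, session_key: str) -> list[dict[str, str]]:
        # .get avoids creating an empty session just by reading it.
        return list(self._sessions.get(session_key, ()))


memory = WindowMemory(window=2)
memory.add("42", "user", "Hi")
memory.add("42", "assistant", "Hello!")
memory.add("42", "user", "Book a meeting with Jane")
memory.add("7", "user", "A different chat")
print(memory.history("42"))  # only the last two messages of chat 42
```

In n8n, the equivalent is setting the memory node’s session key to an expression that resolves to the chat ID, for example `{{ $('Telegram Trigger').item.json.message.chat.id }}` if the trigger keeps its default name. Referencing the trigger node, rather than `$json`, keeps the key available after the transcription step.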

My system prompt includes tool usage guidance and a fallback if no user or email is provided:

You are a helpful assistant.
Always use the “Get Contacts” tool to find the email address for the “Send Email” and “Book Event” tools. If I do not provide a user or email address, default to me: “My Name” at “MyEmailAddress”.
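In this setup the model applies these rules itself, guided only by the prompt. If you want the fallback to be deterministic, the same rules can be written as ordinary code, for example in a Code node or a sub-workflow tool. The Python sketch below is hypothetical: the `contacts` dictionary stands in for the Airtable lookup, and the default name and address are the prompt’s placeholders.

```python
from typing import Optional

# Placeholders copied from the prompt; replace them with your own details.
DEFAULT_NAME = "My Name"
DEFAULT_EMAIL = "MyEmailAddress"


def resolve_recipient(name: Optional[str], email: Optional[str], contacts: dict[str, str]) -> tuple[str, str]:
    """Return (name, email): explicit email first, else a contact lookup, else the default."""
    if email:
        return (name or email, email)
    if name:
        # contacts maps lower-cased names to email addresses (a stand-in for Get Contacts).
        match = contacts.get(name.strip().lower())
        if match is None:
            # The prompt does not cover this case; raising lets the caller ask for clarification.
            raise LookupError(f"no contact named {name!r}")
        return (name, match)
    # Neither a user nor an email was given: default to the owner, as the prompt instructs.
    return (DEFAULT_NAME, DEFAULT_EMAIL)
```

Raising on an unknown name is a design choice of this sketch. Silently falling back to your own address when a named contact is missing could send an email to the wrong person.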

4. Tools the Agent Can Use

Tools are defined with nodes that can be attached to the AI Agent node as tools. In my case, the agent can:

  • Look up contacts in Airtable
  • Send email via Gmail
  • Book events via Google Calendar
  • Call another n8n workflow to fetch data from API endpoints
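The contact lookup is handled by n8n’s Airtable node, which wraps Airtable’s REST API. For reference, the Python sketch below builds a lookup by name as a list-records request with a `filterByFormula` parameter. The base ID, the `Contacts` table, the `Name` field, and the token are placeholders, and the layout of the base is an assumption.

```python
from urllib.parse import quote, urlencode

AIRTABLE_API = "https://api.airtable.com/v0"


def contact_lookup_request(base_id: str, table: str, name: str, token: str) -> tuple[str, dict[str, str]]:
    """Build a GET request for contacts whose Name field matches `name` (case-insensitive)."""
    # Escape backslashes and double quotes before embedding the name in a formula string
    # (assumes Airtable's backslash escaping inside string literals).
    safe = name.replace("\\", "\\\\").replace('"', '\\"')
    formula = f'LOWER({{Name}}) = LOWER("{safe}")'
    params = urlencode({"filterByFormula": formula, "maxRecords": 1})
    url = f"{AIRTABLE_API}/{base_id}/{quote(table, safe='')}?{params}"
    headers = {"Authorization": f"Bearer {token}"}
    return url, headers


url, headers = contact_lookup_request("appXXXXXXXX", "Contacts", "Alex", "patPLACEHOLDER")
print(url)
```

Matching on `LOWER()` of both sides means “alex” and “Alex” find the same record. This suits spoken input, where the transcription does not control capitalization.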

Each tool connects to the AI Agent node through its Tool input port.
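Under the hood, a tool-using agent gives the model a list of tool descriptions and then runs whichever tool the model asks for. n8n builds these descriptions from the connected tool nodes, so you never write them yourself. The Python sketch below only illustrates the idea in OpenAI’s function-calling format. The tool names, descriptions, and handlers are hypothetical, and “Book Event” would follow the same pattern.

```python
import json
from typing import Any, Callable

# Hypothetical tool definitions in OpenAI's function-calling format.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_contacts",
            "description": "Look up a contact's email address by name.",
            "parameters": {
                "type": "object",
                "properties": {"name": {"type": "string"}},
                "required": ["name"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "send_email",
            "description": "Send an email via Gmail.",
            "parameters": {
                "type": "object",
                "properties": {
                    "to": {"type": "string"},
                    "subject": {"type": "string"},
                    "body": {"type": "string"},
                },
                "required": ["to", "subject", "body"],
            },
        },
    },
]


def dispatch(name: str, arguments: str, handlers: dict[str, Callable[..., Any]]) -> Any:
    """Route one model-issued tool call to a local handler; `arguments` is a JSON string."""
    if name not in handlers:
        raise KeyError(f"unknown tool: {name}")
    return handlers[name](**json.loads(arguments))


# Stand-in handlers; in the real workflow these are the Airtable and Gmail nodes.
handlers = {
    "get_contacts": lambda name: {"alex": "alex@example.com"}.get(name.lower()),
    "send_email": lambda to, subject, body: f"sent to {to}",
}
print(dispatch("get_contacts", '{"name": "Alex"}', handlers))
```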

5. Responding Back to the User

Once the agent has produced its output, the workflow branches based on whether the original input was text or voice:

  • If text → reply with a Telegram Send Message node.
  • If voice → send the output to Kokoro TTS to generate an .mp3 voice message, and send that back.
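The branch can be sketched in Python as a function that returns the Telegram Bot API method and parameters for each case. The function name is hypothetical. Splitting long text is an addition of this sketch rather than part of the workflow described here: it guards against Telegram’s 4,096-character cap on sendMessage text, which long agent replies can hit.

```python
from typing import Any, Optional

TELEGRAM_TEXT_LIMIT = 4096  # Telegram caps sendMessage text at 4096 characters


def build_replies(input_kind: str, chat_id: int, output: str,
                  audio: Optional[bytes] = None) -> list[tuple[str, dict[str, Any]]]:
    """Return (Bot API method, parameters) pairs for replying in the same mode as the input."""
    if input_kind == "voice":
        if audio is None:
            raise ValueError("a voice reply needs the TTS audio bytes")
        # In a real call the audio is uploaded as multipart/form-data.
        return [("sendVoice", {"chat_id": chat_id, "voice": audio})]
    # Text path: split long outputs so each sendMessage call stays within the limit.
    return [
        ("sendMessage", {"chat_id": chat_id, "text": output[i:i + TELEGRAM_TEXT_LIMIT]})
        for i in range(0, len(output), TELEGRAM_TEXT_LIMIT)
    ]
```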

The Kokoro HTTP Request node sends a POST request with a payload like this:

{
  "model": "kokoro",
  "input": "{{ $json.output }}",
  "voice": "am_adam",
  "response_format": "mp3",
  "download_format": "mp3",
  "return_timestamps": false,
  "speed": 1
}

This gives you natural-sounding audio replies directly inside Telegram.
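The payload above has one caveat. Because `{{ $json.output }}` is pasted inside quotes, a reply containing a double quote or a line break produces invalid JSON. The Python sketch below shows the difference and builds the same body with `json.dumps`, which escapes those characters. The endpoint URL is an assumption (Kokoro-FastAPI’s OpenAI-compatible route on its default port 8880); adjust it to wherever your Kokoro instance listens.

```python
import json
from urllib.request import Request

# Assumption: Kokoro-FastAPI's OpenAI-compatible speech route on its default port.
KOKORO_URL = "http://localhost:8880/v1/audio/speech"


def kokoro_payload(text: str, voice: str = "am_adam", speed: float = 1) -> str:
    """Serialize the TTS request body; json.dumps escapes quotes and newlines in `text`."""
    return json.dumps({
        "model": "kokoro",
        "input": text,
        "voice": voice,
        "response_format": "mp3",
        "download_format": "mp3",
        "return_timestamps": False,
        "speed": speed,
    })


reply = 'Your next meeting is "Weekly sync"\nat 2pm.'
body = kokoro_payload(reply)

# Pasting the raw text between quotes, as a string template does, yields invalid JSON here.
naive = '{"input": "' + reply + '"}'
try:
    json.loads(naive)
except json.JSONDecodeError as err:
    print("template version fails:", err.msg)

# Building the request sends nothing; urlopen(req).read() would return MP3 bytes
# from a running server.
req = Request(KOKORO_URL, data=body.encode("utf-8"),
              headers={"Content-Type": "application/json"}, method="POST")
```

In the n8n HTTP Request node itself, writing the field as `"input": {{ JSON.stringify($json.output) }}` (without the surrounding quotes) should give the same escaping.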

Real Examples

Here are a few real prompts I’ve tested:

  • “Send an email to Alex confirming our call tomorrow.” → Agent finds Alex’s email in Airtable and sends the message via Gmail.
  • “Book a meeting with Jane next Tuesday at 2pm.” → Adds an event to my Google Calendar.
  • Voice: “What’s my next meeting?” → Transcribed and answered with a voice reply.

Final Thoughts

This setup gives me a real assistant that lives inside Telegram. It listens to me, thinks, takes action, and speaks back. I use it daily—and I’m just getting started.

You could easily extend this with more tools: document summarization, Notion notes, file handling, or webhook integrations.

For companies, this kind of setup can also solve real operational problems:

  • Auto-schedule meetings with leads using natural language (e.g., “Book a demo with Sam next Thursday”)
  • Look up internal contacts or CRM data and send contextual emails
  • Summarize meeting transcripts and send key takeaways to Slack or Teams
  • Trigger IT workflows or incident responses via chat
  • Answer internal questions from private documentation in Notion, Confluence, or Google Drive
  • Generate quick reports from Airtable, dashboards, or APIs

If you want to try it out yourself, I’d recommend starting with the voice + text input logic and AI Agent node—then layer on tools as you go.

Want the full workflow JSON or a walkthrough? DM me on LinkedIn or X — happy to share!

I’ll be sharing more n8n automation workflows here on the blog soon — stay tuned!
