AI sales consultant that never sleeps

The client ran a language school with a small admissions team — two people handling inbound inquiries over WhatsApp. Every prospective student would message the school's number asking about schedules, prices, program details, and placement tests. During business hours, the team could usually respond within 10-15 minutes. After hours, messages piled up. By morning, half the leads had already messaged a competitor. The school wasn't losing students because of its product — the courses were solid. It was losing them because human response time has a ceiling, and WhatsApp doesn't wait.

architecture

We built an AI-powered sales assistant that handles the entire first-contact conversation through WhatsApp (answering questions, explaining course options, and qualifying leads) without a human touching anything until the prospect is ready to enroll. The architecture has three moving parts: a Node.js bridge, a FastAPI backend, and a PostgreSQL database. The bridge is built on the Baileys library; it connects to WhatsApp via QR code authentication and handles the messaging layer: receiving texts, sending responses, showing typing indicators, processing media. It forwards every incoming message as a webhook to the FastAPI backend, which is where the actual intelligence lives. The backend routes each conversation to a Claude-powered agent loaded with structured knowledge about the school: course catalogs, pricing tiers, schedules, placement test procedures, FAQ answers. The agent doesn't just answer questions; it steers the conversation toward qualification, categorizing each lead as ready to buy, qualified but not yet decided, or postponed.
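To make the hand-off from bridge to backend concrete, here is a minimal sketch of how the backend might validate a webhook body and decide which stage sees it first. The payload field names (`chat_id`, `kind`, `text`, `media_path`) and the stage names are assumptions made for illustration, not the project's actual contract; only the three lead categories come from the description above.

```python
from dataclasses import dataclass
from enum import Enum


class LeadStatus(Enum):
    """The three lead categories the agent assigns."""
    READY_TO_BUY = "ready_to_buy"
    QUALIFIED = "qualified"      # interested, but not yet decided
    POSTPONED = "postponed"


@dataclass
class IncomingMessage:
    chat_id: str          # hypothetical field: sender's WhatsApp identifier
    kind: str             # hypothetical field: "text" or "audio"
    text: str = ""
    media_path: str = ""  # where the bridge saved the audio file, if any


def parse_webhook(payload: dict) -> IncomingMessage:
    """Validate the JSON body posted by the bridge (assumed shape)."""
    chat_id = payload.get("chat_id")
    kind = payload.get("kind")
    if not chat_id or kind not in ("text", "audio"):
        raise ValueError("malformed webhook payload")
    return IncomingMessage(
        chat_id=chat_id,
        kind=kind,
        text=payload.get("text", ""),
        media_path=payload.get("media_path", ""),
    )


def next_stage(msg: IncomingMessage) -> str:
    """Voice notes are transcribed first; text goes straight to the agent."""
    return "transcribe" if msg.kind == "audio" else "agent"
```

In a FastAPI route, `parse_webhook` would typically be replaced by a Pydantic model, but the routing decision stays the same: the bridge stays thin, and every choice about what to do with a message is made in the backend.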

Baileys bridge to FastAPI

The Node.js bridge forwards every incoming message to the FastAPI backend as a webhook, and the backend hands it to the Claude agent. The WhatsApp connection itself runs on the unofficial, QR-authenticated Baileys client.

voice and trust

Voice messages were a requirement from day one: in this market, people send voice notes more often than they type. When a voice message comes in, the bridge saves the audio file and passes it through OpenAI's Whisper for transcription before forwarding the text to Claude. This sounds straightforward, but WhatsApp compresses audio aggressively, and people record voice messages in noisy environments: cars, cafés, streets. Raw transcription accuracy was inconsistent enough to cause misunderstandings. We added confidence thresholds: if Whisper's confidence in a transcription falls below a set threshold, the bot asks the user to repeat or rephrase instead of guessing and responding to something they didn't say. It's a small thing, but it prevented the kind of embarrassing misread that kills trust in an automated conversation.
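A confidence gate of this kind can be sketched as follows. Whisper does not return a single confidence score; its verbose output carries per-segment `avg_logprob` and `no_speech_prob` values, and this sketch assumes those are what gets thresholded. The cut-off values, the repeat prompt, and the function names are illustrative assumptions, not the project's real settings.

```python
# Assumed cut-offs for illustration; they mirror the default
# logprob_threshold and no_speech_threshold of the open-source Whisper decoder.
MIN_AVG_LOGPROB = -1.0
MAX_NO_SPEECH_PROB = 0.6

# Hypothetical wording for the fallback message.
REPEAT_PROMPT = "Sorry, I couldn't hear that clearly. Could you repeat it or type it out?"


def transcript_is_reliable(segments: list[dict]) -> bool:
    """Return True only if every segment looks like confident speech.

    Each segment is expected to carry `avg_logprob` and `no_speech_prob`,
    as in Whisper's verbose JSON output.
    """
    if not segments:
        return False
    if any(seg["no_speech_prob"] > MAX_NO_SPEECH_PROB for seg in segments):
        return False  # at least one segment is probably noise or silence
    mean_logprob = sum(seg["avg_logprob"] for seg in segments) / len(segments)
    return mean_logprob >= MIN_AVG_LOGPROB


def handle_transcript(text: str, segments: list[dict]) -> tuple[str, str]:
    """Decide whether to forward the transcript or ask the user to repeat."""
    cleaned = text.strip()
    if cleaned and transcript_is_reliable(segments):
        return ("forward", cleaned)
    return ("ask_repeat", REPEAT_PROMPT)
```

The gate sits between transcription and the agent, so Claude never sees a transcript the system itself does not trust. Thresholds like these would need tuning against real voice notes from the target audience.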

Whisper confidence thresholds

When noisy, compressed audio produces a low-confidence transcript, the bot asks the user to repeat instead of guessing.

tone and dependencies

Figure: three-service architecture diagram (Node.js bridge, FastAPI backend, PostgreSQL)

The hardest part of this project wasn't the architecture; it was making Claude sound like a real school consultant. Early prompts produced responses that were helpful but obviously robotic: too structured, too formal, too eager to list bullet points. Real consultants are warmer: they use shorter sentences and ask follow-up questions instead of dumping information. We went through multiple rounds of prompt engineering, injecting real conversation transcripts from the admissions team as examples, tuning the system prompt to match the school's voice, and building guardrails so the agent knows when to hand off to a human instead of improvising. The knowledge base is exposed as tools the agent can call, so it looks up specific course details, checks schedule availability, and pulls pricing on demand rather than having everything stuffed into the context window.
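As a rough illustration of a knowledge base exposed through tools, the sketch below declares one tool in the shape Anthropic's tool-use API expects (`name`, `description`, `input_schema`) and turns a `tool_use` block from the model into the matching `tool_result` block. The tool name, the catalog contents, and the helper functions are hypothetical placeholders; the real system would presumably query PostgreSQL instead of a dictionary.

```python
import json

# Placeholder catalog, invented for illustration only.
COURSES = {
    "general-english-b1": {
        "title": "General English B1",
        "schedule": ["Mon 18:00", "Wed 18:00"],
    },
}

# Tool declaration passed to the model alongside the conversation.
TOOLS = [
    {
        "name": "get_course_details",  # hypothetical tool name
        "description": "Look up the title and schedule of one course by its ID.",
        "input_schema": {
            "type": "object",
            "properties": {"course_id": {"type": "string"}},
            "required": ["course_id"],
        },
    },
]


def get_course_details(course_id: str) -> dict:
    course = COURSES.get(course_id)
    return course if course else {"error": f"unknown course: {course_id}"}


HANDLERS = {"get_course_details": get_course_details}


def run_tool(name: str, tool_input: dict) -> str:
    """Execute a tool by name and return its result as a JSON string."""
    handler = HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    return json.dumps(handler(**tool_input))


def tool_result_block(tool_use: dict) -> dict:
    """Build the tool_result content block answering a tool_use block."""
    return {
        "type": "tool_result",
        "tool_use_id": tool_use["id"],
        "content": run_tool(tool_use["name"], tool_use["input"]),
    }
```

Returning errors as data rather than raising lets the agent recover in conversation, for example by asking which course the prospect meant, instead of failing the whole turn.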

The other persistent challenge is Baileys itself. It's an unofficial WhatsApp library — there's no commercial support, no SLA, and session stability is unpredictable. The QR code authentication can expire without warning, requiring a manual re-scan. We built health checks and automatic alerts when the bridge loses connection, but the fundamental reality is that you're running on a reverse-engineered protocol. It works well enough for this use case — the school isn't sending thousands of messages per hour — but it's the kind of dependency that requires honest communication with the client about its limitations.
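One way to implement such a health check is a heartbeat watchdog on the backend side: the bridge reports periodically whether its WhatsApp socket is open, and the backend raises an alert when no healthy report arrives for too long. The class name, the timeout, and the alert callback below are assumptions for illustration; the article does not describe how the actual checks are wired.

```python
import time
from typing import Callable


class BridgeMonitor:
    """Raises one alert per outage when the bridge stops reporting a live connection."""

    def __init__(
        self,
        alert: Callable[[str], None],
        max_silence_s: float = 90.0,              # assumed timeout
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._alert = alert
        self._max_silence_s = max_silence_s
        self._clock = clock
        self._last_healthy = clock()
        self._alerted = False

    def heartbeat(self, connected: bool) -> None:
        """Called whenever the bridge reports its connection state."""
        if connected:
            self._last_healthy = self._clock()
            self._alerted = False  # outage over, re-arm the alert

    def check(self) -> bool:
        """Run periodically; returns False while the bridge is considered down."""
        silent_for = self._clock() - self._last_healthy
        healthy = silent_for <= self._max_silence_s
        if not healthy and not self._alerted:
            self._alert(
                f"WhatsApp bridge has not reported a live connection for "
                f"{silent_for:.0f}s; a QR re-scan may be needed"
            )
            self._alerted = True
        return healthy
```

Alerting once per outage rather than on every failed check keeps the channel quiet enough that a real disconnect gets noticed. The injectable clock exists only to make the behavior easy to test.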

results

The result: the school now responds to every WhatsApp inquiry within seconds, 24 hours a day. The admissions team shifted from answering the same ten questions repeatedly to focusing on high-intent leads that the bot has already warmed up and qualified. Lead-to-enrollment conversion improved noticeably, and the team handles roughly three times the inquiry volume they did before — not because they work harder, but because the bot filters out the noise. The honest takeaway: the AI agent is only as good as the knowledge you feed it and the personality you shape. The technology was the easy part. Getting the tone right so that prospects don't realize they're talking to a bot — that took longer than building the pipeline.

Stack

Backend: FastAPI, SQLAlchemy + asyncpg, PostgreSQL, Anthropic Claude, OpenAI Whisper

Bridge: Node.js, Baileys, Express

Architecture: three-process system (bridge + backend + database), webhook-driven messaging