AI study consultant that actually knows your weak spots
Most AI chatbots in education are glorified search bars with a friendly tone. The client wanted something different for their driving theory platform: an assistant that knows where a specific user is struggling, can pull up the exact rule they got wrong, checks whether they even have an active subscription, and handles voice messages and photos of road situations — all inside a Telegram bot conversation. Not a generic "ask me anything" experience, but a study consultant with real context about the person it's talking to.
tools grounded in user data
We built this on Claude Sonnet using the tool-use pattern: the model doesn't just generate text, it has a set of functions it can call. It can look up a user's progress and weak topics, retrieve specific questions they've answered incorrectly, search the PDD (traffic rules) content database for relevant rules, and check subscription status. When a user asks "Why do I keep getting intersection questions wrong?", the assistant doesn't guess. It calls the progress tool, sees that the user scores 40% on priority rules at intersections, pulls up the three questions they've missed most often, and explains the underlying rule with a reference to the specific PDD point. That contextual depth is what separates this from a ChatGPT wrapper.
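To make the pattern concrete, here is a minimal sketch of one tool definition and the dispatcher that runs it. The definition uses the name / description / input_schema shape that the Anthropic Messages API accepts for tools. Everything else is an assumption for illustration: the tool name get_weak_topics, the in-memory PROGRESS dictionary standing in for the real PostgreSQL tables, and the topic names.

```python
from typing import Any, Callable

# Hypothetical in-memory stand-in for the real progress tables.
PROGRESS = {
    42: {"intersection_priority": 0.40, "road_signs": 0.85, "parking": 0.70},
}

# Tool definition in the shape the Anthropic Messages API accepts:
# a name, a description, and a JSON Schema under "input_schema".
TOOLS = [
    {
        "name": "get_weak_topics",  # illustrative name, not from the project
        "description": "Return the user's lowest-scoring topics.",
        "input_schema": {
            "type": "object",
            "properties": {
                "user_id": {"type": "integer"},
                "limit": {"type": "integer"},
            },
            "required": ["user_id"],
        },
    },
]


def get_weak_topics(user_id: int, limit: int = 3) -> list[dict[str, Any]]:
    """Return up to `limit` topics, lowest score first."""
    scores = PROGRESS.get(user_id, {})
    ranked = sorted(scores.items(), key=lambda item: item[1])
    return [{"topic": topic, "score": score} for topic, score in ranked[:limit]]


# Maps the tool name the model asks for to the Python function that serves it.
HANDLERS: dict[str, Callable[..., Any]] = {"get_weak_topics": get_weak_topics}


def run_tool(name: str, tool_input: dict[str, Any]) -> Any:
    """Execute a tool requested by the model and return its result."""
    handler = HANDLERS.get(name)
    if handler is None:
        raise KeyError(f"unknown tool: {name}")
    return handler(**tool_input)
```

In a real loop, the TOOLS list would be passed to the model with each request, and the output of run_tool would go back to it as a tool result so the answer is grounded in retrieved data rather than guesswork.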
voice, images, and transcription
Voice and image input added another dimension. Users can send a voice message asking a question (common in this market, where typing in Uzbek on a phone is cumbersome), and the system transcribes it through OpenAI Whisper before passing it to Claude. They can also photograph a road situation, such as a confusing intersection or an unfamiliar sign, and the assistant analyzes the image in context. The voice pipeline required careful handling: Whisper's transcription quality drops sharply with background noise, regional accents, and mixed Russian-Uzbek speech. We added confidence scoring on the transcription result, and when the score falls below a threshold, the bot responds with a polite "I didn't quite catch that. Could you type it out or try again in a quieter spot?" rather than hallucinating an answer to a misheard question.
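One way such a confidence gate could work is sketched below. It assumes the transcription is requested in Whisper's verbose JSON format, where each segment carries avg_logprob and no_speech_prob fields. The scoring formula, the 0.6 threshold, and the function names are illustrative assumptions, not the scoring used in the project.

```python
import math

CONFIDENCE_THRESHOLD = 0.6  # assumed value; tune against real recordings

LOW_CONFIDENCE_REPLY = (
    "I didn't quite catch that. "
    "Could you type it out or try again in a quieter spot?"
)


def transcription_confidence(segments: list[dict]) -> float:
    """Duration-weighted confidence in [0, 1] across transcript segments.

    Each segment contributes exp(avg_logprob), a rough mean token
    probability, discounted by the chance that it contains no speech.
    """
    weighted_sum = 0.0
    total_duration = 0.0
    for seg in segments:
        duration = max(seg["end"] - seg["start"], 0.0)
        prob = math.exp(seg["avg_logprob"]) * (1.0 - seg["no_speech_prob"])
        weighted_sum += prob * duration
        total_duration += duration
    return weighted_sum / total_duration if total_duration > 0 else 0.0


def route_transcript(text: str, segments: list[dict]) -> tuple[str, str]:
    """Return ("answer", text) to forward to the model, or ("retry", reply)."""
    if not text.strip() or transcription_confidence(segments) < CONFIDENCE_THRESHOLD:
        return ("retry", LOW_CONFIDENCE_REPLY)
    return ("answer", text)
```

Weighting by segment duration keeps one short, garbled segment from sinking an otherwise clear message, while a long noisy stretch still pulls the score below the threshold.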
prompt guardrails and tool discipline
The hardest challenge was prompt engineering for tool-use discipline. Claude with tools is powerful, but it has a tendency to call tools speculatively: fetching data it doesn't need or, worse, making up tool calls that don't match the defined schema. We iterated through dozens of system prompt versions to get consistent behavior: the model should only call a tool when the user's question genuinely requires that data, it should never reference information it hasn't retrieved, and it should stay strictly within the domain of traffic rules and exam preparation. Drift was a real problem early on. Users would ask the bot for life advice or try to get it to write poetry, and without firm boundaries it would happily oblige, burning API credits on conversations that had nothing to do with the product. The final system prompt is a carefully structured document with explicit behavioral rules, example interactions, and hard constraints on topic scope.
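Prompt rules reduce malformed tool calls but cannot rule them out, so a server-side check before execution is a natural complement. The sketch below is one possible version under stated assumptions: the search_rules tool, its schema, and the hand-rolled validator are hypothetical, and a production system might use a full JSON Schema library instead.

```python
from typing import Any

# Hypothetical schema registry keyed by tool name (a small JSON Schema subset).
SCHEMAS = {
    "search_rules": {
        "properties": {
            "query": {"type": "string"},
            "limit": {"type": "integer"},
        },
        "required": ["query"],
    },
}

PY_TYPES = {"string": str, "integer": int, "boolean": bool}


def validate_tool_call(name: str, tool_input: dict[str, Any]) -> list[str]:
    """Return a list of problems with a requested tool call; empty means valid."""
    schema = SCHEMAS.get(name)
    if schema is None:
        return [f"unknown tool '{name}'"]

    errors = []
    props = schema["properties"]
    for field in schema.get("required", []):
        if field not in tool_input:
            errors.append(f"missing required field '{field}'")

    for field, value in tool_input.items():
        if field not in props:
            errors.append(f"unexpected field '{field}'")
            continue
        type_name = props[field]["type"]
        expected = PY_TYPES[type_name]
        # bool is a subclass of int in Python, so reject it for integer fields.
        wrong_bool = expected is int and isinstance(value, bool)
        if not isinstance(value, expected) or wrong_bool:
            errors.append(f"field '{field}' should be {type_name}")
    return errors
```

When the list is non-empty, the errors can be returned to the model as a failed tool result instead of executing anything, which gives it a chance to correct the call on the next turn.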
cost, access, and token budgets
Cost management tied directly to the subscription model. Every AI conversation costs real money — Claude API calls plus Whisper transcription for voice messages. We couldn't let unsubscribed users run up the bill. The assistant checks subscription status before engaging in a full conversation: free users get a limited number of AI interactions per day, paid users get unlimited access. The check happens at the tool level, not just in the frontend, so there's no way to bypass it by calling the API directly. We also implemented conversation-level token budgets — if a single conversation exceeds a threshold (usually someone going in circles asking the same question differently), the bot gracefully wraps up and suggests reviewing the relevant topic in the learning module instead.
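The two controls described here, a daily quota for free users and a per-conversation token budget, can be sketched as a small tracker. The limits, the class name, and the in-memory dictionaries are assumptions for illustration. A real deployment would keep these counters in the database, and the token counts would come from the usage figures the Claude API reports with each response.

```python
from dataclasses import dataclass, field
from datetime import date

FREE_DAILY_LIMIT = 5               # assumed quota for free users
CONVERSATION_TOKEN_BUDGET = 8000   # assumed per-conversation ceiling


@dataclass
class UsageTracker:
    """In-memory stand-in for counters that would live in PostgreSQL."""

    daily_counts: dict[tuple[int, date], int] = field(default_factory=dict)
    conversation_tokens: dict[str, int] = field(default_factory=dict)

    def allow_interaction(self, user_id: int, subscribed: bool, today: date) -> bool:
        """Paid users always pass; free users consume one daily slot if any remain."""
        if subscribed:
            return True
        key = (user_id, today)
        used = self.daily_counts.get(key, 0)
        if used >= FREE_DAILY_LIMIT:
            return False
        self.daily_counts[key] = used + 1
        return True

    def record_tokens(self, conversation_id: str, input_tokens: int, output_tokens: int) -> bool:
        """Add one exchange's usage; return True while the conversation is within budget."""
        total = self.conversation_tokens.get(conversation_id, 0) + input_tokens + output_tokens
        self.conversation_tokens[conversation_id] = total
        return total <= CONVERSATION_TOKEN_BUDGET
```

When record_tokens returns False, the bot can send its wrap-up message and point the user to the learning module instead of making another model call.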
results and where effort went
The result is an AI assistant that users actually trust for study guidance, because it speaks from their real data — not generic advice. The client reported that users who engage with the AI consultant at least three times per week show measurably better exam simulation scores than those who only use the question bank. The honest lesson: the AI model itself was maybe 20% of the work. The other 80% was tool orchestration, prompt guardrails, transcription fallbacks, and cost controls. Anyone can wire up a Claude API call. Making it behave reliably in a production context with real users, real money, and real consequences — that's where the engineering lives.
Stack
AI: Anthropic Claude (Sonnet), tool-use/function-calling pattern
Voice: OpenAI Whisper (transcription with confidence scoring)
Backend: FastAPI, psycopg2, PostgreSQL
Integration: Telegram Bot API, subscription-gated access, token budgeting
