AI assistant that actually knows about cars

The client had a marketplace with thousands of car listings and a growing user base that didn't always know what they were looking for. A first-time buyer in the region might know their budget and that they want something fuel-efficient, but they wouldn't know whether that means a specific generation of a popular sedan or a completely different model they've never considered. The idea was simple: give users a conversational interface where they describe what they need, and get back useful, specific guidance drawn from what's actually available on the platform.

proxy and system prompt

AI chat interface showing a conversation where the assistant recommends cars based on user's budget and needs

We integrated Claude as the AI backbone, proxied through a Next.js Route Handler that sits between the client and the Anthropic API. The proxy layer handles three things: authentication (only logged-in users can access the assistant), rate limiting (per-user, per-hour caps to control costs), and system prompt injection. The system prompt anchors Claude's responses firmly in the automotive domain — it knows it's operating inside a car marketplace in Central Asia, understands the local market context (popular brands, typical price ranges, common buyer concerns like parts availability and resale value), and is instructed to reference specific listing categories rather than giving generic advice.
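To make the three proxy responsibilities concrete, here is a minimal sketch of the decision the Route Handler has to make before anything is sent upstream. It is not the project's code: `prepareClaudeCall`, `RequestContext`, `ClaudeConfig`, and `ProxyDecision` are hypothetical names, and the session and rate-limit checks are assumed to have run already. The endpoint and headers are those of the public Anthropic Messages API.

```typescript
// Hypothetical sketch: all names and shapes here are illustrative assumptions.
type ChatMessage = { role: "user" | "assistant"; content: string };

interface RequestContext {
  userId: string | null; // null when the session check found no logged-in user
  withinLimit: boolean;  // outcome of the per-user rate-limit check
}

interface ClaudeConfig {
  apiKey: string;       // server-side secret, never exposed to the browser
  model: string;        // model id, assumed to come from configuration
  maxTokens: number;
  systemPrompt: string; // the automotive-domain prompt
}

type ProxyDecision =
  | { kind: "reject"; status: 401 | 429; error: string }
  | {
      kind: "forward";
      url: string;
      headers: Record<string, string>;
      body: { model: string; max_tokens: number; system: string; messages: ChatMessage[] };
    };

function prepareClaudeCall(
  incoming: { role: string; content: string }[],
  ctx: RequestContext,
  config: ClaudeConfig
): ProxyDecision {
  // 1. Authentication: only logged-in users reach the assistant.
  if (ctx.userId === null) {
    return { kind: "reject", status: 401, error: "Sign in to use the assistant." };
  }
  // 2. Rate limiting: refuse before spending any API budget.
  if (!ctx.withinLimit) {
    return { kind: "reject", status: 429, error: "Query limit reached. Try again later." };
  }
  // 3. System prompt injection: keep only user/assistant turns from the client,
  //    so the browser cannot supply or override the system prompt.
  const messages: ChatMessage[] = [];
  for (const m of incoming) {
    const role = m.role;
    if (role === "user" || role === "assistant") {
      messages.push({ role, content: m.content });
    }
  }
  return {
    kind: "forward",
    url: "https://api.anthropic.com/v1/messages",
    headers: {
      "x-api-key": config.apiKey,
      "anthropic-version": "2023-06-01",
      "content-type": "application/json",
    },
    body: {
      model: config.model,
      max_tokens: config.maxTokens,
      system: config.systemPrompt,
      messages,
    },
  };
}
```

In a Route Handler, an exported `POST` function would call something like this and either return the rejection as a JSON response or send `body` to `url` with the given headers. Keeping the decision in a pure function makes it testable without a network connection.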

cost controls and rate limits

Rate limiting was the first real engineering challenge. Each assistant query costs far more than a database read, and without controls a handful of curious users could burn through the monthly API budget in days. We implemented a sliding-window rate limiter that tracks requests per user with configurable limits, currently 20 queries per hour and 100 per day. When a user approaches a limit, the UI warns them. When they hit it, the assistant politely explains the cooldown period. The request counts are stored in the database rather than in memory, so they persist across deployments and stay accurate even when requests are distributed across multiple serverless instances.
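The sliding-window check itself can be sketched as a pure function. This is an illustration under assumptions, not the production limiter: `checkSlidingWindow`, `WindowLimit`, and `LimitResult` are hypothetical names, and `requestTimes` stands in for the user's recent request timestamps (in milliseconds) as they would be read from the database.

```typescript
// Hypothetical sketch: function and type names are illustrative assumptions.
interface WindowLimit {
  windowMs: number; // length of the sliding window
  max: number;      // requests allowed inside that window
}

const HOUR_MS = 60 * 60 * 1000;

// The limits quoted in the text: 20 per hour, 100 per day.
const DEFAULT_LIMITS: WindowLimit[] = [
  { windowMs: HOUR_MS, max: 20 },
  { windowMs: 24 * HOUR_MS, max: 100 },
];

interface LimitResult {
  allowed: boolean;
  remaining: number;    // requests left under the tightest window
  retryAfterMs: number; // 0 when allowed, otherwise the cooldown
}

function checkSlidingWindow(
  requestTimes: number[],
  now: number,
  limits: WindowLimit[] = DEFAULT_LIMITS
): LimitResult {
  let remaining = Number.POSITIVE_INFINITY;
  let retryAfterMs = 0;

  for (const limit of limits) {
    // Requests still inside this window, oldest first.
    const inWindow = requestTimes
      .filter((t) => t > now - limit.windowMs && t <= now)
      .sort((a, b) => a - b);

    remaining = Math.min(remaining, limit.max - inWindow.length);

    if (inWindow.length >= limit.max) {
      // A slot opens once enough of the oldest requests have aged out
      // for the count to drop below the maximum.
      const freeing = inWindow[inWindow.length - limit.max];
      retryAfterMs = Math.max(retryAfterMs, freeing + limit.windowMs - now);
    }
  }

  remaining = Math.max(0, remaining);
  return { allowed: remaining > 0, remaining, retryAfterMs };
}
```

In this sketch `remaining` could drive the UI warning as a user nears a limit, and `retryAfterMs` gives the assistant a cooldown to report. Because the function only reads timestamps, it gives the same answer no matter which serverless instance runs it, provided the timestamps come from shared storage.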

20 hourly, 100 daily queries

DB-backed sliding windows cap spend and behave correctly when traffic spreads across serverless instances.

regional access and proxies

Assistant providing a comparison between two cars with pros and cons for the user's specific situation

The more interesting challenge was regional API access. The Anthropic API isn't directly reachable from all networks in Central Asia. For users and infrastructure in restricted regions, we added optional SOCKS proxy support in the Route Handler. The proxy configuration is environment-variable driven — when the proxy URL is set, outbound requests to Claude route through it; when it's not, they go direct. This sounds simple, but proxy reliability became a recurring issue. Proxies go down, rotate IPs, or introduce latency spikes that cause timeouts. We added retry logic with fallback — if the primary proxy fails, the system tries a secondary before returning an error. Health checks run on a schedule to flag degraded proxies before users notice.
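The routing and fallback behavior can be sketched in a few lines. Everything named here is an assumption made for illustration: the environment variable names `CLAUDE_PROXY_URL` and `CLAUDE_PROXY_FALLBACK_URL`, the `Route` type, and the `routesFromEnv` and `withFallback` helpers. How a route is turned into an actual SOCKS connection depends on the HTTP client in use, so it is left to the `attempt` callback.

```typescript
// Hypothetical sketch: env variable names and helpers are illustrative assumptions.
interface Route {
  label: "direct" | "primary" | "secondary";
  proxyUrl: string | null; // null means no proxy
}

// Build the ordered list of routes from configuration.
// No primary proxy configured: go direct. Otherwise: primary, then secondary if set.
function routesFromEnv(env: Record<string, string | undefined>): Route[] {
  const primary = env["CLAUDE_PROXY_URL"];
  const secondary = env["CLAUDE_PROXY_FALLBACK_URL"];
  if (!primary) {
    return [{ label: "direct", proxyUrl: null }];
  }
  const routes: Route[] = [{ label: "primary", proxyUrl: primary }];
  if (secondary) {
    routes.push({ label: "secondary", proxyUrl: secondary });
  }
  return routes;
}

// Try each route in order and return the first success.
// If every route fails, rethrow the last error so the caller can report it.
async function withFallback<T>(
  routes: Route[],
  attempt: (route: Route) => Promise<T>
): Promise<T> {
  let lastError: unknown = new Error("No route configured");
  for (const route of routes) {
    try {
      return await attempt(route);
    } catch (err) {
      lastError = err; // remember the failure and move on to the next route
    }
  }
  throw lastError;
}
```

A handler might call `withFallback(routesFromEnv(process.env), sendToClaude)`, where `sendToClaude` is a hypothetical function that performs the request over the given route and rejects on timeout. A scheduled health check could reuse the same `attempt` function with a lightweight probe.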

prompts over fine-tuning

Keeping responses relevant without fine-tuning was a deliberate trade-off. Fine-tuning would have given us tighter control over the output, but it requires ongoing maintenance — every time the catalog changes or new models appear, the training data would need updating. Instead, we invested in prompt engineering. The system prompt includes structured context about the marketplace's categories, price segmentation, and common user personas. For queries about specific listings, the assistant can reference the platform's search parameters and guide users toward the right filters. It doesn't have direct database access (a future improvement), but it knows enough about the platform's structure to give actionable directions.
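One way to keep such a prompt maintainable is to assemble it from structured data, so a catalog change means editing a list rather than rewriting prose. The sketch below is an assumption about how that could look, not the prompt used in the project: `MarketplaceContext`, `buildSystemPrompt`, and every sample value in `exampleContext` are placeholders.

```typescript
// Hypothetical sketch: field names and sample values are placeholders.
interface MarketplaceContext {
  region: string;
  categories: string[];    // listing categories on the platform
  priceSegments: string[]; // how the platform segments prices
  personas: string[];      // common kinds of users
  buyerConcerns: string[]; // local concerns worth raising
}

function buildSystemPrompt(ctx: MarketplaceContext): string {
  const bullets = (items: string[]): string =>
    items.map((item) => `- ${item}`).join("\n");

  return [
    `You are the assistant for a car marketplace in ${ctx.region}.`,
    "Only answer questions about buying, comparing, and owning cars. Politely redirect anything else.",
    "Guide users toward listing categories and search filters instead of giving generic advice.",
    "You cannot see individual listings, so never invent specific cars, prices, or availability.",
    "",
    "Listing categories:",
    bullets(ctx.categories),
    "",
    "Price segments:",
    bullets(ctx.priceSegments),
    "",
    "Typical users:",
    bullets(ctx.personas),
    "",
    "Concerns to address when relevant:",
    bullets(ctx.buyerConcerns),
  ].join("\n");
}

// Placeholder data for illustration only; real values would come from the platform.
const exampleContext: MarketplaceContext = {
  region: "Central Asia",
  categories: ["Sedans", "SUVs and crossovers"],
  priceSegments: ["Budget", "Mid-range", "Premium"],
  personas: ["First-time buyer with a fixed budget"],
  buyerConcerns: ["Parts availability", "Resale value"],
};

const examplePrompt = buildSystemPrompt(exampleContext);
```

The resulting string is what would be passed as the system prompt on every request. Since the assistant has no database access, the prompt in this sketch tells it to direct users to filters rather than name specific listings.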

Optional SOCKS proxy path

Environment-driven routing with primary and secondary proxy retries keeps Claude reachable where direct API access is blocked.

usage and takeaway

The result is an assistant that handles about 300 to 400 conversations per day, with users averaging three to four messages per session. Most queries fall into predictable patterns (budget-based recommendations, comparisons between two models, questions about ownership costs), and Claude handles these well with the domain-anchored prompt. The occasional off-topic query (users trying to get help with unrelated tasks) gets redirected gracefully.

The takeaway: bolting an LLM onto a product is easy. Making it useful — scoped to the domain, cost-controlled, reliable across unreliable infrastructure — is where the actual work lives. The proxy reliability issue alone consumed more engineering time than the entire AI integration. And the rate limiter, which felt like a boring implementation detail at first, turned out to be the single most important feature for keeping the project financially viable.

Stack

Frontend: Next.js 15, React 19, Tailwind CSS

AI: Anthropic Claude API, system prompt engineering

Backend: Next.js Route Handler, rate limiter (DB-backed), SOCKS proxy (optional)

Infrastructure: Prisma 6, PostgreSQL