Sync engine for tracking AI tool costs

The client was running multiple accounts on an AI-powered coding assistant, with different seats for different team members, some on business plans and others on individual subscriptions. Every month, reconciling the costs was a manual process: log into each account, check the usage dashboard, screenshot the numbers, paste them into a spreadsheet. With five accounts it was annoying. As the team grew toward fifteen, it became unsustainable. They needed a centralized system that would pull usage data from every account automatically and keep it in sync.

polling and normalization

polling architecture diagram showing multi-account sync flow into PostgreSQL

We built a polling and synchronization engine that authenticates into each account using stored session cookies and scrapes the usage data through the tool's internal API endpoints. The system hits multiple endpoints per account: filtered usage events (individual requests with model names, token counts, and costs), usage summaries (aggregated totals per billing period), invoices, hard limits, and account metadata. Each polling cycle runs independently per account on a cron schedule, so one account's slow response doesn't block the others. A transformation layer normalizes the raw API responses: it converts cents to dollars and Unix timestamps to proper datetimes, and it maps internal model identifiers to human-readable names. Normalized events are upserted into PostgreSQL with a unique constraint on the combination of account ID, event timestamp, and model, so polling the same data twice never creates duplicates.
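To make the transformation step concrete, here is a minimal sketch of what a normalization function could look like. The raw field names, the millisecond timestamp unit, and the model identifiers are all assumptions for illustration; the internal API's actual response shape isn't documented here.

```typescript
// Hypothetical raw event shape. Field names and units are assumptions,
// not the tool's real response format.
interface RawUsageEvent {
  timestampMs: number; // Unix time in milliseconds (assumed)
  modelId: string; // internal model identifier
  inputTokens: number;
  outputTokens: number;
  costCents: number; // cost in cents
}

interface NormalizedEvent {
  accountId: string;
  occurredAt: Date;
  model: string;
  totalTokens: number;
  costUsd: number;
}

// Illustrative mapping only; these identifiers are made up.
const MODEL_NAMES: Record<string, string> = {
  "mdl-large-v2": "Large v2",
  "mdl-fast-v1": "Fast v1",
};

function normalizeEvent(accountId: string, raw: RawUsageEvent): NormalizedEvent {
  return {
    accountId,
    // Unix timestamp to a proper datetime.
    occurredAt: new Date(raw.timestampMs),
    // Fall back to the raw identifier when no friendly name is known.
    model: MODEL_NAMES[raw.modelId] ?? raw.modelId,
    totalTokens: raw.inputTokens + raw.outputTokens,
    // Cents to dollars.
    costUsd: raw.costCents / 100,
  };
}
```

Falling back to the raw identifier for unknown models is one possible choice: a newly released model then still syncs instead of failing the whole cycle.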

idempotency and integrity

Idempotent upserts sound simple until you hit edge cases. The first version used a unique constraint on just account ID and timestamp, which worked until two requests to different models happened within the same millisecond — one event would silently overwrite the other. Switching to a composite constraint on account ID, timestamp, and model fixed the data integrity issue but surfaced another: the API occasionally returns the same event with slightly different cost values across sequential polls, likely due to rate recalculation on their end. We handle this with an "update on conflict" strategy that always takes the latest value, paired with a change log that records when a previously synced event gets modified. It's defensive, but when you're building a system that people use for billing, being off by even a few cents erodes trust.
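The behavior described above can be sketched with an in-memory stand-in for the events table. This is not the production code: the real system does this in PostgreSQL (for example with `INSERT ... ON CONFLICT ... DO UPDATE`), and the `EventStore` class and its field names are hypothetical. The sketch only shows the logic: a composite key of account ID, timestamp, and model, latest value wins, and every overwrite is recorded.

```typescript
interface UsageEvent {
  accountId: string;
  occurredAt: Date;
  model: string;
  costUsd: number;
}

interface ChangeLogEntry {
  key: string;
  previousCostUsd: number;
  newCostUsd: number;
  detectedAt: Date;
}

type UpsertResult = "inserted" | "updated" | "unchanged";

// Hypothetical in-memory stand-in for the events table plus change log.
class EventStore {
  private readonly events = new Map<string, UsageEvent>();
  readonly changeLog: ChangeLogEntry[] = [];

  // Composite key mirroring the unique constraint:
  // account ID + event timestamp + model.
  private static keyOf(e: UsageEvent): string {
    return [e.accountId, e.occurredAt.getTime(), e.model].join("|");
  }

  upsert(incoming: UsageEvent, now: Date = new Date()): UpsertResult {
    const key = EventStore.keyOf(incoming);
    const existing = this.events.get(key);
    if (existing === undefined) {
      this.events.set(key, incoming);
      return "inserted";
    }
    if (existing.costUsd === incoming.costUsd) {
      return "unchanged"; // re-polling identical data is a no-op
    }
    // The upstream value changed: record it, then take the latest value.
    this.changeLog.push({
      key,
      previousCostUsd: existing.costUsd,
      newCostUsd: incoming.costUsd,
      detectedAt: now,
    });
    this.events.set(key, incoming);
    return "updated";
  }

  get size(): number {
    return this.events.size;
  }
}
```

With the model in the key, two events in the same millisecond for different models land as separate rows, while a re-polled event with a revised cost updates the existing row and leaves a trace in the change log.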

sessions and proxies

account status panel showing active and expired sessions

Session cookie management turned out to be the most operationally fragile piece. The cookies expire unpredictably — sometimes after days, sometimes after weeks — and there's no refresh token mechanism since we're working with browser session cookies, not an official API. When a cookie dies, the polling cycle returns authentication errors, and the account goes dark until someone provides a fresh cookie. We built detection for this: if a polling cycle gets an auth failure, the system marks the account as "session expired," stops polling it to avoid rate-limit penalties, and fires an alert. It's not elegant, but there's no elegant solution when you're working without an official API.
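A minimal sketch of that detection logic follows. It assumes auth failures show up as HTTP 401 or 403, which is a guess about the internal API rather than something confirmed, and the `Account` shape and function names are hypothetical.

```typescript
type AccountStatus = "active" | "session_expired";

interface Account {
  id: string;
  status: AccountStatus;
}

// Assumption: an expired cookie surfaces as HTTP 401 or 403.
function isAuthFailure(httpStatus: number): boolean {
  return httpStatus === 401 || httpStatus === 403;
}

// Returns the account's next state after a polling cycle. On an auth
// failure the account is marked expired and a single alert is fired.
function applyPollResult(
  account: Account,
  httpStatus: number,
  alert: (message: string) => void,
): Account {
  if (account.status === "active" && isAuthFailure(httpStatus)) {
    alert(`Session expired for account ${account.id}: a fresh cookie is needed`);
    return { ...account, status: "session_expired" };
  }
  return account;
}

// The scheduler would check this before each cycle, so expired accounts
// are skipped instead of hammering the API with doomed requests.
function shouldPoll(account: Account): boolean {
  return account.status === "active";
}
```

Keeping the transition in a pure function makes the "alert once, then stop polling" rule easy to test without a live session.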

Proxy rotation was a late addition, born from necessity. During a period of heavy polling — pulling historical data for a newly added batch of accounts — the primary IP got rate-limited. Requests started returning 429s, and the cooldown period was long enough to make the entire sync pipeline useless for hours. We added support for rotating proxy IPs through a pool, distributing requests across multiple exit points. The proxy layer sits between the polling engine and the API, and it's optional — accounts that poll infrequently can go direct, while heavy-polling accounts route through the proxy pool. Configuration is per-account, which keeps things flexible without adding unnecessary complexity for the common case.
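The per-account routing can be sketched as follows. Round-robin selection is an assumption here, since the rotation strategy isn't specified, and `ProxyPool`, `AccountNetworkConfig`, and the example proxy URLs are hypothetical names for illustration.

```typescript
interface AccountNetworkConfig {
  accountId: string;
  useProxy: boolean; // per-account switch: direct vs. proxy pool
}

// Hypothetical round-robin pool; other rotation strategies would work too.
class ProxyPool {
  private readonly proxyUrls: readonly string[];
  private nextIndex = 0;

  constructor(proxyUrls: readonly string[]) {
    if (proxyUrls.length === 0) {
      throw new Error("proxy pool must contain at least one proxy");
    }
    this.proxyUrls = proxyUrls;
  }

  // Hands out proxies in order, wrapping around at the end of the list.
  acquire(): string {
    const url = this.proxyUrls[this.nextIndex] as string;
    this.nextIndex = (this.nextIndex + 1) % this.proxyUrls.length;
    return url;
  }
}

// Returns the proxy URL to use for this request, or null to go direct.
function resolveProxy(config: AccountNetworkConfig, pool: ProxyPool): string | null {
  return config.useProxy ? pool.acquire() : null;
}
```

In a setup like the one described, the returned URL would presumably be handed to the HTTP client's proxy agent, while a `null` result means the request goes out from the primary IP.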

results

The engine now syncs usage data across all accounts reliably, with normalized, deduplicated records landing in PostgreSQL ready for the analytics dashboard to query. The takeaway from this build: working without an official API is a calculated trade-off. You get data access that wouldn't otherwise exist, but you inherit every instability of the underlying system, including session management, schema changes, and rate limits. The normalization and idempotency layers aren't just nice engineering; they're the only things standing between the client and a database full of conflicting numbers. If an official API ever ships, half of this system could be replaced in a day. Until then, it works.

Stack

Runtime: Next.js 14 (API Routes), node-cron

Database: Prisma 5, PostgreSQL

Networking: undici, https-proxy-agent

Architecture: per-account polling cycles, composite unique constraints, session cookie auth