One assistant, everywhere the user already is.
SarahAI spans a native app, in-app chat and a major WhatsApp integration. People wanted scheduling, reminders, messaging, summaries and natural AI conversation, by text or voice, on whichever surface they reached for. That meant orchestrating multiple AI models behind one coherent agent, remembering context across days and channels, delivering time-based messages reliably at scale, and staying fully within Meta's strict messaging policies. The hard part was never a single feature; it was making all of them one dependable system.
One agent, many models
A routing layer weighs latency and cost to pick between GPT-4o/4.5, Gemini and Claude, falling back to open models on Groq & Cerebras through a dedicated MCP server.
Memory as a first-class citizen
RAG-backed context with Langfuse tracing, plus agents that write back to their own knowledge base from past interactions.
Scheduling without servers
Every time-based message is an independently scheduled EventBridge event draining through SQS into Lambda — no standing worker fleet.
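A minimal sketch of the Lambda end of that pipeline, under stated assumptions: each SQS message body is JSON with hypothetical `reminder_id`, `user_id` and `text` fields, the event source mapping has `ReportBatchItemFailures` enabled so Lambda retries only the failed records, and `send_whatsapp` is a hypothetical placeholder for the real send path. The in-memory duplicate check is illustrative only.

```python
import json

# Hypothetical stand-in for the real outbound send path.
def send_whatsapp(user_id: str, text: str) -> None:
    if not text:
        raise ValueError("empty reminder text")

# Illustration only: SQS standard queues deliver at least once, so a real
# deployment would need a durable idempotency store, not process memory.
_already_sent: set[str] = set()

def handler(event: dict, context: object) -> dict:
    """SQS -> Lambda consumer that returns a partial batch response."""
    failures = []
    for record in event.get("Records", []):
        try:
            msg = json.loads(record["body"])
            reminder_id = msg["reminder_id"]
            if reminder_id in _already_sent:
                continue  # duplicate delivery of an already-sent reminder
            send_whatsapp(msg["user_id"], msg["text"])
            _already_sent.add(reminder_id)
        except Exception:
            # Report only this record so SQS redelivers it, not the whole batch.
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Returning `batchItemFailures` keeps one malformed reminder from forcing redelivery of every message that arrived in the same batch.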
Agent runtime
One agent over many models. A routing layer chooses among GPT-4o/4.5, Gemini and Claude by intent, latency and cost, while the Task module invokes the right tools on each turn instead of running a fixed pipeline.
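The routing rule can be sketched roughly like this. The catalogue, intent labels and latency/cost figures are all hypothetical placeholders, not SarahAI's real configuration; the sketch only shows the shape of the decision: intent picks a tier, the latency budget filters it, and cost breaks the tie.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelOption:
    name: str
    tier: str               # "frontier" or "open"
    p50_latency_ms: int     # placeholder figures, not measured values
    usd_per_1k_tokens: float

# Hypothetical catalogue: names echo the providers in the text, numbers are made up.
CATALOGUE = [
    ModelOption("claude", "frontier", 1800, 0.015),
    ModelOption("gpt-4o", "frontier", 1500, 0.010),
    ModelOption("gemini", "frontier", 1200, 0.007),
    ModelOption("gpt-oss@groq", "open", 400, 0.001),
    ModelOption("gpt-oss@cerebras", "open", 300, 0.0012),
]

# Hypothetical intents that justify frontier-model quality.
HARD_INTENTS = {"summarize_thread", "multi_step_planning"}

def route(intent: str, latency_budget_ms: int) -> ModelOption:
    tier = "frontier" if intent in HARD_INTENTS else "open"
    candidates = [m for m in CATALOGUE
                  if m.tier == tier and m.p50_latency_ms <= latency_budget_ms]
    if not candidates:
        # Nothing meets the budget: fall back to the fastest model in the tier.
        candidates = [min((m for m in CATALOGUE if m.tier == tier),
                          key=lambda m: m.p50_latency_ms)]
    # Among acceptable models, the cheapest one wins.
    return min(candidates, key=lambda m: m.usd_per_1k_tokens)
```

For example, a reminder request with a generous budget lands on the cheapest open model, while a thread summary stays on a frontier model even when no frontier option meets the budget.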
Memory & MCP tools
RAG-backed long-term context traced in Langfuse, with agents writing back to their own knowledge base. A dedicated MCP server fronts open-source fallbacks (GPT-OSS) on Groq & Cerebras behind a clean tool boundary.
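A toy version of budgeted retrieval plus write-back, assuming a bag-of-words cosine similarity in place of a real embedding model and word counts in place of tokens. `MemoryStore` and its methods are hypothetical names; Langfuse tracing and the actual vector store are left out.

```python
import math
from collections import Counter

def _vec(text: str) -> Counter:
    # Bag-of-words counts: a stand-in for a real embedding model.
    return Counter(text.lower().split())

def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class MemoryStore:
    """Toy long-term memory: budgeted retrieval plus agent write-back."""

    def __init__(self) -> None:
        self.notes: list[str] = []

    def write_back(self, note: str) -> None:
        # Agents persist distilled facts from past turns, skipping exact duplicates.
        if note not in self.notes:
            self.notes.append(note)

    def context_for(self, query: str, word_budget: int) -> list[str]:
        q = _vec(query)
        scored = [(_cosine(q, _vec(n)), n) for n in self.notes]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        picked, used = [], 0
        for score, note in scored:
            cost = len(note.split())  # word count as a rough token proxy
            if score == 0 or used + cost > word_budget:
                continue  # irrelevant, or would overflow the budget
            picked.append(note)
            used += cost
        return picked
```

The budget check is what keeps long-term context from bloating every prompt: only notes that are relevant and still fit get injected.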
Serverless scheduling
The notification engine was re-architected from Celery CRON jobs to Lambda + SQS + EventBridge Scheduler. Each reminder is an independently scheduled event, so capacity scales with demand and there is no standing infrastructure to run.
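Scheduling one reminder could look roughly like the following: a one-time `at()` schedule in EventBridge Scheduler whose target is the SQS queue. The ARNs, naming scheme and message fields are hypothetical; the function only builds the keyword arguments for the `create_schedule` call on boto3's `scheduler` client, so it runs without AWS credentials.

```python
import json
from datetime import datetime

# Hypothetical ARNs: substitute the real queue and the IAM role Scheduler assumes.
QUEUE_ARN = "arn:aws:sqs:eu-west-1:123456789012:reminders"
ROLE_ARN = "arn:aws:iam::123456789012:role/scheduler-to-sqs"

def reminder_schedule_request(reminder_id: str, user_id: str, text: str,
                              fire_at: datetime, tz: str) -> dict:
    """Build kwargs for a one-time EventBridge Scheduler schedule.

    fire_at is a naive local time interpreted in the IANA time zone `tz`.
    """
    return {
        "Name": f"reminder-{reminder_id}",       # one schedule per reminder
        "ScheduleExpression": f"at({fire_at.strftime('%Y-%m-%dT%H:%M:%S')})",
        "ScheduleExpressionTimezone": tz,
        "FlexibleTimeWindow": {"Mode": "OFF"},   # no flexible delivery window
        "ActionAfterCompletion": "DELETE",       # remove the schedule once it fires
        "Target": {
            "Arn": QUEUE_ARN,
            "RoleArn": ROLE_ARN,
            "Input": json.dumps({"reminder_id": reminder_id,
                                 "user_id": user_id, "text": text}),
        },
    }

# With AWS credentials configured, this would be submitted as:
#   boto3.client("scheduler").create_schedule(**reminder_schedule_request(...))
```

Deleting each schedule after completion means the account only ever holds pending reminders, not a growing history of fired ones.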
Compliance & billing
Meta-compliant opt-in consent and automated renewal flows wired into the send path via Respond.io, plus subscription-lifecycle management and feature gating that enforce entitlements across the platform.
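The send-path check can be modeled as a single decision function. It encodes two WhatsApp rules (no business-initiated messages without opt-in, and free-form messages only within 24 hours of the user's last inbound message, with approved templates required outside it) alongside a hypothetical entitlement table. The plan names, features and return strings are illustrative, and the actual Respond.io call is not shown.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

SERVICE_WINDOW = timedelta(hours=24)  # WhatsApp customer-service window

@dataclass
class Contact:
    opted_in: bool
    last_inbound_at: Optional[datetime]  # last message *from* the user
    plan: str                            # hypothetical plan name

# Hypothetical entitlement table used for feature gating.
ENTITLEMENTS = {"free": {"chat"}, "pro": {"chat", "reminders", "summaries"}}

def send_decision(contact: Contact, feature: str, now: datetime,
                  has_approved_template: bool) -> str:
    """Return 'free_form', 'template', or a 'blocked:<reason>' string."""
    if not contact.opted_in:
        return "blocked:no_opt_in"
    if feature not in ENTITLEMENTS.get(contact.plan, set()):
        return "blocked:not_entitled"
    in_window = (contact.last_inbound_at is not None
                 and now - contact.last_inbound_at < SERVICE_WINDOW)
    if in_window:
        return "free_form"
    # Outside the window, only a pre-approved template may be sent.
    return "template" if has_approved_template else "blocked:outside_window"
```

Running every outbound message through one function like this is what "compliance in the send path" amounts to: no code path can reach the channel without passing the same checks.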
Multi-model orchestration
Route intelligently across providers without exposing the seams to the user.
Agent memory
Keep context coherent across sessions without bloating every prompt.
WhatsApp compliance
Honor Meta's opt-in and consent rules on every single send.
Notification delivery
Fire thousands of precisely timed messages reliably and at scale.
Cost optimization
Keep inference spend predictable as conversation volume grows.
Decision
Front open-source models with a dedicated MCP server
Why
A clean tool/model boundary lets the agent route by latency and cost, while the MCP server squeezes maximum quality out of GPT-OSS models on Groq & Cerebras.
Tradeoff
An extra network hop and a fallback matrix to maintain — accepted for predictable cost and clean failover.
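An ordered fallback matrix with failover might be sketched like this. The matrix entries, the `call` function and the catch-all exception handling are hypothetical, and the MCP transport between agent and models is abstracted away into `call`.

```python
from typing import Callable

# Hypothetical fallback matrix: for each primary, the ordered backups to try.
FALLBACKS: dict[str, list[str]] = {
    "gpt-oss@groq": ["gpt-oss@cerebras", "gemini"],
    "gpt-oss@cerebras": ["gpt-oss@groq", "gemini"],
    "gemini": ["gpt-4o"],
}

class AllProvidersFailed(Exception):
    pass

def complete_with_failover(primary: str, prompt: str,
                           call: Callable[[str, str], str]) -> tuple[str, str]:
    """Try the primary, then each fallback in order; return (model, reply)."""
    errors = []
    for model in [primary, *FALLBACKS.get(primary, [])]:
        try:
            return model, call(model, prompt)
        except Exception as exc:  # e.g. timeout, rate limit, provider error
            errors.append(f"{model}: {exc}")
    raise AllProvidersFailed("; ".join(errors))
```

Keeping the matrix as data is what keeps it maintainable: adding or reordering a provider is a one-line change rather than a new code path.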
Decision
Replace Celery CRON with Lambda + SQS + EventBridge
Why
Time-based messaging shouldn't depend on a warm worker fleet. Serverless scheduling scales with demand and removes infra to babysit.
Tradeoff
Cold starts and harder distributed debugging, accepted in exchange for scheduling that scales with demand and needs no maintenance.
Decision
Treat Meta compliance as architecture, not an afterthought
Why
Deliverability depends on policy adherence. Opt-in consent templates and auto-renewal flows are wired into the send path via Respond.io.
Tradeoff
More moving parts in the messaging flow — accepted to protect the channel itself.
The notification engine went from a hand-scaled CRON bottleneck to zero-maintenance serverless scheduling that scales with demand. Model routing kept frontier models for the hard turns while serving the bulk of traffic cheaply, and self-learning agents compounded accuracy instead of resetting each session, all while staying fully Meta-compliant.