Case Studies

Engineering deep‑dives, not portfolio cards.

Two systems I owned end-to-end — written the way I'd document them internally. Architecture, the decisions behind it, the tradeoffs I accepted, and what shipped.

2 systems · 0→1 production · Founding ownership · AI · Backend · Infrastructure
SarahAI
AI communication assistant · app, in-app chat & WhatsApp · Venturenox
2025 → Now
Django · AWS Lambda · SQS · EventBridge · MCP · RAG · Langfuse · GPT-4o · Gemini 2.5 · Claude Sonnet 4.5 · Groq · Cerebras
01 Problem

One assistant, everywhere the user already is.

SarahAI spans a native app, in-app chat and a major WhatsApp integration — people wanted scheduling, reminders, messaging, summaries and natural AI conversation by text or voice, across whichever surface they reached for. That meant orchestrating multiple AI models behind one coherent agent, remembering context across days and channels, delivering time-based messages reliably at scale, and staying fully within Meta's strict messaging policies. The hard part was never a single feature — it was making all of them one dependable system.

02 Architecture
// request lifecycle
01 WhatsApp: Respond.io BSP · text / voice
02 AI Layer: multi-model router
   M Memory: RAG · Langfuse
   T Tools: MCP server
03 Serverless Jobs: Lambda · SQS · EventBridge
04 Notifications: Meta-compliant delivery

One agent, many models

A routing layer picks between GPT-4o/4.5, Gemini and Claude by latency and cost, and falls back to open models on Groq & Cerebras, which sit behind a dedicated MCP server.

Memory as a first-class citizen

RAG-backed context with Langfuse tracing, plus agents that write back to their own knowledge base from past interactions.

Scheduling without servers

Every time-based message is an independently scheduled EventBridge event draining through SQS into Lambda — no standing worker fleet.

02 · System breakdown
L/01
Agent runtime

One agent over many models. A routing layer chooses between GPT-4o/4.5, Gemini and Claude by intent, latency and cost, while the Task module invokes the right tools per turn instead of a fixed pipeline.

Model routing · Tool calling · Prompt engineering
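
To make the routing idea concrete, here is a minimal sketch of choosing a model by intent, latency and cost, with a fallback when nothing fits. It is not the production router: the catalogue, the tier names, the intent labels and every number are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str                  # label only, not a real provider model ID
    tier: str                  # "frontier" or "open"
    cost_per_1k_tokens: float  # illustrative number, not real pricing
    p50_latency_ms: int        # illustrative number, not a measurement


# Hypothetical catalogue; every figure below is made up for the example.
CATALOGUE = [
    ModelOption("frontier-a", "frontier", 0.0100, 1200),
    ModelOption("frontier-b", "frontier", 0.0080, 1500),
    ModelOption("open-fast", "open", 0.0005, 250),
]

# Assumed intent labels that justify paying for a frontier model.
HARD_INTENTS = {"planning", "summarize_long_thread"}


def route(intent: str, latency_budget_ms: int,
          unavailable: frozenset = frozenset()) -> ModelOption:
    """Pick the cheapest healthy model that fits the intent and latency budget."""
    healthy = [m for m in CATALOGUE if m.name not in unavailable]
    if not healthy:
        raise RuntimeError("no model available")
    wanted_tier = "frontier" if intent in HARD_INTENTS else "open"
    candidates = [
        m for m in healthy
        if m.tier == wanted_tier and m.p50_latency_ms <= latency_budget_ms
    ]
    # Fall back to any healthy model rather than failing the turn.
    pool = candidates or healthy
    return min(pool, key=lambda m: (m.cost_per_1k_tokens, m.p50_latency_ms))
```

In this sketch a small-talk turn lands on the cheap open model, a planning turn gets a frontier model, and the same planning turn degrades to the open model if both frontier entries are marked unavailable.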
L/02
Memory & MCP tools

RAG-backed long-term context traced in Langfuse, with agents writing back to their own knowledge base. A dedicated MCP server fronts open-source fallbacks (GPT-OSS) on Groq & Cerebras behind a clean tool boundary.

RAG · Langfuse · MCP · Groq · Cerebras
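
One way to keep retrieved context from bloating every prompt is to rank stored memories by similarity and stop at a size budget. The sketch below assumes embeddings already exist as plain float lists and uses character counts as a stand-in for tokens. The function names, thresholds and budget are hypothetical, not the system's actual retrieval code.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_memories(query_vec, memories, max_chars=200, min_score=0.2):
    """Return the most relevant memory texts that fit inside a size budget.

    `memories` is a list of (text, embedding) pairs. Characters stand in for
    tokens here; a real system would count tokens with the model's tokenizer.
    """
    scored = sorted(
        ((cosine(query_vec, vec), text) for text, vec in memories),
        key=lambda pair: pair[0],
        reverse=True,
    )
    picked, used = [], 0
    for score, text in scored:
        if score < min_score:
            break  # everything after this is even less relevant
        if used + len(text) > max_chars:
            continue  # skip items that would overflow the budget
        picked.append(text)
        used += len(text)
    return picked
```

The relevance floor matters as much as the budget: without it, a quiet knowledge base would pad the prompt with unrelated memories just because there was room.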
L/03
Serverless scheduling

The notification engine was re-architected from Celery CRON to Lambda + SQS + EventBridge Scheduler. Each reminder is an independently scheduled event, so capacity scales with demand and there is no standing infrastructure to run.

Lambda · SQS · EventBridge
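
As an illustration of "one reminder, one schedule", the sketch below builds the arguments for a one-time EventBridge Scheduler schedule that targets an SQS queue. The function name, the `reminder-` naming scheme and the payload shape are assumptions for the example. The returned dict is meant to be passed to the `create_schedule` call of boto3's `scheduler` client, which is deliberately left out so the snippet runs without AWS credentials.

```python
import json
from datetime import datetime, timezone


def build_reminder_schedule(reminder_id: str, fire_at: datetime, payload: dict,
                            queue_arn: str, role_arn: str) -> dict:
    """Build keyword arguments for EventBridge Scheduler's CreateSchedule call."""
    if fire_at.tzinfo is None:
        raise ValueError("fire_at must be timezone-aware")
    utc = fire_at.astimezone(timezone.utc)
    return {
        # Hypothetical naming scheme; reminder_id must be a valid schedule name.
        "Name": f"reminder-{reminder_id}",
        # One-time schedules use the at(yyyy-mm-ddThh:mm:ss) expression.
        "ScheduleExpression": f"at({utc.strftime('%Y-%m-%dT%H:%M:%S')})",
        "ScheduleExpressionTimezone": "UTC",
        "FlexibleTimeWindow": {"Mode": "OFF"},
        # Remove the schedule once it has fired so old reminders do not pile up.
        "ActionAfterCompletion": "DELETE",
        "Target": {
            "Arn": queue_arn,     # SQS queue that the Lambda consumer drains
            "RoleArn": role_arn,  # role Scheduler assumes to send the message
            "Input": json.dumps(payload),
        },
    }
```

Keeping the request builder pure makes the timing logic easy to unit test. The only AWS-facing line left is the client call itself.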
L/04
Compliance & billing

Meta-compliant opt-in consent and automated renewal flows wired into the send path via Respond.io, plus subscription-lifecycle management and feature-gating enforcing entitlements across the platform.

Respond.io BSP · Subscriptions · Feature-gating
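
Wiring compliance into the send path can be pictured as a gate that every outbound message passes through. The sketch below is an assumption about how such a gate might look, not the actual Respond.io integration. It applies opt-in conservatively to all sends and uses WhatsApp's 24-hour customer service window to decide between a free-form message and a pre-approved template. `Contact` and `decide_send` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

SERVICE_WINDOW = timedelta(hours=24)  # WhatsApp customer service window


@dataclass
class Contact:
    opted_in: bool
    last_inbound_at: Optional[datetime]  # last message received from the user


def decide_send(contact: Contact, now: datetime) -> str:
    """Return "blocked", "free_form" or "template" for an outbound message."""
    if not contact.opted_in:
        return "blocked"  # no consent on record, so nothing is sent
    if contact.last_inbound_at and now - contact.last_inbound_at <= SERVICE_WINDOW:
        return "free_form"  # user wrote recently, so a normal reply is allowed
    return "template"  # outside the window, only approved templates may go out
```

A reminder that fires two days after the user's last message would therefore go out as a template, which is where a consent-renewal flow would naturally attach.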
03 Challenges
C/01
Multi-model orchestration

Route intelligently across providers without exposing the seams to the user.

C/02
Agent memory

Keep context coherent across sessions without bloating every prompt.

C/03
WhatsApp compliance

Honor Meta's opt-in and consent rules on every single send.

C/04
Notification delivery

Fire thousands of precisely-timed messages reliably and at scale.

C/05
Cost optimization

Keep inference spend predictable as conversation volume grows.

04 Key Decisions
Decision

Front open-source models with a dedicated MCP server

Why

A clean tool/model boundary lets the agent route by latency and cost, and squeezes maximum quality from GPT-OSS models on Groq & Cerebras.

Tradeoff

An extra network hop and a fallback matrix to maintain — accepted for predictable cost and clean failover.

Decision

Replace Celery CRON with Lambda + SQS + EventBridge

Why

Time-based messaging shouldn't depend on a warm worker fleet. Serverless scheduling scales with demand and removes infra to babysit.

Tradeoff

Cold-starts and distributed debugging — accepted to make scheduling effectively infinite and zero-maintenance.

Decision

Treat Meta compliance as architecture, not an afterthought

Why

Deliverability depends on policy adherence. Opt-in consent templates and auto-renewal flows are wired into the send path via Respond.io.

Tradeoff

More moving parts in the messaging flow — accepted to protect the channel itself.

05 Impact
Scheduling scale
0 · Servers to manage
3+ · AI providers routed
RAG · Self-learning memory

The notification engine went from a hand-scaled CRON bottleneck to serverless scheduling that scales with demand and has no worker fleet to maintain. Model routing kept frontier models for the hard turns while serving the bulk cheaply, and self-learning agents compounded accuracy instead of resetting each session, all while staying fully Meta-compliant.

Informly
Business growth platform · Founding Full-Stack Engineer
2023 → Now
Nx Monorepo · Next.js · TypeScript · Node.js · Nexus · Apollo GraphQL · PostgreSQL · Inngest · Pydantic · Python
01 Problem

Businesses needed growth tools that didn't need an engineer in the loop.

Teams wanted surveys, analytics, automations and AI assistance — across multiple locations, with enterprise-grade access control — without filing a ticket for every workflow change. As the founding engineer, I had to set the architecture from day one to balance shipping velocity with the multi-tenant, role-aware scalability the platform would eventually demand.

02 Architecture
// platform flow
01 User: multi-tenant · RBAC
02 Surveys: capture
03 Analytics: metrics understanding
04 AI Layer: Pydantic contract
05 Workflow Engine: Inngest · no-code

GraphQL as the platform spine

An Apollo + Nexus GraphQL layer replaced REST, cutting over-fetching and giving the Next.js client a single typed graph.

AI behind a schema contract

Pydantic schemas are the contract between frontend and AI backend, so survey generation, metrics and chat actions always parse.

Automations users own

A drag-and-drop engine on Inngest compiles visual workflows into durable, observable background jobs.

02 · System breakdown
L/01
GraphQL platform

The whole API migrated from REST to Apollo + Nexus GraphQL — one typed graph for the Next.js client, with in-memory Apollo caching eliminating redundant round-trips for a low-latency UI.

Apollo · Nexus · In-memory cache
L/02
AI service layer

Pydantic schemas act as a strict contract between frontend and AI backend, so survey generation, metrics understanding and the chat agent always return shapes the typed product can trust.

Pydantic · Schema contract · LLM
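
The contract-plus-validation-loop pattern can be sketched as follows. The production contract is written in Pydantic. To stay dependency-free, this example swaps in standard-library dataclasses as a stand-in, so treat it as the shape of the idea rather than the real schemas. `SurveyQuestion`, the allowed question kinds and the `call_model` callable are all hypothetical.

```python
import json
from dataclasses import dataclass

ALLOWED_KINDS = {"rating", "text", "choice"}  # hypothetical question types


@dataclass(frozen=True)
class SurveyQuestion:
    prompt: str
    kind: str

    def __post_init__(self):
        if not isinstance(self.prompt, str) or not self.prompt.strip():
            raise ValueError("prompt must be a non-empty string")
        if not isinstance(self.kind, str) or self.kind not in ALLOWED_KINDS:
            raise ValueError(f"kind must be one of {sorted(ALLOWED_KINDS)}")


def parse_survey(raw: str) -> list[SurveyQuestion]:
    """Parse model output into typed questions or raise ValueError."""
    data = json.loads(raw)  # JSONDecodeError is a ValueError subclass
    if not isinstance(data, list) or not data:
        raise ValueError("expected a non-empty JSON array")
    questions = []
    for item in data:
        if not isinstance(item, dict) or set(item) != {"prompt", "kind"}:
            raise ValueError("each item needs exactly 'prompt' and 'kind'")
        questions.append(SurveyQuestion(**item))
    return questions


def generate_survey(call_model, topic: str, max_attempts: int = 3):
    """Call the model until its output satisfies the contract."""
    feedback = ""
    for _ in range(max_attempts):
        raw = call_model(topic, feedback)
        try:
            return parse_survey(raw)
        except ValueError as err:
            # Feed the validation error back so the next attempt can correct it.
            feedback = f"Previous output was rejected: {err}"
    raise RuntimeError("model output never satisfied the schema")
```

The frontend only ever sees the return value of `generate_survey`, so it receives either a list of well-formed questions or an explicit failure, never a half-parsed blob.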
L/03
Workflow engine

A drag-and-drop builder compiles visual automations into durable, observable background jobs on Inngest, so non-technical teams ship multi-step business logic without an engineer in the loop.

Inngest · No-code · Durable jobs
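
"Compiling" a visual workflow can be as simple as validating the graph the builder produces and turning it into an ordered list of steps. The sketch below assumes the builder emits nodes and edges. The block types and the `compile_workflow` name are hypothetical, and handing each step to Inngest as a durable job is not shown.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical block types the visual builder would expose.
ALLOWED_BLOCKS = {"trigger", "delay", "send_email", "condition"}


def compile_workflow(nodes: dict, edges: list) -> list:
    """Validate a visual workflow and return its node ids in execution order.

    `nodes` maps node id -> block type; `edges` is a list of (src, dst) pairs.
    """
    for node_id, block in nodes.items():
        if block not in ALLOWED_BLOCKS:
            raise ValueError(f"unsupported block {block!r} on node {node_id!r}")
    sorter = TopologicalSorter({node_id: set() for node_id in nodes})
    for src, dst in edges:
        if src not in nodes or dst not in nodes:
            raise ValueError(f"edge {src!r} -> {dst!r} references an unknown node")
        sorter.add(dst, src)  # dst runs only after src
    try:
        return list(sorter.static_order())
    except CycleError as err:
        raise ValueError("workflow contains a cycle") from err
```

Rejecting unknown blocks and cycles at compile time is what makes constrained building blocks safer than raw scripting: an invalid automation never reaches the runtime.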
L/04
RBAC & multi-tenancy

Multi-location support lets one account manage many branches, with fine-grained role-based access control and enterprise permission tiers isolating data and config per tenant.

RBAC · Multi-tenant · Enterprise tiers
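
A permission check that respects both tenant isolation and branch scope might look like the sketch below. The roles, permission strings and the `Membership` shape are hypothetical and only illustrate the order of the checks, not the platform's real permission model.

```python
from dataclasses import dataclass

# Hypothetical roles and permissions, for illustration only.
ROLE_PERMISSIONS = {
    "owner": {"survey:read", "survey:write", "workflow:write", "billing:manage"},
    "branch_manager": {"survey:read", "survey:write", "workflow:write"},
    "viewer": {"survey:read"},
}


@dataclass(frozen=True)
class Membership:
    tenant_id: str
    role: str
    branch_ids: frozenset  # empty set means "every branch in the tenant"


def can(member: Membership, permission: str, tenant_id: str, branch_id: str) -> bool:
    """Allow an action only inside the member's tenant, branch scope and role."""
    if member.tenant_id != tenant_id:
        return False  # tenant isolation comes before any role check
    if member.branch_ids and branch_id not in member.branch_ids:
        return False  # member is scoped to other branches
    return permission in ROLE_PERMISSIONS.get(member.role, set())
```

Checking the tenant first means a role misconfiguration can widen access within one business but cannot leak data across businesses.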
03 Challenges
C/01
Multi-tenancy

Isolate data and config per business while sharing one platform.

C/02
GraphQL migration

Move a live REST API to GraphQL without breaking the product.

C/03
RBAC

Fine-grained permission tiers for vendors, admins and enterprise clients.

C/04
Workflow orchestration

Let non-coders define safe, multi-step automations visually.

C/05
AI integration

Make AI output trustworthy enough for a typed product.

04 Key Decisions
Decision

Migrate the API from REST to GraphQL with Apollo + Nexus

Why

Over-fetching and N+1 round-trips were stressing the server and the client. One typed graph fixed both.

Tradeoff

A full migration of a live API — accepted for measurable performance gains and lower server stress.

Decision

Make Pydantic schemas the AI contract

Why

A typed frontend can't consume free-form model output. Schema-validated generation makes AI a reliable building block.

Tradeoff

Stricter prompts and a validation loop — accepted for output the whole system can trust.

Decision

Build automations on Inngest, not a homegrown queue

Why

Durable, observable workflow execution out of the box let me focus on the visual builder, not the runtime.

Tradeoff

Constrained building blocks instead of raw scripting — accepted to keep user workflows safe and debuggable.

05 Impact
GraphQL · REST API migrated
No-code · Automations for non-techs
RBAC · Multi-tenant, per branch
Typed · AI output by contract

The GraphQL migration cut over-fetching and server stress while delivering a low-latency, in-memory-cached UI. Non-technical teams now build their own multi-step automations, taking engineering out of the critical path for routine business logic — and the AI layer stays trustworthy because its output is contractually typed.

Want the rest of the story?

These are two of my flagship founding-level builds. Happy to walk through any part of them live.