Owns AppRocket's AI practice end-to-end: research, evaluation, model selection, and production delivery of LLM and ML systems for clients in regulated industries.
The center of gravity for the role is eval discipline — the difference between a demo that wows the client and a system the client can confidently deploy in front of their attorneys, their underwriters, or their clinical staff. AppRocket's eval framework, refined across the Casetrack production deployment and a portfolio of legal-vertical engagements, is now the deliverable that anchors every $15K AI Readiness Audit.
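A minimal sketch of what that discipline looks like in code, using invented case names and thresholds rather than the actual Casetrack framework: a small regression eval that only clears a build for client review when the pass rate beats an explicit release threshold.

```python
from dataclasses import dataclass

# Hypothetical names throughout; this is an illustration, not the firm's eval harness.
@dataclass
class EvalCase:
    prompt: str
    required_citation: str   # the gold source the answer must cite

def grade(answer: str, case: EvalCase) -> bool:
    # Deliberately strict, deterministic check: the answer must name the gold source.
    return case.required_citation.lower() in answer.lower()

def release_gate(answers: dict[str, str], cases: list[EvalCase], threshold: float = 0.95) -> bool:
    # A build only goes in front of the client if the pass rate clears the threshold.
    passed = sum(grade(answers.get(c.prompt, ""), c) for c in cases)
    rate = passed / len(cases)
    print(f"eval pass rate: {rate:.1%} (release threshold {threshold:.0%})")
    return rate >= threshold
```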
Day-to-day work spans foundation model selection (Claude, GPT, Gemini), retrieval architecture (Pinecone, Weaviate, custom hybrid), agent orchestration (LangGraph, custom state machines), and the ML observability stack (Arize, Langfuse, custom). Cross-checks every architecture decision against the realities of operating the resulting system: latency SLOs, hallucination budgets, human-in-the-loop checkpoints, and the cost of being wrong in front of a paying customer.
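One way to make those cross-checks concrete, sketched here with invented field names and numbers rather than any client's actual budgets: an operational gate that flags a design when it misses the latency SLO, overspends the hallucination budget, or skips a required human-in-the-loop checkpoint.

```python
from dataclasses import dataclass

# Illustrative only; the fields and limits below are assumptions, not client SLOs.
@dataclass
class OpsBudget:
    p95_latency_ms: float          # latency SLO
    max_hallucination_rate: float  # tolerated rate of unsupported claims
    human_review_required: bool    # human-in-the-loop checkpoint

@dataclass
class MeasuredRun:
    p95_latency_ms: float
    hallucination_rate: float
    human_review_wired: bool

def production_violations(run: MeasuredRun, budget: OpsBudget) -> list[str]:
    # Empty list means the design clears every operational budget.
    violations = []
    if run.p95_latency_ms > budget.p95_latency_ms:
        violations.append(f"p95 {run.p95_latency_ms:.0f}ms exceeds SLO {budget.p95_latency_ms:.0f}ms")
    if run.hallucination_rate > budget.max_hallucination_rate:
        violations.append(f"hallucination rate {run.hallucination_rate:.1%} over budget "
                          f"{budget.max_hallucination_rate:.1%}")
    if budget.human_review_required and not run.human_review_wired:
        violations.append("human-in-the-loop checkpoint not wired in")
    return violations
```

The point of keeping a check this blunt is that it fails loudly in design review, long before the system can fail quietly in front of a paying customer.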