Owns AppRocket's AI practice end-to-end: research, evaluation, model selection, and production delivery of LLM and ML systems for clients in regulated industries.
The evaluation harness is the center of gravity for the role. It is the difference between a demo that wows the client and a system the client can confidently deploy in front of their attorneys, underwriters, or clinical staff. AppRocket's evaluation framework was refined across the Casetrack production deployment and a portfolio of legal engagements, and it is now the deliverable that anchors every $15K AI Readiness Audit.
Day-to-day work spans foundation model selection (Claude, GPT, Gemini), retrieval architecture (Pinecone, Weaviate, custom hybrid retrieval), agent orchestration (LangGraph, custom state machines), and the ML observability stack (Arize, Langfuse, custom tooling). Every architecture decision is checked against the realities of operating it: latency SLOs, hallucination budgets, human-in-the-loop checkpoints, and the cost of being wrong in front of a paying customer.