Machine Learning Systems
Designing, training, and productionizing models across the full lifecycle, from experimentation to reliable, low-latency inference at scale. Built platforms serving tens of millions of predictions per week.
Senior ML & AI Engineer
6+ years building production ML and AI systems across startups and enterprise. AWS-certified ML Engineer with deep expertise in Kubernetes, LLMs, and MLOps. Driven by clean code, solid infrastructure, and measurable impact.
Disciplines
Building LLM-powered applications and chatbots, RAG pipelines, and AI agents. Instrumenting and monitoring AI systems for quality, drift, and cost in production environments.
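The retrieval half of a RAG pipeline can be sketched in a few lines. A minimal, self-contained illustration rather than production code: a toy bag-of-words vector stands in for a real embedding model, and `retrieve` and `build_prompt` are hypothetical helper names.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real pipeline would call an
    # embedding model here instead of counting tokens.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank stored chunks by similarity to the query, keep the top k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    # Ground the LLM call in the retrieved context.
    context = "\n".join(retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The prompt string would then be sent to whichever LLM the application uses; swapping the toy embedding for a real one leaves the rest of the flow unchanged.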
Containerized training workflows, experiment tracking with MLflow, automated CI/CD pipelines, and model versioning for reproducible, auditable model delivery on SageMaker and Kubernetes.
Building robust ingestion and transformation pipelines with Spark and Airflow. Designing feature stores, both online and offline, that keep training and serving distributions consistent.
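The train/serve consistency point above comes down to one rule: both paths compute every feature through the same shared function. A minimal sketch (function and field names are illustrative; in production the batch path would typically be a Spark job writing to the offline store):

```python
def days_since_signup(signup_ts: float, now_ts: float) -> float:
    # Single source of truth for the feature definition; because both
    # paths call this, training and serving cannot drift apart.
    return (now_ts - signup_ts) / 86400.0

def offline_features(rows: list[dict], as_of_ts: float) -> list[dict]:
    # Batch path: computed as of a fixed timestamp for training data.
    return [{"user_id": r["user_id"],
             "days_since_signup": days_since_signup(r["signup_ts"], as_of_ts)}
            for r in rows]

def online_feature(row: dict, now_ts: float) -> dict:
    # Serving path: same function, evaluated at request time.
    return {"user_id": row["user_id"],
            "days_since_signup": days_since_signup(row["signup_ts"], now_ts)}
```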
Instrumenting models and infra with Prometheus and Grafana. Detecting data drift, scoring degradation, and chatbot quality regressions through automated alerting and structured dashboards.
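Data drift of the kind alerted on above is often scored with the Population Stability Index between a training baseline and a live window. A minimal sketch, assuming equal-width bins and the common (but tunable) rule of thumb that PSI above 0.2 signals drift:

```python
import math

def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    # Population Stability Index between a baseline sample and a live
    # sample; rough convention (an assumption, tune per model):
    # < 0.1 stable, 0.1-0.2 watch, > 0.2 drift alert.
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def fractions(xs: list[float]) -> list[float]:
        counts = [0] * bins
        for x in xs:
            i = min(int((x - lo) / width), bins - 1)
            counts[i] += 1
        # Small epsilon keeps log() defined for empty buckets.
        return [(c + 1e-6) / (len(xs) + 1e-6 * bins) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

In a deployment like the one described, this number would be exported as a Prometheus gauge per feature, with the alerting rule living in Prometheus rather than in application code.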
Architecting cloud-native ML platforms on AWS: compute clusters, model registries, Terraform-managed infrastructure, and the Kubernetes backbone that ties experimentation, training, and serving together.
Selected Work
Architected and built a production ML serving platform that reached tens of millions of predictions per week. Delivered microservices for training, AutoML hyperparameter search, fairness measurement, and model explainability, all running on Kubernetes with automated CI/CD.
Designed and implemented end-to-end monitoring for LLM-powered chatbots and AI scoring models. Built dashboards and alert pipelines covering response quality, data ingestion health, and model score drift using Prometheus and Grafana across online and offline feature paths.
Migrated a suite of PySpark ETL jobs to idiomatic Scala/Spark, achieving up to a 6× throughput improvement. Simultaneously simplified a computer-vision pipeline by replacing a heavy model with a lighter algorithm that improved both accuracy and cost. A recurring theme: less complexity, better results.
Tools & Technologies
Let's talk
Available for consulting, freelance, and senior full-time roles.
Get in Touch