Machine Learning Systems
Designing, training, and productionizing models across the full lifecycle, from experimentation to reliable, low-latency inference at scale. Built platforms serving tens of millions of predictions per week.
Senior ML & AI Engineer
6+ years building production ML and AI systems across startups and enterprise. AWS-certified ML Engineer with deep expertise in Kubernetes, LLMs, and MLOps. Driven by clean code, solid infrastructure, and measurable impact.
Disciplines
Building LLM-powered applications and chatbots, RAG pipelines, and AI agents. Instrumenting and monitoring AI systems for quality, drift, and cost in production environments.
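The retrieval half of a RAG pipeline can be sketched in a few lines. A minimal, self-contained illustration rather than production code: a toy bag-of-words vector stands in for a real embedding model, and `retrieve` and `build_prompt` are hypothetical helper names.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real pipeline would call an
    # embedding model here instead of counting tokens.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank stored chunks by similarity to the query, keep the top k.
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    # Ground the LLM call in the retrieved context.
    context = "\n".join(retrieve(query, chunks))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The prompt string would then be sent to whichever LLM the application uses; swapping the toy embedding for a real one leaves the rest of the flow unchanged.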
Containerized training workflows, experiment tracking with MLflow, automated CI/CD pipelines, and model versioning for reproducible, auditable model delivery on SageMaker and Kubernetes.
Building robust ingestion and transformation pipelines with Spark and Airflow. Designing feature stores, both online and offline, that keep training and serving distributions consistent.
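The train/serve consistency point above comes down to one rule: both paths compute every feature through the same shared function. A minimal sketch (function and field names are illustrative; in production the batch path would typically be a Spark job writing to the offline store):

```python
def days_since_signup(signup_ts: float, now_ts: float) -> float:
    # Single source of truth for the feature definition; because both
    # paths call this, training and serving cannot drift apart.
    return (now_ts - signup_ts) / 86400.0

def offline_features(rows: list[dict], as_of_ts: float) -> list[dict]:
    # Batch path: computed as of a fixed timestamp for training data.
    return [{"user_id": r["user_id"],
             "days_since_signup": days_since_signup(r["signup_ts"], as_of_ts)}
            for r in rows]

def online_feature(row: dict, now_ts: float) -> dict:
    # Serving path: same function, evaluated at request time.
    return {"user_id": row["user_id"],
            "days_since_signup": days_since_signup(row["signup_ts"], now_ts)}
```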
Instrumenting models and infra with Prometheus and Grafana. Detecting data drift, scoring degradation, and chatbot quality regressions through automated alerting and structured dashboards.
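Data drift of the kind alerted on above is often scored with the Population Stability Index between a training baseline and a live window. A minimal sketch, assuming equal-width bins and the common (but tunable) rule of thumb that PSI above 0.2 signals drift:

```python
import math

def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    # Population Stability Index between a baseline sample and a live
    # sample; rough convention (an assumption, tune per model):
    # < 0.1 stable, 0.1-0.2 watch, > 0.2 drift alert.
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def fractions(xs: list[float]) -> list[float]:
        counts = [0] * bins
        for x in xs:
            i = min(int((x - lo) / width), bins - 1)
            counts[i] += 1
        # Small epsilon keeps log() defined for empty buckets.
        return [(c + 1e-6) / (len(xs) + 1e-6 * bins) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

In a deployment like the one described, this number would be exported as a Prometheus gauge per feature, with the alerting rule living in Prometheus rather than in application code.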
Architecting cloud-native ML platforms on AWS: compute clusters, model registries, Terraform-managed infrastructure, and the Kubernetes backbone that ties experimentation, training, and serving together.
Selected Work
Architected and built a production ML serving platform that reached tens of millions of predictions per week. Delivered microservices for training, AutoML hyperparameter search, fairness measurement, and model explainability, all running on Kubernetes with automated CI/CD.
Designed and implemented end-to-end monitoring for LLM-powered chatbots and AI scoring models. Built dashboards and alert pipelines covering response quality, data ingestion health, and model score drift using Prometheus and Grafana across online and offline feature paths.
Migrated a suite of PySpark ETL jobs to idiomatic Scala/Spark, achieving up to a 6× throughput improvement. Simultaneously simplified a computer-vision pipeline by replacing a heavy model with a lighter algorithm that improved both accuracy and cost. A recurring theme: less complexity, better results.
Tools & Technologies
Let's talk
Available for consulting, freelance, and senior full-time roles.
Get in Touch