AI as a Service – Containerized ML Pipeline on Kubernetes

Timeframe: Spring 2025
Stack: PyTorch · FastAPI · Docker · Kubernetes (GKE) · Google Artifact Registry · Python

Overview

Built a complete cloud-native ML pipeline: a CNN trained on MNIST as a Kubernetes batch job, with the resulting model served through a scalable REST API.
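
The training side can be sketched as follows. The hyperparameters (5 epochs, Adam, lr=1e-3) come from the project description; the two-conv-layer architecture and the /models PVC mount path are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MnistCNN(nn.Module):
    """Small CNN for 28x28 grayscale digits (hypothetical layout)."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # 28x28 -> 14x14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # 14x14 -> 7x7
        x = x.flatten(1)
        x = F.relu(self.fc1(x))
        return self.fc2(x)

def train(model, loader, epochs=5, lr=1e-3, out_path="/models/mnist_cnn.pt"):
    """Training loop as described: 5 epochs, Adam, lr=1e-3; final weights
    are written to the shared PVC mount (path is an assumption)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()
    torch.save(model.state_dict(), out_path)
```

In the batch job, `loader` would be a `DataLoader` over `torchvision.datasets.MNIST`, which handles the download step.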

Architecture

  • Shared PersistentVolumeClaim (PVC, 5 Gi) passes trained model artifacts from the training job to the inference pods
  • Training job: downloads MNIST, trains a CNN for 5 epochs (Adam, lr=1e-3), writes the model to the PVC
  • Inference deployment: 2 replicas load the model from the PVC and serve three prediction endpoints via FastAPI (JSON input, image upload, random test-set sample)
  • LoadBalancer Service exposes the inference pods externally on port 80
  • Deployed on Google Kubernetes Engine (GKE) with images stored in Google Artifact Registry