AI as a Service – Containerized ML Pipeline on Kubernetes
Timeframe: Spring 2025
Stack: PyTorch · FastAPI · Docker · Kubernetes (GKE) · Google Artifact Registry · Python
Overview
Built a complete cloud-native ML pipeline: a CNN trained on MNIST as a Kubernetes batch job, with the trained model served through a scalable REST API.
Architecture
- Shared PersistentVolumeClaim (5 Gi) passes trained model artifacts from the training job to the inference pods
- Training job: downloads MNIST, trains a CNN for 5 epochs (Adam, lr=1e-3), writes the model to the PVC
- Inference deployment: 2 replicas load the model from the PVC and serve three prediction endpoints via FastAPI (JSON input, image upload, random test-set sample)
- LoadBalancer service exposes the inference pods externally on port 80
- Deployed on Google Kubernetes Engine (GKE) with container images stored in Google Artifact Registry
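The training job above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class name `SimpleCNN`, the exact layer sizes, and the PVC mount path `/mnt/model` are assumptions; only the training recipe (Adam, lr=1e-3, 5 epochs on MNIST) comes from the description.

```python
import torch
import torch.nn as nn

MODEL_DIR = "/mnt/model"  # assumed mount path of the shared PVC inside the pod


class SimpleCNN(nn.Module):
    """Small CNN for 28x28 grayscale MNIST digits (10 classes)."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(64 * 7 * 7, 10)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


def train(model, loader, epochs=5, lr=1e-3, device="cpu"):
    """Train with Adam at lr=1e-3 for 5 epochs, as in the batch job."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(images.to(device)), labels.to(device))
            loss.backward()
            opt.step()


# A batch of random tensors stands in for the real MNIST DataLoader here.
fake_loader = [(torch.randn(8, 1, 28, 28), torch.randint(0, 10, (8,)))]
model = SimpleCNN()
train(model, fake_loader, epochs=1)
# In the real job the model is then written to the PVC for the inference pods:
# torch.save(model.state_dict(), MODEL_DIR + "/mnist_cnn.pt")
```

In the actual job, the loader would come from `torchvision.datasets.MNIST`, and the `torch.save` call at the end is what hands the artifact to the inference deployment via the shared volume.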
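The JSON-input prediction endpoint could look roughly like this. The route path, request schema, and the stand-in linear model are assumptions for illustration; the real service would load the CNN weights from the shared PVC instead.

```python
import torch
import torch.nn as nn
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

# Stand-in model so the sketch is self-contained; the real service would load
# the trained CNN from the PVC, e.g.:
# model.load_state_dict(torch.load("/mnt/model/mnist_cnn.pt"))
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 10))
model.eval()


class Pixels(BaseModel):
    """JSON body: a flat list of 784 grayscale pixel values in [0, 1]."""
    pixels: list[float]


@app.post("/predict")
def predict(body: Pixels):
    # Reshape the flat pixel list into a 1x1x28x28 batch and classify it.
    x = torch.tensor(body.pixels, dtype=torch.float32).reshape(1, 1, 28, 28)
    with torch.no_grad():
        digit = model(x).argmax(1).item()
    return {"prediction": digit}
```

Run locally with `uvicorn`, or in the cluster the LoadBalancer service routes external port-80 traffic to this app across the two replicas; the image-upload and random-sample endpoints would follow the same pattern.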
