AI as a Service – Containerized ML Pipeline on Kubernetes

Timeframe: Spring 2025
Stack: PyTorch · FastAPI · Docker · Kubernetes (GKE) · Google Artifact Registry · Python

Overview

Built a complete cloud-native ML pipeline: a CNN trained on MNIST as a Kubernetes batch job, with the resulting model served through a scalable REST API.
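
The training side can be sketched as follows. The hyperparameters (5 epochs, Adam, lr=1e-3) come from the project description; the two-conv-layer architecture and the /models PVC mount path are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MnistCNN(nn.Module):
    """Small CNN for 28x28 grayscale digits (hypothetical layout)."""
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 32, 3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, 3, padding=1)
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # 28x28 -> 14x14
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # 14x14 -> 7x7
        x = x.flatten(1)
        x = F.relu(self.fc1(x))
        return self.fc2(x)

def train(model, loader, epochs=5, lr=1e-3, out_path="/models/mnist_cnn.pt"):
    """Training loop as described: 5 epochs, Adam, lr=1e-3; final weights
    are written to the shared PVC mount (path is an assumption)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()
    torch.save(model.state_dict(), out_path)
```

In the batch job, `loader` would be a `DataLoader` over `torchvision.datasets.MNIST`, which handles the download step.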

Architecture

  • Shared PersistentVolumeClaim (PVC, 5 Gi) passes trained model artifacts from the training job to the inference pods
  • Training job: downloads MNIST, trains a CNN for 5 epochs (Adam, lr=1e-3), writes the model to the PVC
  • Inference deployment: 2 replicas load the model from the PVC and serve three prediction endpoints via FastAPI (JSON input, image upload, random test-set sample)
  • LoadBalancer Service exposes the inference pods externally on port 80
  • Deployed on Google Kubernetes Engine (GKE) with images stored in Google Artifact Registry