RepoRecSys – GitHub Repository Recommendation System

Timeframe: Sep 2025 – Dec 2025
Stack: PyTorch · BERT · FastAPI · React 18 · TypeScript · Vite · Docker · Google Cloud Storage · Pub/Sub

Overview

Designed a two-tower (dual-encoder) neural recommender for GitHub repositories. The user tower maps user IDs to 64-dimensional dense embeddings; the item tower fuses repository ID, language, and 25 numeric features into 64-dimensional item embeddings. Relevance is scored by dot-product similarity, and the model is trained with a contrastive InfoNCE loss.
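
A minimal sketch of the dot-product scoring and InfoNCE loss in plain Python. It assumes in-batch negatives (each user's positive repo acts as a negative for every other user in the batch) and a temperature hyperparameter; the write-up mentions negative sampling but not the exact scheme, so both are assumptions. The tower networks are omitted and embeddings are passed in directly.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot-product relevance score between a user and an item embedding."""
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(users: List[Vector], items: List[Vector], temperature: float = 1.0) -> float:
    """Mean InfoNCE loss with in-batch negatives (assumed sampling scheme).

    Row i pairs users[i] with its positive items[i]; every other item in the
    batch is treated as a negative for that user.
    """
    if not users or len(users) != len(items):
        raise ValueError("users and items must be non-empty and the same length")
    total = 0.0
    for i, u in enumerate(users):
        logits = [dot(u, v) / temperature for v in items]
        # log-sum-exp with max subtraction for numerical stability
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        # -log softmax(logits)[i]: push the positive above all in-batch negatives
        total += log_denominator - logits[i]
    return total / len(users)
```

In PyTorch the same loss is commonly written as `F.cross_entropy(user_emb @ item_emb.T / temperature, torch.arange(batch_size))`, where the diagonal of the B × B logit matrix holds the positive pairs.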

Architecture

  • Training: 3 epochs, batch size 2048, with negative sampling
  • Evaluation: Precision@10, Recall@10, NDCG@10
  • Cold-start handling: new user IDs are published to Pub/Sub; globally popular repos are returned as a fallback until the user's starred-repo data is ingested
  • Data ingestion pipeline: listens to Pub/Sub → fetches starred repos from GitHub API → augments with metadata → saves to GCS
  • Hot model reloading: GCS polled every 10 minutes for updated artifacts; in-memory model swapped without restart
  • Admin dashboard: React frontend to trigger ingest, train, reload, and view system status
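
The evaluation metrics can be sketched as follows, assuming binary relevance (for example, a user's held-out starred repos count as relevant) and a ranked list of unique repo IDs; the write-up does not state how relevance was defined.

```python
import math
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k slots that hold a relevant repo (divides by k)."""
    hits = sum(1 for repo in ranked[:k] if repo in relevant)
    return hits / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of all relevant repos that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for repo in ranked[:k] if repo in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance NDCG with a log2(rank + 1) discount, ranks starting at 1."""
    dcg = sum(1.0 / math.log2(i + 2) for i, repo in enumerate(ranked[:k]) if repo in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Per-user scores would then be averaged across the evaluation set.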
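
The cold-start fallback can be sketched as a single branch in the recommendation path. The callables `rank_for_embedding` and `publish_new_user` are hypothetical hooks; in the described system the latter would publish the user ID to Pub/Sub (for instance via `google.cloud.pubsub_v1.PublisherClient.publish`) so the ingestion pipeline can fetch the user's starred repos.

```python
from typing import Callable, Dict, List, Sequence


def recommend_with_fallback(
    user_id: str,
    user_embeddings: Dict[str, Sequence[float]],
    rank_for_embedding: Callable[[Sequence[float], int], List[str]],  # hypothetical ranker
    popular_repos: List[str],
    publish_new_user: Callable[[str], None],  # hypothetical Pub/Sub publisher hook
    k: int = 10,
) -> List[str]:
    """Return personalized top-k repos, or popular repos for an unknown user."""
    embedding = user_embeddings.get(user_id)
    if embedding is None:
        # Cold start: queue ingestion of this user's stars, serve global popularity meanwhile.
        publish_new_user(user_id)
        return popular_repos[:k]
    return rank_for_embedding(embedding, k)
```

As written, a new user who makes several requests before ingestion finishes is published once per request, so a production version would likely deduplicate these messages.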
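
Hot reloading can be sketched as a lock-protected holder that a background thread polls every 10 minutes. `fetch_version` and `load_model` are hypothetical hooks. With GCS, the version could be, for instance, the artifact object's generation number, and the load would download and deserialize the weights. The new model is built outside the lock, so in-flight requests keep using the old one until the swap.

```python
import logging
import threading
from typing import Any, Callable, Optional


class HotSwappableModel:
    """Serves one in-memory model and swaps it when a newer artifact version appears."""

    def __init__(self, fetch_version: Callable[[], str], load_model: Callable[[str], Any]) -> None:
        # Hypothetical hooks: in the described system these would read artifact
        # metadata and weights from GCS.
        self._fetch_version = fetch_version
        self._load_model = load_model
        self._lock = threading.Lock()
        self._version: Optional[str] = None
        self._model: Any = None

    def get(self) -> Any:
        # Request handlers call this and always see a fully loaded model.
        with self._lock:
            return self._model

    def check_and_reload(self) -> bool:
        """Load and swap in the latest artifact if its version changed; True on swap."""
        latest = self._fetch_version()
        if latest == self._version:  # only the single poller thread writes _version
            return False
        new_model = self._load_model(latest)  # slow load happens outside the lock
        with self._lock:
            self._model, self._version = new_model, latest
        return True

    def start_polling(self, interval_s: float = 600.0) -> threading.Event:
        """Poll in a daemon thread (default every 10 minutes); set the returned event to stop."""
        stop = threading.Event()

        def loop() -> None:
            while not stop.is_set():
                try:
                    self.check_and_reload()
                except Exception:
                    # Keep serving the current model if a fetch or load fails.
                    logging.exception("model reload failed")
                stop.wait(interval_s)

        threading.Thread(target=loop, daemon=True).start()
        return stop
```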

Deployment

Containerized the FastAPI inference service behind a REST endpoint; it returns top-K recommendations with a median latency under 100 ms for typical users.
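
A minimal sketch of the top-K step behind the endpoint, assuming a brute-force dot product over every precomputed item embedding. The `exclude` parameter (for example, repos the user has already starred) is a hypothetical addition, and the write-up does not say whether an approximate nearest-neighbor index is used instead.

```python
import heapq
from typing import Dict, Iterable, List, Sequence, Tuple


def top_k(
    user_vec: Sequence[float],
    item_vecs: Dict[str, Sequence[float]],
    k: int = 10,
    exclude: Iterable[str] = (),
) -> List[Tuple[str, float]]:
    """Return the k highest-scoring (repo_id, score) pairs by dot product."""
    skip = set(exclude)  # hypothetical filter, e.g. already-starred repos
    scored = (
        (repo_id, sum(u * v for u, v in zip(user_vec, vec)))
        for repo_id, vec in item_vecs.items()
        if repo_id not in skip
    )
    # heapq.nlargest keeps only k candidates in memory while scanning all items
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

With PyTorch tensors the same step is typically `torch.topk(item_matrix @ user_vec, k)`, which scores the whole catalog in one matrix-vector product.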