RepoRecSys – GitHub Repository Recommendation System
Timeframe: Sep 2025 – Dec 2025
Stack: PyTorch · BERT · FastAPI · React 18 · TypeScript · Vite · Docker · Google Cloud Storage · Pub/Sub
Overview
Designed a two-tower (dual-encoder) neural recommender for GitHub repositories. The user tower maps user IDs to 64-dimensional dense embeddings; the item tower fuses repository ID, language, and 25 numeric features into 64-dimensional item embeddings. Relevance is scored by dot-product similarity, and the model is trained with a contrastive InfoNCE loss.
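The scoring and loss described above can be illustrated with a minimal, dependency-free sketch. It assumes in-batch negatives (for user i, every other item in the batch acts as a negative) and a temperature parameter; neither detail is stated in this summary, and the project's own PyTorch code may differ. The function names are hypothetical.

```python
import math


def dot(u, v):
    """Dot-product similarity between two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(user_embs, item_embs, temperature=1.0):
    """Mean InfoNCE loss over a batch.

    Assumption: item_embs[i] is the positive item for user_embs[i], and
    all other items in the batch serve as negatives for that user.
    """
    losses = []
    for i, user_vec in enumerate(user_embs):
        logits = [dot(user_vec, item_vec) / temperature for item_vec in item_embs]
        # Numerically stable log-sum-exp over all candidate items.
        max_logit = max(logits)
        log_denom = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
        # Negative log-probability of the positive item under the softmax.
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)
```

The loss falls as each user embedding moves closer to its positive item than to the other items in the batch, which is the behavior a dot-product retrieval model relies on at inference time.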
Architecture
- Training: 3 epochs, batch size 2048, with negative sampling
- Evaluation: Precision@10, Recall@10, NDCG@10
- Cold-start handling: new users published to Pub/Sub; global popular repos returned as fallback until starred-repo data is ingested
- Data ingestion pipeline: listens to Pub/Sub → fetches starred repos from GitHub API → augments with metadata → saves to GCS
- Hot model reloading: GCS polled every 10 minutes for updated artifacts; in-memory model swapped without restart
- Admin dashboard: React frontend to trigger ingest, train, reload, and view system status
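The three evaluation metrics listed above can be sketched for a single user as follows. This assumes binary relevance (a repository is relevant if the user starred it in the held-out set) and the common convention of dividing precision by k; the function names are hypothetical and the project's evaluation code may differ.

```python
import math


def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k recommendations that are relevant."""
    hits = sum(1 for repo in ranked[:k] if repo in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k=10):
    """Fraction of all relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for repo in ranked[:k] if repo in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k=10):
    """Normalized discounted cumulative gain with binary relevance."""
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, repo in enumerate(ranked[:k])
        if repo in relevant
    )
    # Ideal ordering places every relevant item at the top of the list.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

Per-user values would typically be averaged over all evaluation users to produce the reported Precision@10, Recall@10, and NDCG@10.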
Deployment
Containerized FastAPI inference service exposing a REST endpoint that returns top-K recommendations with a median latency under 100 ms for typical users.
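The top-K lookup, together with the cold-start fallback noted earlier, might look like the sketch below. It is a brute-force illustration over in-memory dictionaries: `recommend`, `user_embs`, `item_embs`, and `popular` are hypothetical names, and the actual service may precompute embeddings differently or use batched tensor operations.

```python
import heapq


def recommend(user_id, user_embs, item_embs, popular, k=10):
    """Return the k highest-scoring repository IDs for a user.

    user_embs: dict mapping user ID -> embedding vector (assumed layout)
    item_embs: dict mapping repository ID -> embedding vector (assumed layout)
    popular:   list of globally popular repository IDs, most popular first
    """
    user_vec = user_embs.get(user_id)
    if user_vec is None:
        # Cold start: no embedding yet, so fall back to popular repositories.
        return popular[:k]

    def score(repo_id):
        # Dot-product relevance between the user and the repository.
        return sum(a * b for a, b in zip(user_vec, item_embs[repo_id]))

    # heapq.nlargest avoids sorting the full catalog when k is small.
    return heapq.nlargest(k, item_embs, key=score)
```

A FastAPI route handler could call such a function with the currently loaded embeddings, so that a hot model reload only needs to replace the dictionaries it reads from.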
