Software Engineer Intern, AI/ML
Description
Web Augment — AI-Powered Tabular Data Enrichment: Built an end-to-end data enrichment pipeline that augments tabular datasets with web-sourced information. Designed Protobuf schemas and gRPC service endpoints, and implemented the core table function and worker execution logic as Temporal workflow activities with batched submission, async polling, and row-level retry (up to 6 attempts). Created LLM-assisted parameter suggestion using BAML prompts with multi-model fallback.
Decoupling Embedding & Graph Parameters with Dynamic Dimension Resolution: Redesigned the DataApp API to decouple callers from internal embedding model details. Defined 6 new proto messages and 2 new enums, built 8 bidirectional conversion functions, and replaced hardcoded dimension maps with runtime model introspection via HuggingFace AutoConfig. Added dynamic resolution for GCP, OpenAI, CLIP, and SigLIP2 embedders.
MCP Tool Integration for AI Agent Platform: Designed and implemented a full-stack Model Context Protocol (MCP) integration that lets AI agents discover, cache, and execute external tools on Snowflake MCP servers. Built an abstract MCP client architecture, a tool catalog domain model with change detection, a Temporal workflow for catalog refresh, and 5 new gRPC endpoints.
Stack: Python 3.12+, gRPC, Protobuf, Temporal, SQLAlchemy, PostgreSQL, OpenAI API, BAML, Polars, PyArrow, Parallel AI, MCP/fastmcp v3, HuggingFace AutoConfig, pytest
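Illustrative sketch: the batched submission and row-level retry mentioned under Web Augment could look roughly like the following. This is a simplified, standalone approximation, not the production Temporal activity code. The names `enrich_rows` and `submit_batch`, and the convention that a failed row comes back as `None`, are assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Optional, Tuple

MAX_ATTEMPTS = 6  # matches the "up to 6 attempts" per-row limit described above

# Hypothetical callback type: takes (row_id, payload) pairs, returns row_id -> result,
# with None marking a row that failed in this submission.
SubmitFn = Callable[[List[Tuple[str, str]]], Dict[str, Optional[str]]]


def enrich_rows(
    rows: Dict[str, str],
    submit_batch: SubmitFn,
    batch_size: int = 2,
    max_attempts: int = MAX_ATTEMPTS,
) -> Tuple[Dict[str, str], List[str]]:
    """Submit rows in batches and retry only the rows that failed."""
    results: Dict[str, str] = {}
    attempts = {row_id: 0 for row_id in rows}
    pending = list(rows)

    while pending:
        retry: List[str] = []
        for start in range(0, len(pending), batch_size):
            batch = pending[start:start + batch_size]
            for row_id in batch:
                attempts[row_id] += 1
            outcome = submit_batch([(row_id, rows[row_id]) for row_id in batch])
            for row_id in batch:
                value = outcome.get(row_id)
                if value is not None:
                    results[row_id] = value
                elif attempts[row_id] < max_attempts:
                    # Only the failed row is re-queued, not the whole batch.
                    retry.append(row_id)
        pending = retry

    failed = [row_id for row_id in rows if row_id not in results]
    return results, failed
```

Tracking attempts per row rather than per batch means one persistently failing row does not force successful rows to be re-enriched; rows that exhaust their attempts are returned separately so the caller can decide how to report them.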
