Engineering leader with 15+ years building large-scale distributed systems, now leading teams at the frontier of agentic AI infrastructure. Shipped foundational platforms at Meta, including Agent Memory, AgentBus, AgentTune, and RL-as-a-Service, and at Microsoft across OneDrive authorization, caching, DLP, and abuse throttling. Track record of taking 0-to-1 systems to production scale and driving measurable business impact across latency, capacity, reliability, and engagement.
Core Skills
- Domains: Agentic AI infrastructure, reinforcement learning systems, LLM fine-tuning and serving, distributed systems, control planes, P2P distribution, authorization, security
- Leadership: Multi-team management, cross-org strategy, XFN partnership, hiring, org design, technical vision
- Stack: C#, Python, C++, PyTorch, vLLM, MAST, KV stores, vector search, shared-log systems
Experience
Engineering Manager, Meta Superintelligence Labs - RL and Agentic Infra, Meta 2025 - Present
Lead roughly 14 engineers building production infrastructure for Meta's AI agents across memory, multi-agent execution, RL, and fine-tuning/serving workflows.
- Shipped Agent Memory, a persistent namespace-scoped memory platform backed by Meta's KV Store, Vector Search, and Blob Storage; onboarded 2 agent frameworks with millions of memory objects stored across shared agents.
- Shipped AgentBus, a shared-log execution substrate for multi-agent systems with pluggable safety voters, replay-based fault tolerance, and full audit trails; delivered the core security invariant and 99.99% reliability for onboarded frameworks.
- Delivered AgentTune and RL-as-a-Service, enabling trajectory-based agent learning, SFT/LoRA fine-tuning, checkpoint registration, batch inference sweeps, one-click vLLM deployments, tiered rate limiting, and observability.
- Built an operating cadence for a 0-to-1 org by clarifying roadmaps, staffing multiple workstreams, partnering with research/product teams, and converting frontier prototypes into production-grade infrastructure.
Engineering Manager, AI Infra - Training Control Plane, Meta 2024 - 2025
Led 23+ ICs and 2 managers delivering Meta's AI Training Control Plane across product groups and infrastructure teams, connecting training pipelines, model registration, serving readiness, and fleet capacity management.
- Drove a 99.3% reduction in Training-to-Serving latency, producing a 7-10% lift in cold-start engagement on downstream ranking surfaces.
- Reclaimed 1.6 MW of capacity and delivered a 20% resource reduction across training fleets.
- Unified Model Freshness strategy across Data, Inference, and Training organizations, aligning metrics, SLAs, and ownership boundaries for end-to-end model delivery.
- Led cross-org planning and execution for emerging business requirements, translating product urgency into infrastructure milestones, launch sequencing, and measurable operational outcomes.
Engineering Manager, Core Systems - Distribution Infrastructure, Meta 2021 - 2024
Owned Tier-0 distribution services powering mission-critical functionality across Meta's production fleet.
- Led Falcon, a globally distributed control-plane service for config and service discovery with massive fanout, low latency, and Tier-0 reliability.
- Led Owl, a P2P distribution system for TB-scale objects, including AI/ML models and Ads data, across Meta's private cloud.
- Drove reliability, capacity, and operational planning for systems that sit on the startup path of critical services.
- Partnered with cross-functional teams and senior leadership to evolve core infrastructure consumed by virtually every service at Meta.
Principal Engineering Manager, OneDrive and SharePoint, Microsoft 2015 - 2021
Led 14 engineers across abuse throttling, caching, authorization, purchase platform, and Photos experiences.
- Designed and shipped a unified Cache Framework spanning local cache, cluster cache, and dual-cache implementations, adopted across OneDrive.
- Delivered the initial Data Loss Prevention implementation for OneDrive business customers, enforcing document access through compliance policies.
- Rebuilt the OneDrive abuse throttling subsystem, adding filtering by application and usage along with configurable limits.
- Designed and shipped a unified Authorization Framework for granular runtime resource access checks across OneDrive components.
Software Development Engineer, Windows Services, Microsoft 2011 - 2015
Built the primary contact data store powering Skype, Hotmail/Outlook.com, and Windows clients, serving hundreds of thousands of requests per second over EAS, REST, and SOAP.
- Owned all REST APIs for reading and writing contact data.
- Shipped Contact Sync, EAS extensions, Skype Push APIs, and the back-compatibility model.
- Owned Sandbox, a library for standardized third-party contact and activity integrations consumed by multiple platforms.
Associate Software Engineer, Nokia India Pvt. Ltd. 2008 - 2009
- Improved media playback performance on high-resolution devices and contributed to Touch Keypad and OVI Media Player development.
- First-ever recipient of Nokia's Water Tight Quality Champion award; also received an Outstanding Contribution Award.
Software Engineering Intern, IBM Software Labs 2007
- Built internal PHP/MySQL and WebSphere tools for employee surveys and server IP allocation tracking.
Education
Georgia Institute of Technology, M.S. Computer Science 2009 - 2011
- GPA: 3.66
- Coursework: Advanced Operating Systems, Algorithms, Computer Networks, AI, HCI, Mobile Applications and Services
M. S. Ramaiah Institute of Technology, B.E. Computer Science and Engineering Bangalore, India
- Graduated First Class with Distinction, 73.0% aggregate, top 10% of class
Research and Publications
Georgia Tech CERCS Research Group, Faculty: Ada Gavrilovska
- Built a QoS-aware scheduler for the integrated cryptographic accelerator on Intel's EP80579 SoC.
- Co-author, "A Split-Driver Approach to SoC Virtualization - Challenges and Opportunities," 5th International Symposium on Embedded Multicore SoCs (MCSoC-10).