
Anvit
AI Systems Engineer for LLM, RAG, and Automation
Skills

Portfolio
Professional experience
ML & AI Engineer
Self-employed • Freelance
Sep 2024 - Present • 1 yr 10 mos
ML & AI Engineer specializing in GPU kernel engineering (Triton/CUDA), LLM inference optimization, and production ML systems at Plus91 Technology, Pune. Key work:
- Custom Triton decode-attention kernel: 2.5× end-to-end speedup on Phi-3 Mini, 28 ms P50 time to first token (TTFT), 39.4 tokens/s throughput
- KV cache compression system (TurboQuant): 4.5× smaller KV cache at 100k-token context via Lloyd-Max quantization
- Production LLM serving stack (FastAPI/Redis/PyTorch): 81% cache hit rate
- PPO-based RL decision systems: +58% performance vs. rule-based baselines
- Triton GPU kernels: up to 14.65× speedup over PyTorch baselines
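The profile names Lloyd-Max quantization as the technique behind the KV cache compression but does not describe TurboQuant itself. As an illustration of the method only, here is a minimal sketch of a generic scalar Lloyd-Max quantizer, not the TurboQuant implementation, assuming 1-D float samples; the function names `lloyd_max`, `quantize`, and `dequantize` are hypothetical. It alternates two steps until the levels stop moving: decision thresholds become midpoints between neighboring reconstruction levels, and each level moves to the mean of the samples that fall in its cell.

```python
from bisect import bisect_right
from statistics import fmean


def lloyd_max(samples, levels, iters=50):
    """Fit a scalar Lloyd-Max quantizer (hypothetical helper, illustration only).

    Returns (points, thresholds): sorted reconstruction levels and the
    decision thresholds between them.
    """
    data = sorted(samples)
    n = len(data)
    # Initialise reconstruction levels at evenly spaced quantiles of the data.
    points = [data[(2 * i + 1) * n // (2 * levels)] for i in range(levels)]
    for _ in range(iters):
        # Nearest-neighbor condition: thresholds are midpoints between levels.
        thresholds = [(a + b) / 2 for a, b in zip(points, points[1:])]
        # Centroid condition: move each level to the mean of its cell.
        cells = [[] for _ in range(levels)]
        for x in data:
            cells[bisect_right(thresholds, x)].append(x)
        new_points = [fmean(c) if c else p for c, p in zip(cells, points)]
        if new_points == points:
            break  # converged
        points = new_points
    thresholds = [(a + b) / 2 for a, b in zip(points, points[1:])]
    return points, thresholds


def quantize(x, thresholds):
    """Map a value to the integer index of its quantization cell."""
    return bisect_right(thresholds, x)


def dequantize(code, points):
    """Map a cell index back to its reconstruction level."""
    return points[code]
```

With `levels=4`, each value is stored as a 2-bit code plus a shared table of four levels. How TurboQuant groups, scales, or packs KV entries is not stated in the profile and is not modeled here.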