Conference Publications
- MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators. [pdf]
Beichen Huang*, Yueming Yuan*, Zelei Shao*, Minjia Zhang (*Equal contribution).
MLSys 2025.
- SPLAT: A framework for optimised GPU code-generation for SParse reguLar ATtention. [pdf]
Ahan Gupta, Yueming Yuan, Devansh Jain, Yuhao Ge, David Aponte, Yanqi Zhou, Charith Mendis.
OOPSLA 2025.