Conference Publications

  1. MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators. [pdf]
    Beichen Huang*, Yueming Yuan*, Zelei Shao*, Minjia Zhang (*Equal contribution).
    MLSys 2025.


  2. SPLAT: A framework for optimised GPU code-generation for SParse reguLar ATtention. [pdf]
    Ahan Gupta, Yueming Yuan, Devansh Jain, Yuhao Ge, David Aponte, Yanqi Zhou, Charith Mendis.
    OOPSLA 2025.