Publications

Efficient and Adaptive Generative Model Inference

PathFinder: Efficiently Supporting Conjunctions and Disjunctions for Filtered Approximate Nearest Neighbor Search
Tianming Wu, Dixin Tang
Preprint, November 2025

On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention for Long-Context LLM Serving
Yeonju Ro, Zhenyu Zhang, Souvik Kundu, Zhangyang Wang, and Aditya Akella
ICML 2025

StitchLLM: Serving LLMs, One Block at a Time
Bodun Hu, Shuozhe Li, Saurabh Agarwal, Myungjin Lee, Akshay Jajoo, Jiamin Li, Le Xu, Geon-Woo Kim, Donghyun Kim, Hong Xu, Amy Zhang, Aditya Akella
ACL 2025

Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design
Ruisi Cai, Yeonju Ro, Geon-Woo Kim, Peihao Wang, Babak Ehteshami Bejnordi, Aditya Akella, and Zhangyang Wang
NeurIPS 2024

Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav Gulavani, Alexey Tumanov, and Ramachandran Ramjee
OSDI 2024

Vidur: A Large-Scale Simulation Framework For LLM Inference
Amey Agrawal, Nitin Kedia, Jayashree Mohan, Ashish Panwar, Nipun Kwatra, Bhargav Gulavani, Ramachandran Ramjee, and Alexey Tumanov
MLSys 2024

MOSEL: Inference Serving Using Dynamic Modality Selection
Bodun Hu, Le Xu, Jeongyoon Moon, Neeraja Yadwadkar, and Aditya Akella
EMNLP 2024

FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping
Ajay Kumar Jaiswal, Bodun Hu, Lu Yin, Yeonju Ro, Tianlong Chen, Shiwei Liu, and Aditya Akella
EMNLP 2024

TrainMover: An Interruption-Resilient and Reliable ML Training Runtime
ChonLam Lao, Minlan Yu, Aditya Akella, Jiamin Cao, Yu Guan, Pengcheng Zhang, Zhilong Zheng, Yichi Xu, Ennan Zhai, Dennis Cai, and Jiaqi Gao
Preprint


Compound AI System Inference

Patchwork: A Unified Framework for RAG Serving
Bodun Hu, Luis Pabon, Saurabh Agarwal, and Aditya Akella
Preprint

SYMPHONY: Improving Memory Management for LLM Inference Workloads
Saurabh Agarwal, Anyong Mao, Aditya Akella, and Shivaram Venkataraman
Preprint


Large-Scale Foundation Model Training

HALoS: Hierarchical Asynchronous Local SGD over Slow Networks for Geo-Distributed Large Language Model Training
Geon-Woo Kim, Junbo Li, Shashidhar Gandham, Omar Baldonado, Adithya Gangidi, Pavan Balaji, Zhangyang Wang, and Aditya Akella
ICML 2025

Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
Tianjin Huang, Haotian Hu, Zhenyu Zhang, Gaojie Jin, Xiang Li, Li Shen, Tianlong Chen, Lu Liu, Qingsong Wen, Zhangyang Wang, and Shiwei Liu
Preprint

Enhance-A-Video: Better Generated Video for Free
Yang Luo, Xuanlei Zhao, Mengzhao Chen, Kaipeng Zhang, Wenqi Shao, Kai Wang, Zhangyang Wang, and Yang You
Preprint


Machine Learning Inference 

CuSfM: CUDA-Accelerated Structure-from-Motion
Jingrui Yu, Jun Liu, Kefei Ren, Joydeep Biswas, Rurui Ye, Keqiang Wu, Chirag Majithia, and Di Zeng
Preprint

cuVSLAM: CUDA accelerated visual odometry
Alexander Korovko, Dmitry Slepichev, Alexander Efitorov, Aigul Dzhumamuratova, Viktor Kuznetsov, Hesam Rabeti, Joydeep Biswas, and Soha Pouya
Preprint

ConfigBot: Adaptive Resource Allocation for Robot Applications in Dynamic Environments
Rohit Dwivedula, Sadanand Modak, Aditya Akella, Joydeep Biswas, Daehyeok Kim, and Christopher J. Rossbach

OMEGA: A Low-Latency GNN Serving System for Large Graphs
Geon-Woo Kim, Donghyun Kim, Jeongyoon Moon, Henry Liu, Tarannum Khan, Anand Iyer, Daehyeok Kim, and Aditya Akella
Preprint


Generative AI for Systems

Man-Made Heuristics Are Dead. Long Live Code Generators!
Rohit Dwivedula, Divyanshu Saxena, Aditya Akella, Swarat Chaudhuri, and Daehyeok Kim
HotNets 2025

How I learned to stop worrying and love learned OS policies
Divyanshu Saxena, Jiayi Chen, Sujay Yadalam, Yeonju Ro, Rohit Dwivedula, Eric Campbell, Aditya Akella, Christopher Rossbach, and Michael Swift
HotOS 2025

Large Language Models as Realistic Microservice Trace Generators
Donghyun Kim, Sriram Ravula, Taemin Ha, Alexandros G. Dimakis, Daehyeok Kim, and Aditya Akella
EMNLP 2025.

On a foundation model for operating systems
Divyanshu Saxena, Nihal Sharma, Donghyun Kim, Rohit Dwivedula, Jiayi Chen, Chenxi Yang, Sriram Ravula, Zichao Hu, Aditya Akella, Sebastian Angel, Joydeep Biswas, Swarat Chaudhuri, Isil Dillig, Alex Dimakis, P. Brighten Godfrey, Daehyeok Kim, Chris Rossbach, and Gang Wang
Machine Learning for Systems Workshop at 37th NeurIPS Conference, 2023

Speculative Ad-hoc Querying
Haoyu Li, Srikanth Kandula, Maria Angels de Luis Balaguer, Aditya Akella, and Venkat Arun
Preprint