Publications
Efficient and Adaptive Generative Model Inference
PathFinder: Efficiently Supporting Conjunctions and Disjunctions for Filtered Approximate Nearest Neighbor Search
Tianming Wu and Dixin Tang
Preprint, November 2025
On-the-Fly Adaptive Distillation of Transformer to Dual-State Linear Attention for Long-Context LLM Serving
Yeonju Ro, Zhenyu Zhang, Souvik Kundu, Zhangyang Wang, and Aditya Akella
ICML 2025
StitchLLM: Serving LLMs, One Block at a Time
Bodun Hu, Shuozhe Li, Saurabh Agarwal, Myungjin Lee, Akshay Jajoo, Jiamin Li, Le Xu, Geon-Woo Kim, Donghyun Kim, Hong Xu, Amy Zhang, and Aditya Akella
ACL 2025
Read-ME: Refactorizing LLMs as Router-Decoupled Mixture of Experts with System Co-Design
Ruisi Cai, Yeonju Ro, Geon-Woo Kim, Peihao Wang, Babak Ehteshami Bejnordi, Aditya Akella, and Zhangyang Wang
NeurIPS 2024
Taming Throughput-Latency Tradeoff in LLM Inference with Sarathi-Serve
Amey Agrawal, Nitin Kedia, Ashish Panwar, Jayashree Mohan, Nipun Kwatra, Bhargav Gulavani, Alexey Tumanov, and Ramachandran Ramjee
OSDI 2024
Vidur: A Large-Scale Simulation Framework For LLM Inference
Amey Agrawal, Nitin Kedia, Jayashree Mohan, Ashish Panwar, Nipun Kwatra, Bhargav Gulavani, Ramachandran Ramjee, and Alexey Tumanov
MLSys 2024
MOSEL: Inference Serving Using Dynamic Modality Selection
Bodun Hu, Le Xu, Jeongyoon Moon, Neeraja Yadwadkar, and Aditya Akella
EMNLP 2024
FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping
Ajay Kumar Jaiswal, Bodun Hu, Lu Yin, Yeonju Ro, Tianlong Chen, Shiwei Liu, and Aditya Akella
EMNLP 2024
TrainMover: An Interruption-Resilient and Reliable ML Training Runtime
ChonLam Lao, Minlan Yu, Aditya Akella, Jiamin Cao, Yu Guan, Pengcheng Zhang, Zhilong Zheng, Yichi Xu, Ennan Zhai, Dennis Cai, and Jiaqi Gao
Preprint
Compound AI System Inference
Patchwork: A Unified Framework for RAG Serving
Bodun Hu, Luis Pabon, Saurabh Agarwal, and Aditya Akella
Preprint
SYMPHONY: Improving Memory Management for LLM Inference Workloads
Saurabh Agarwal, Anyong Mao, Aditya Akella, and Shivaram Venkataraman
Preprint
Large-Scale Foundation Model Training
HALoS: Hierarchical Asynchronous Local SGD over Slow Networks for Geo-Distributed Large Language Model Training
Geon-Woo Kim, Junbo Li, Shashidhar Gandham, Omar Baldonado, Adithya Gangidi, Pavan Balaji, Zhangyang Wang, and Aditya Akella
ICML 2025
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
Tianjin Huang, Haotian Hu, Zhenyu Zhang, Gaojie Jin, Xiang Li, Li Shen, Tianlong Chen, Lu Liu, Qingsong Wen, Zhangyang Wang, and Shiwei Liu
Preprint
Enhance-A-Video: Better Generated Video for Free
Yang Luo, Xuanlei Zhao, Mengzhao Chen, Kaipeng Zhang, Wenqi Shao, Kai Wang, Zhangyang Wang, and Yang You
Preprint
Machine Learning Inference
CuSfM: CUDA-Accelerated Structure-from-Motion
Jingrui Yu, Jun Liu, Kefei Ren, Joydeep Biswas, Rurui Ye, Keqiang Wu, Chirag Majithia, and Di Zeng
Preprint
cuVSLAM: CUDA accelerated visual odometry
Alexander Korovko, Dmitry Slepichev, Alexander Efitorov, Aigul Dzhumamuratova, Viktor Kuznetsov, Hesam Rabeti, Joydeep Biswas, and Soha Pouya
Preprint
ConfigBot: Adaptive Resource Allocation for Robot Applications in Dynamic Environments
Rohit Dwivedula, Sadanand Modak, Aditya Akella, Joydeep Biswas, Daehyeok Kim, and Christopher J. Rossbach
OMEGA: A Low-Latency GNN Serving System for Large Graphs
Geon-Woo Kim, Donghyun Kim, Jeongyoon Moon, Henry Liu, Tarannum Khan, Anand Iyer, Daehyeok Kim, and Aditya Akella
Preprint
Generative AI for Systems
Man-Made Heuristics Are Dead. Long Live Code Generators!
Rohit Dwivedula, Divyanshu Saxena, Aditya Akella, Swarat Chaudhuri, and Daehyeok Kim
HotNets 2025
How I learned to stop worrying and love learned OS policies
Divyanshu Saxena, Jiayi Chen, Sujay Yadalam, Yeonju Ro, Rohit Dwivedula, Eric Campbell, Aditya Akella, Christopher Rossbach, and Michael Swift
HotOS 2025
Large Language Models as Realistic Microservice Trace Generators
Donghyun Kim, Sriram Ravula, Taemin Ha, Alexandros G. Dimakis, Daehyeok Kim, and Aditya Akella
EMNLP 2025
On a foundation model for operating systems
Divyanshu Saxena, Nihal Sharma, Donghyun Kim, Rohit Dwivedula, Jiayi Chen, Chenxi Yang, Sriram Ravula, Zichao Hu, Aditya Akella, Sebastian Angel, Joydeep Biswas, Swarat Chaudhuri, Isil Dillig, Alex Dimakis, P. Brighten Godfrey, Daehyeok Kim, Chris Rossbach, and Gang Wang
Machine Learning for Systems Workshop at the 37th NeurIPS Conference, 2023
Speculative Ad-hoc Querying
Haoyu Li, Srikanth Kandula, Maria Angels de Luis Balaguer, Aditya Akella, and Venkat Arun
Preprint