POSTER - FineCo: Fine-grained Heterogeneous Resource Management for Concurrent DNN Inferences (PPoPP 2024 - Main Conference)

Who

Lixian Ma, Haoruo Chen, En Shao, Leping Wang, Quan Chen, Guangming Tan

Track

PPoPP 2024 Main Conference

When

Sun 3 Mar 2024 18:00 - 20:00 at Strathblane Hall - Reception and Poster Session

Abstract

GPUs have become an indispensable computing platform for deep learning. Co-locating multiple DNN serving workloads on a shared GPU is widely used to improve resource utilization while guaranteeing user QoS. However, despite recent efforts toward multi-DNN co-running, existing solutions still struggle to handle fluctuating and heterogeneous requests effectively. This is because the resource management mechanisms they employ to co-run multiple DNN requests lack reconfigurability, performance isolation under co-location, and QoS-awareness.

In this paper, we propose a novel resource management paradigm called dynamic spatial multikernel that enables fine-grained, kernel-wise management of GPU resources, specifically for concurrent DNN inferences. We design a multi-DNN serving system, FineCo, to support this resource management without relying on hardware or operating system assistance. FineCo enables kernel-wise resource allocation by giving the DNN compiler the ability to generate multiple candidate implementations of each DNN kernel, each with a different sensitivity to resource demand. Based on these multi-candidate kernel implementations, FineCo uses a two-level scheduler to dynamically coordinate the scheduling of concurrent requests and the candidate scheduling among co-located kernels, so as to fully tap the available GPU resources and avoid resource conflicts. Our prototype implementation demonstrates that FineCo achieves up to 40% throughput improvement and reduces real-time latency by 19% over the state-of-the-art work.
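
The idea of choosing among candidate kernel implementations can be illustrated with a small sketch. This is not FineCo's scheduler, which the abstract does not specify; it is a brute-force toy under stated assumptions. The `Candidate` type, the `pick_candidates` function, the use of streaming-multiprocessor (SM) counts as the resource, the latency figures, and the objective (minimize the slowest co-located kernel) are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Candidate:
    """One compiled variant of a kernel (hypothetical model, not FineCo's)."""
    name: str
    sms: int            # assumed resource demand: SMs this variant occupies
    latency_ms: float   # assumed profiled latency at that occupancy


def pick_candidates(kernels, total_sms):
    """Pick one candidate per co-located kernel.

    The combined SM demand must fit within total_sms. Among the feasible
    combinations, the one with the smallest worst-case latency wins.
    Returns None when no combination fits.
    """
    best = None
    for combo in product(*kernels):
        if sum(c.sms for c in combo) > total_sms:
            continue  # would oversubscribe the GPU, i.e. a resource conflict
        cost = max(c.latency_ms for c in combo)
        if best is None or cost < best[0]:
            best = (cost, combo)
    return None if best is None else list(best[1])


# Two co-located kernels, each with a wide and a narrow variant (made-up numbers).
conv = [Candidate("conv_wide", 60, 1.0), Candidate("conv_narrow", 30, 1.8)]
gemm = [Candidate("gemm_wide", 60, 2.0), Candidate("gemm_narrow", 40, 2.5)]

chosen = pick_candidates([conv, gemm], total_sms=100)
print([c.name for c in chosen])
```

With these made-up numbers, both wide variants together would need 120 SMs and do not fit in 100, so the sketch gives the narrow variant to the convolution and keeps the wide variant for the slower GEMM. A real system would replace the exhaustive search with something much cheaper and would also need to account for request-level scheduling and QoS targets.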

Lixian Ma

State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing

Haoruo Chen

State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing

China

En Shao

State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing

Leping Wang

State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing

Quan Chen

Shanghai Jiao Tong University

Guangming Tan

Chinese Academy of Sciences(CAS)

Session Program

Sun 3 Mar
Displayed time zone: London

18:00 - 20:00	Reception and Poster Session (Main Conference) at Strathblane Hall

18:00 2h Poster		POSTER - H3: A Hash-table Based and Holistically Optimized High-Performance Sparse Tensor Contraction Main Conference Guofeng Feng Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Weile Jia Institute of Computing Technology, Chinese Academy of Sciences, Ninghui Sun State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Guangming Tan Chinese Academy of Sciences(CAS), Jiajia Li North Carolina State University
18:00 2h Poster		POSTER - P2Res: Pattern-Aware Sparse Communication for Scalable Recommendation Model Training Main Conference Jiaao He Tsinghua University, China, Shengqi Chen Tsinghua University, Jidong Zhai Tsinghua University
18:00 2h Poster		POSTER - gZCCL: Compression-Accelerated Collective Communication Framework for GPU Clusters Main Conference Jiajun Huang University of California, Riverside, Sheng Di Argonne National Laboratory, Xiaodong Yu Stevens Institute of Technology, Yujia Zhai University of California Riverside, Jinyang Liu University of California, Riverside, Yafan Huang The University of Iowa, Ken Raffenetti Argonne National Laboratory, Hui Zhou Argonne National Laboratory, Kai Zhao Florida State University, Zizhong Chen University of California, Riverside, Franck Cappello Argonne National Laboratory, Yanfei Guo Argonne National Laboratory, Rajeev Thakur Argonne National Laboratory
18:00 2h Poster		POSTER - RadiK: Scalable Radix Top-K Selection on GPUs Main Conference Yifei Li Alibaba Group, Bole Zhou Independent, Jiejing Zhang Alibaba Group, Xuechao Wei Alibaba Group, Yinghan Li Alibaba Group, Yingda Chen Alibaba Group
18:00 2h Poster		POSTER - Accelerating High-Precision Integer Multiplication used in Cryptosystems with GPUs Main Conference Zhuoran Ji Shandong University, Zhaorui Zhang The Hong Kong Polytechnic University, Jiming Xu Ant Group, Lei Ju Shandong University
18:00 2h Poster		POSTER - Enabling Extreme-Scale Phase Field Simulation with In-situ Feature Extraction Main Conference Zhichen Feng Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Jialin Li Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Yaqian Gao Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, TianShaobo Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Huang Ye Computer Network Information Center, Chinese Academy of Sciences, Jian Zhang Computer Network Information Center, Chinese Academy of Sciences
18:00 2h Poster		POSTER - ParGNN: Efficient Training for Large-Scale Graph Neural Network on GPU Clusters Main Conference lishunde Computer Network Information Center, Chinese Academy of Sciences;University of Chinese Academy of Sciences, Junyu Gu Computer Network Information Center, Chinese Academy of Sciences;University of Chinese Academy of Sciences, Jue Wang Computer Network Information Center, Chinese Academy of Sciences, Tiechui Yao Computer Network Information Center, Chinese Academy of Sciences, ZhiQiang Liang Computer Network Information Center, Chinese Academy of Sciences, Yumeng Shi Computer Network Information Center, Chinese Academy of Sciences, Shigang Li Beijing University of Posts and Telecommunications, Weiting Xi North China Electric Power University, Shushen Li North China Electric Power University, Chunbao Zhou Computer Network Information Center, Chinese Academy of Sciences, Yangang Wang Computer Network Information Center, Chinese Academy of Sciences, Xuebin Chi Computer Network Information Center, Chinese Academy of Sciences;University of Chinese Academy of Sciences
18:00 2h Poster		POSTER - RELAX: Durable Data Structures with Swift Recovery Main Conference Almog Zur Technion, Nachshon Cohen Amazon, Michal Friedman ETH Zurich, Switzerland, Erez Petrank Technion
18:00 2h Poster		POSTER - FineCo: Fine-grained Heterogeneous Resource Management for Concurrent DNN Inferences Main Conference Lixian Ma State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, Haoruo Chen State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, En Shao State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, Leping Wang State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, Quan Chen Shanghai Jiao Tong University, Guangming Tan Chinese Academy of Sciences(CAS)
18:00 2h Poster		POSTER - LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization Main Conference Juntao Zhao The University of Hong Kong, Borui Wan The University of Hong Kong, Chuan Wu The University of Hong Kong, Yanghua Peng ByteDance Inc., Haibin Lin ByteDance Inc.
18:00 2h Poster		POSTER - StructMG: A Fast and Scalable Structured Multigrid Main Conference Yi Zong Tsinghua University, Xinliang Wang Huawei Technologies Co., Ltd, Haopeng Huang Tsinghua University, Chensong Zhang Academy of Mathematics and Systems Science, Xiaowen Xu Institute of Applied Physics and Computational Mathematics, Jian Sun CMA Earth System Modeling and Prediction Center, Bowen Yan Tsinghua University, Qin Wang Huawei Technologies Co., Ltd, Sicong Li Huawei Technologies Co., Ltd, Zhaohui Ding Huawei Technologies Co., Ltd, Wei Xue Tsinghua University
18:00 2h Poster		POSTER - OCToPus: Semantic-aware Concurrency Control for Blockchain Transactions Main Conference dePaul Miller Lehigh University, Henry F. Korth Lehigh University, Roberto Palmieri Lehigh University