Tetris: Accelerating Sparse Convolution by Exploiting Memory Reuse on GPU (PPoPP 2024 - Main Conference)

Who

xiaoyanliu, Xuegui Zheng, Hailong Yang, Zhongzhi Luan, Depei Qian

Track

PPoPP 2024 Main Conference

When

Tue 5 Mar 2024 11:30 - 11:50 at Moorfoot - ML Workloads Chair(s): Xipeng Shen

Abstract

Convolutional neural networks (CNNs) have achieved remarkable success in various application fields. Although model compression techniques mitigate the ever-increasing resource demands of large CNN models, the compressed models usually exhibit irregular memory access and unstructured sparsity, which make it difficult for dominant operators such as sparse convolution to achieve the expected speedup on popular inference platforms such as GPUs. In this paper, we propose Tetris, an efficient sparse convolution approach optimized for GPUs. Tetris first fully exploits the input reuse opportunities of sparse convolution to reduce accesses to global memory. It then adopts a stride packed filter (SPF) format and a bank-sensing reorganization scheme to eliminate the irregular memory accesses caused by unstructured sparsity. It also leverages a filter group reorder technique to address load imbalance among threads, and a parameter tuning method to determine the optimal parameters of the sparse convolution implementation. Experimental results show that Tetris outperforms both dense and sparse convolution libraries, as well as cutting-edge implementations.
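To make two of the ideas in the abstract concrete, here is a minimal CPU-side sketch in plain Python. It is not the Tetris implementation and does not reproduce the SPF format, the bank-sensing reorganization, or any GPU kernel. It only illustrates (a) a convolution that visits non-zero filter weights only, so each weight is applied across the whole input once, and (b) one plausible way to group filters with similar non-zero counts so that workers assigned to the same group do similar amounts of work. All function names (compress_filter, sparse_conv2d, group_filters_by_work) are hypothetical and chosen for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]
NonZero = Tuple[int, int, float]  # (row offset, column offset, weight)


def compress_filter(dense: Matrix) -> List[NonZero]:
    """Keep only the non-zero weights of a pruned filter as (row, col, value)."""
    return [(r, c, w)
            for r, row in enumerate(dense)
            for c, w in enumerate(row)
            if w != 0]


def sparse_conv2d(image: Matrix, nonzeros: List[NonZero],
                  kh: int, kw: int) -> Matrix:
    """Stride-1, no-padding 2D cross-correlation that skips zero weights.

    The outer loop runs over non-zero weights, so each weight is read once
    and applied to every output position it contributes to.
    """
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for r, c, w in nonzeros:
        for y in range(out_h):
            for x in range(out_w):
                out[y][x] += w * image[y + r][x + c]
    return out


def group_filters_by_work(nnz_per_filter: List[int],
                          group_size: int) -> List[List[int]]:
    """Assumed load-balancing heuristic: sort filter indices by non-zero
    count (descending) and cut the sorted order into fixed-size groups,
    so filters in one group have similar amounts of work."""
    order = sorted(range(len(nnz_per_filter)),
                   key=lambda i: nnz_per_filter[i], reverse=True)
    return [order[i:i + group_size] for i in range(0, len(order), group_size)]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
pruned = [[1, 0],
          [0, -1]]  # two of the four weights were pruned to zero

result = sparse_conv2d(image, compress_filter(pruned), kh=2, kw=2)
groups = group_filters_by_work([1, 4, 2, 4], group_size=2)
```

In this toy case only two multiply-adds per output element are performed instead of four. On a GPU the same loop structure would additionally have to deal with memory coalescing and shared-memory bank conflicts, which are the problems the abstract says the SPF format and bank-sensing reorganization address.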

Link to Publication

https://dl.acm.org/doi/pdf/10.1145/3627535.3638471

DOI

https://doi.org/10.1145/3627535.3638471

xiaoyanliu

Beihang University

Xuegui Zheng

Beihang University

Hailong Yang

Beihang University, China

Zhongzhi Luan

Beihang University

Depei Qian

Beihang University, China

Session Program

Tue 5 Mar
Displayed time zone: (GMT) London

11:30 - 12:30	ML Workloads (Main Conference) at Moorfoot
	Chair(s): Xipeng Shen (North Carolina State University)

11:30 20m Talk	Tetris: Accelerating Sparse Convolution by Exploiting Memory Reuse on GPU
	xiaoyanliu (Beihang University); Xuegui Zheng (Beihang University); Hailong Yang (Beihang University, China); Zhongzhi Luan (Beihang University); Depei Qian (Beihang University, China)

11:50 20m Talk	Shared Memory-contention-aware Concurrent DNN Execution for Diversely Heterogeneous System-on-Chips
	Ismet Dagli (Colorado School of Mines); Mehmet Belviranli (Colorado School of Mines)

12:10 20m Talk	Training one DeePMD Model in Minutes: a Step Towards Online Learning
	Siyu Hu (Institute of Computing Technology, Chinese Academy of Sciences); Tong Zhao (Institute of Computing Technology, Chinese Academy of Sciences); Qiuchen Sha (Institute of Computing Technology, Chinese Academy of Sciences); Enji Li (Institute of Computing Technology, Chinese Academy of Sciences); Xiangyu Meng (College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum); Liping Liu (Institute of Semiconductors, Chinese Academy of Sciences); Lin-Wang Wang (Institute of Semiconductors, Chinese Academy of Sciences); Guangming Tan (Chinese Academy of Sciences (CAS)); Weile Jia (Institute of Computing Technology, Chinese Academy of Sciences)