Tetris: Accelerating Sparse Convolution by Exploiting Memory Reuse on GPU
Convolutional neural networks (CNNs) have achieved remarkable success in various application fields. Although model compression techniques mitigate the ever-increasing resource demands of large CNN models, the compressed models usually exhibit irregular memory access and unstructured sparsity, which are difficult for dominant operators such as sparse convolution to achieve expected performance speedup on popular inference platforms such as GPU. In this paper, we propose Tetris, an efficient sparse convolution approach optimized for GPU. Tetris first fully exploits the input reuse opportunity of sparse convolution to reduce the memory accesses to global memory. It then adopts a stride packed filter (SPF) format and a bank-sensing reorganization scheme to eliminate the irregular memory accesses caused by unstructured sparsity. It also leverages a filter group reorder technique to address load imbalance among threads, and a parameter tuning method to determine the optimal parameters of the sparse convolution implementation. The experiment results show that Tetris outperforms dense/sparse convolution libraries and cutting-edge implementations with promising performance speedup.
Tue 5 MarDisplayed time zone: London change
| 11:30 - 12:30 | |||
| 11:3020m Talk | Tetris: Accelerating Sparse Convolution by Exploiting Memory Reuse on GPU Main Conference xiaoyanliu  Beihang University, Xuegui Zheng Beihang University, Hailong Yang Beihang University, China, Zhongzhi Luan Beihang University, Depei Qian Beihang University, ChinaLink to publication DOI | ||
| 11:5020m Talk | Shared Memory-contention-aware Concurrent DNN Execution for Diversely Heterogeneous System-on-Chips Main ConferenceLink to publication DOI | ||
| 12:1020m Talk | Training one DeePMD Model in Minutes: a Step Towards Online Learning Main Conference Siyu Hu Institute of Computing Technology, Chinese Academy of Sciences, Tong Zhao Institute of Computing Technology, Chinese Academy of Sciences, Qiuchen Sha Institute of Computing Technology, Chinese Academy of Sciences, Enji Li Institute of Computing Technology, Chinese Academy of Sciences, Xiangyu Meng College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum, Liping Liu Institute of Semiconductors, Chinese Academy of Sciences, Lin-Wang Wang Institute of Semiconductors, Chinese Academy of Sciences, Guangming Tan Chinese Academy of Sciences(CAS), Weile Jia Institute of Computing Technology, Chinese Academy of SciencesLink to publication DOI | ||