A Row Decomposition-based Approach for Sparse Matrix Multiplication on GPUs (PPoPP 2024 - Main Conference)

Who

Pang Meng, Xiang Fei, Peng Qu, Youhui Zhang, Zhaolin Li

Track

PPoPP 2024 Main Conference

Time Zone

The program is currently displayed in (GMT) London.

When

Wed 6 Mar 2024 10:00 - 10:20 at Moorfoot - Linear Algebra Chair(s): I-Ting Angelina Lee

Abstract

Sparse-Matrix Dense-Matrix Multiplication (SpMM) and Sampled Dense-Dense Matrix Multiplication (SDDMM) are important sparse kernels in various computation domains. The uneven distribution of non-zeros in the sparse matrix and the tight data dependence between sparse and dense matrices make it challenging to run sparse matrix multiplication efficiently on GPUs. To address these problems, we propose a row decomposition (RoDe)-based approach to optimize the two kernels on GPUs, using the standard Compressed Sparse Row (CSR) format. Specifically, RoDe divides the sparse matrix rows into regular parts and residual parts so that each can be optimized separately. We also devise corresponding load-balancing and fine-grained pipelining techniques. Profiling results show that RoDe achieves more efficient memory access and significantly reduces warp stall cycles. Compared to the state-of-the-art (SOTA) alternatives on the SuiteSparse dataset, RoDe achieves a speedup of up to 7.86x with a geometric mean of 1.45x for SpMM, and a speedup of up to 8.99x with a geometric mean of 1.49x for SDDMM. RoDe also outperforms its counterpart on the deep learning dataset. Furthermore, its preprocessing overhead is significantly smaller, averaging only 16% of that of the SOTA.
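
To make the two kernels and the row split concrete, the following is a minimal CPU-side sketch in plain Python. It is not the authors' GPU implementation. The function names, the segment length `seg`, and the rule "regular part = largest prefix of a row whose length is a multiple of `seg`" are assumptions made for illustration only, and the SDDMM here assumes the second dense operand is stored row-wise (so each output is a dot product of two rows).

```python
from typing import List, Tuple

Matrix = List[List[float]]
Part = Tuple[int, int, int]  # (row index, start offset, end offset) into col_idx/vals


def decompose_rows(row_ptr: List[int], seg: int = 2) -> Tuple[List[Part], List[Part]]:
    """Split every CSR row into a regular part and a residual part.

    Assumption for this sketch: the regular part is the longest prefix of
    the row whose length is a multiple of `seg`; the rest is the residual.
    """
    regular, residual = [], []
    for r in range(len(row_ptr) - 1):
        start, end = row_ptr[r], row_ptr[r + 1]
        split = start + ((end - start) // seg) * seg
        if split > start:
            regular.append((r, start, split))
        if end > split:
            residual.append((r, split, end))
    return regular, residual


def spmm_csr(row_ptr, col_idx, vals, B: Matrix, seg: int = 2) -> Matrix:
    """C = S @ B, with S in CSR form, processed part by part."""
    n_cols = len(B[0])
    C = [[0.0] * n_cols for _ in range(len(row_ptr) - 1)]
    regular, residual = decompose_rows(row_ptr, seg)
    # On a GPU the two groups could be handled by differently tuned kernels;
    # here both simply accumulate into the same output row.
    for r, start, end in regular + residual:
        for k in range(start, end):
            a, b_row = vals[k], B[col_idx[k]]
            for j in range(n_cols):
                C[r][j] += a * b_row[j]
    return C


def sddmm_csr(row_ptr, col_idx, vals, A: Matrix, Bt: Matrix) -> List[float]:
    """out[k] = vals[k] * dot(A[i], Bt[j]) for every stored non-zero (i, j)."""
    out = [0.0] * len(vals)
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            out[k] = vals[k] * sum(x * y for x, y in zip(A[i], Bt[col_idx[k]]))
    return out


# Sparse matrix [[1, 0, 2], [0, 0, 0], [3, 4, 5]] in CSR form.
row_ptr = [0, 2, 2, 5]
col_idx = [0, 2, 0, 1, 2]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]
D = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

parts = decompose_rows(row_ptr, seg=2)
C = spmm_csr(row_ptr, col_idx, vals, D, seg=2)
O = sddmm_csr(row_ptr, col_idx, vals, D, D)
```

With `seg=2`, the three non-zeros of the last row are split into a regular part of two entries and a residual part of one, while the empty middle row contributes nothing to either group. This illustrates how uneven row lengths can be turned into uniform chunks plus short leftovers.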

Link to Publication

https://dl.acm.org/doi/pdf/10.1145/3627535.3638470

DOI

https://doi.org/10.1145/3627535.3638470

Pang Meng

Department of Computer Science and Technology, Tsinghua University

Xiang Fei

Department of Computer Science and Technology, Tsinghua University

Peng Qu

Department of Computer Science and Technology, Tsinghua University

Youhui Zhang

Department of Computer Science and Technology, Tsinghua University

Zhaolin Li

Department of Computer Science and Technology, Tsinghua University

Session Program

Wed 6 Mar
Displayed time zone: London

10:00 - 11:00	Linear Algebra (Main Conference) at Moorfoot. Chair(s): I-Ting Angelina Lee, Washington University in St. Louis, USA

10:00 20m Talk	A Row Decomposition-based Approach for Sparse Matrix Multiplication on GPUs (Main Conference)
	Pang Meng, Xiang Fei, Peng Qu, Youhui Zhang, Zhaolin Li (all Department of Computer Science and Technology, Tsinghua University)
10:20 20m Talk	Fast Kronecker Matrix-Matrix Multiplications on GPUs (Main Conference)
	Abhinav Jangda (Microsoft Research), Mohit Yadav (University of Massachusetts Amherst)
10:40 20m Talk	Arrow Matrix Decomposition: A Novel Approach for Communication-Efficient Sparse Matrix Multiplication (Main Conference)
	Lukas Gianinazzi, Alexandros Nikolaos Ziogas, Piotr Luczynski, Langwen Huang, Saleh Ashkboosh, Florian Scheidl, Armon Carigiet, Chio Ge, Nabil Abubaker, Maciej Besta (all ETH Zurich), Tal Ben-Nun (Lawrence Livermore National Laboratory), Torsten Hoefler (ETH Zurich)