Training one DeePMD Model in Minutes: a Step Towards Online Learning
Neural Network Molecular Dynamics (NNMD) has become a major approach in material simulations: it can speed up molecular dynamics (MD) simulation by thousands of times while maintaining \textit{ab initio} accuracy, and thus has the potential to fundamentally change the paradigm of material simulation. However, NNMD development faces two time-consuming bottlenecks. One is obtaining the \textit{ab initio} calculation results used as training data. The other, which is the focus of the current work, is reducing the training time of the NNMD model. Training an NNMD model differs from most other neural network training because the atomic force (which is related to the gradient of the network output) is an important physical property to be fitted. Tests show that traditional stochastic gradient methods, such as the Adam algorithm, cannot efficiently exploit multi-sample minibatches. As a result, a typical training run (taking Deep Potential Molecular Dynamics (DeePMD) as an example) can take many hours. In this work, we design a heuristic minibatch quasi-Newton optimizer based on the Extended Kalman Filter (EKF) method. An early reduction of gradients and errors is adopted to reduce the memory footprint and communication. The memory footprint, communication, and hyper-parameter settings of this new method are analyzed in detail. Computational innovations, such as customized kernels for the symmetry-preserving descriptor, are applied to exploit the computing power of the heterogeneous architecture. Experiments are performed on 8 datasets representing different real-world situations, and the numerical results show that our new method achieves an average speedup of 32.2 over the Reorganized Layer-wised Extended Kalman Filter on 1 GPU, reducing the absolute training time of one DeePMD model from hours to several minutes, one step toward online training.
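To make the abstract's optimizer idea concrete, the following is a minimal, illustrative sketch of an EKF-style minibatch update with early reduction of the per-sample gradients and errors, written in plain NumPy. It is not the authors' implementation: the function name, the forgetting-factor schedule, and the scalar-residual simplification are assumptions for illustration; the actual DeePMD optimizer works on per-layer parameter blocks and fits energies and forces jointly.

```python
# Hypothetical sketch of an EKF-based minibatch update (not the paper's code).
import numpy as np

def ekf_minibatch_step(w, P, jacobians, errors, lam=0.98, nu=0.99):
    """One EKF update on flattened parameters w with covariance P.

    w:         (n,)   flattened network parameters
    P:         (n, n) parameter covariance estimate
    jacobians: (m, n) per-sample gradients d(prediction)/d(w)
    errors:    (m,)   per-sample residuals (target - prediction)
    lam, nu:   forgetting-factor schedule (assumed values)
    """
    # Early reduction: collapse the minibatch to one averaged gradient and
    # one averaged error before the covariance update, so only O(n) data
    # (not O(m*n)) needs to be stored or communicated across workers.
    H = jacobians.mean(axis=0, keepdims=True)   # (1, n)
    e = errors.mean()                           # scalar

    # Kalman gain for the scalar-output case.
    S = lam + H @ P @ H.T                       # (1, 1) innovation variance
    K = (P @ H.T) / S                           # (n, 1)

    # Quasi-Newton-like parameter update and covariance update.
    w = w + (K * e).ravel()
    P = (P - K @ (H @ P)) / lam

    # Drive the forgetting factor toward 1, a common EKF training heuristic.
    lam = nu * lam + (1.0 - nu)
    return w, P, lam
```

Because the reduction happens before the gain computation, the covariance is updated once per minibatch rather than once per sample, which is one way such a method can use multi-sample minibatches more efficiently than per-sample EKF training.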
Tue 5 Mar (displayed time zone: London)
11:30 - 12:30
11:30 (20m) Talk | Tetris: Accelerating Sparse Convolution by Exploiting Memory Reuse on GPU (Main Conference) | Xiaoyan Liu (Beihang University), Xuegui Zheng (Beihang University), Hailong Yang (Beihang University, China), Zhongzhi Luan (Beihang University), Depei Qian (Beihang University, China) | Link to publication | DOI
11:50 (20m) Talk | Shared Memory-contention-aware Concurrent DNN Execution for Diversely Heterogeneous System-on-Chips (Main Conference) | Link to publication | DOI
12:10 (20m) Talk | Training one DeePMD Model in Minutes: a Step Towards Online Learning (Main Conference) | Siyu Hu (Institute of Computing Technology, Chinese Academy of Sciences), Tong Zhao (Institute of Computing Technology, Chinese Academy of Sciences), Qiuchen Sha (Institute of Computing Technology, Chinese Academy of Sciences), Enji Li (Institute of Computing Technology, Chinese Academy of Sciences), Xiangyu Meng (College of Computer Science and Technology, Qingdao Institute of Software, China University of Petroleum), Liping Liu (Institute of Semiconductors, Chinese Academy of Sciences), Lin-Wang Wang (Institute of Semiconductors, Chinese Academy of Sciences), Guangming Tan (Chinese Academy of Sciences (CAS)), Weile Jia (Institute of Computing Technology, Chinese Academy of Sciences) | Link to publication | DOI