TSPipe overview design.

Summary

The teacher-student (TS) framework, which trains a (student) network with the help of an auxiliary superior (teacher) network, has become a popular training paradigm in many machine learning schemes since the seminal work on knowledge distillation (KD) for model compression and transfer learning. Many recent self-supervised learning (SSL) schemes also adopt the TS framework; there, the teacher networks, called momentum networks, are maintained as the moving average of the student networks. This paper presents TSPipe, a pipelined approach to accelerate the training of any TS framework, including KD and SSL. Based on the observation that the teacher network does not need a backward pass, our main idea is to schedule the computation of the teacher and student networks separately, and to fully utilize the GPU during training by interleaving the computations of the two networks and relaxing their dependencies. When the teacher network requires a momentum update, we apply delayed parameter updates only to the teacher network to attain high model accuracy. Compared to existing pipeline parallelism schemes, which sacrifice either training throughput or model accuracy, TSPipe provides better performance trade-offs, achieving up to 12.15x higher throughput.
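As a rough illustration of the momentum-network idea mentioned in the summary, the sketch below updates teacher parameters as an exponential moving average of student parameters. It is not taken from the TSPipe code base: the function name `momentum_update`, the coefficient `m`, and the representation of parameters as flat lists of floats are all assumptions made for this example, and it does not model TSPipe's pipelining or delayed updates.

```python
def momentum_update(teacher, student, m=0.99):
    """Return new teacher parameters as a moving average of the student.

    Hypothetical helper for illustration only. Each teacher parameter
    moves a fraction (1 - m) of the way toward the matching student
    parameter; no gradient (backward pass) is needed for the teacher.
    """
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of parameters")
    return [m * t + (1.0 - m) * s for t, s in zip(teacher, student)]


# Toy parameters: in practice these would be the weights of two networks.
teacher = [0.0, 1.0]
student = [1.0, 1.0]

# One momentum step with m = 0.9 (an arbitrary value chosen for the example).
teacher = momentum_update(teacher, student, m=0.9)
```

Because the teacher is only ever updated from the student's parameters and never from its own gradients, its forward computation can be scheduled independently of the student's backward pass, which is the property the summary builds on.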

Poster

Publications

ICML

TSPipe: Learn from Teacher Faster with Pipelines

Hwijoon Lim, Yechan Kim, Sukmin Yun, Jinwoo Shin, and Dongsu Han

In Proceedings of the 39th International Conference on Machine Learning, Jul 2022

Paper PDF Project Code

Members

Hwijoon Lim

Alumni

Yechan Kim

Alumni

Dongsu Han

Principal Investigator