Our paper, “Scaling Beyond the GPU Memory Limit for Large Mixture-of-Experts Model Training,” was accepted to ICML 2024. Congratulations, Yechan and Hwijoon!