Muon Is Scalable for LLM Training

by renonce on 2/25/2025, 4:50 AM with 1 comment

by yorwba on 2/25/2025, 5:40 AM

For people who want to know more about the Muon optimizer: https://kellerjordan.github.io/posts/muon/