We provide an analysis of the inexact orthogonalized update at Muon's core, revealing a fundamental coupling between LMO inexactness and the optimal step size and momentum.
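For context, the exact orthogonalized update can be sketched as the SVD-based projection onto the nearest semi-orthogonal matrix; Muon approximates this projection with Newton-Schulz iterations rather than computing the SVD, so the sketch below is the idealized (exact) version, not the implementation:

```python
import numpy as np

def orthogonalize(M):
    """Map a matrix update M to U V^T, its nearest semi-orthogonal
    matrix (polar factor). Muon applies a Newton-Schulz approximation
    of this map to the momentum buffer; here we use an exact SVD
    purely for illustration."""
    U, _, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ Vt
```

The gap between this exact map and its iterative approximation is precisely the "LMO inexactness" the analysis refers to.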
We propose Gluon, a new LMO-based optimizer with a refined generalized smoothness model that captures layer-wise geometry and closes the gap between theory and practice.
We design α-NormEC, the first differentially private distributed optimization algorithm with provable convergence guarantees for smooth, non-convex problems.
We provide the first comprehensive convergence analysis of SGD with quantile clipping, establishing theoretical guarantees for DP-QC-SGD.
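A minimal sketch of an SGD step with quantile clipping, where the clipping threshold is the empirical q-quantile of observed gradient norms (a hypothetical illustration of the general idea; the function name, the running-history rule, and the defaults are assumptions, and the paper's exact DP-QC-SGD update may differ, e.g. by adding calibrated noise):

```python
import numpy as np

def quantile_clip_sgd_step(params, grads, norm_history, lr=0.1, q=0.9):
    """One SGD step with quantile clipping (illustrative sketch).

    The clip threshold c is the q-quantile of gradient norms seen so
    far, and the gradient is rescaled by min(1, c / ||g||) before the
    step, as in standard clip-by-norm."""
    g_norm = np.linalg.norm(grads)
    norm_history.append(g_norm)
    c = np.quantile(norm_history, q)          # adaptive threshold
    scale = min(1.0, c / (g_norm + 1e-12))    # clip-by-norm factor
    return params - lr * scale * grads, norm_history
```

The point of the quantile rule is that the threshold adapts to the gradient-norm distribution instead of being a fixed hyperparameter.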
To mitigate the high communication cost in distributed and federated learning, various vector compression schemes, such as quantization, sparsification, and dithering, have become very popular. In designing a compression method, one aims to …
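As a concrete instance of the sparsification schemes mentioned above, here is the standard unbiased Rand-k compressor (a generic textbook example, not any specific paper's operator): keep k coordinates chosen uniformly at random and rescale by d/k so the compressed vector equals the input in expectation:

```python
import numpy as np

def rand_k(x, k, rng):
    """Unbiased Rand-k sparsification.

    Keeps k uniformly random coordinates of x (dimension d) and scales
    them by d/k, so E[rand_k(x)] = x while only k values need to be
    communicated."""
    d = x.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(x)
    out[idx] = x[idx] * (d / k)
    return out
```

The d/k rescaling trades a sparser message for higher variance, which is exactly the design tension compression schemes must balance.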
We present a generic framework for accelerating almost any non-accelerated algorithm for smooth convex optimization problems.