Beyond the Ideal: Analyzing the Inexact Muon Update

We provide an analysis of the inexact orthogonalized update at the core of Muon, revealing a fundamental coupling between the LMO's inexactness and the optimal step size and momentum.
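For context on what "inexact" means here: Muon's public reference code approximates the orthogonal factor of the momentum matrix with a few quintic Newton-Schulz steps rather than an exact SVD, and the approximation error of truncating that iteration is precisely the kind of inexactness such an analysis concerns. A minimal NumPy sketch of the iteration (coefficients taken from the public Muon code; function names are ours, and this is illustrative, not the paper's procedure):

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5):
    """Approximate the orthogonal factor U V^T of G's SVD with a quintic
    Newton-Schulz iteration, as in the public Muon reference code.
    Running only a few steps makes the update inexact by construction."""
    a, b, c = 3.4445, -4.7750, 2.0315  # quintic coefficients from the Muon code
    X = G / (np.linalg.norm(G) + 1e-7)  # normalize so the iteration converges
    transpose = X.shape[0] > X.shape[1]  # work with the short side first
    if transpose:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X  # polynomial pushes singular values toward 1
    return X.T if transpose else X

# The orthogonalized momentum then replaces the raw gradient:
#   W <- W - lr * newton_schulz_orthogonalize(momentum_buffer)
```

After a handful of steps the singular values cluster near 1 but do not reach it exactly; that residual spread is the inexactness coupled to step size and momentum in the summary above.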

Gluon: Making Muon & Scion Great Again! (Bridging Theory and Practice of LMO-based Optimizers for LLMs)

We propose Gluon, a new LMO-based optimizer with a refined generalized smoothness model that captures layer-wise geometry and closes the gap between theory and practice.

Smoothed Normalization for Efficient Distributed Private Optimization

We design α-NormEC, the first differentially private distributed optimization algorithm with provable convergence guarantees for smooth, non-convex problems.
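One natural reading of "smoothed normalization" (our assumption for illustration, not a formula taken from the paper) replaces the hard clipping factor min(1, c/||g||), which is non-smooth at the threshold, with the everywhere-smooth factor 1/(alpha + ||g||):

```python
import numpy as np

def smoothed_normalization(g, alpha=1.0):
    # Hypothetical sketch: scale g by 1/(alpha + ||g||), a smooth surrogate
    # for hard norm clipping; the output norm is ||g||/(alpha + ||g||) < 1,
    # so the update magnitude is bounded without a non-smooth cutoff.
    return g / (alpha + np.linalg.norm(g))
```

The bounded output norm is what makes such a step compatible with adding calibrated noise in a differentially private method.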

On the Convergence of DP-SGD with Adaptive Clipping

We provide the first comprehensive convergence analysis of SGD with quantile clipping, establishing theoretical guarantees for DP-QC-SGD.
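Quantile clipping generally refers to setting the clipping threshold adaptively to an empirical quantile of the per-sample gradient norms rather than fixing it in advance. A hedged sketch of that idea (helper names are ours, not the paper's):

```python
import numpy as np

def quantile_clip(grads, q=0.5):
    """Clip each row of `grads` (one per-sample gradient per row) to the
    q-quantile of the per-sample gradient norms. Illustrative sketch of
    the quantile-clipping idea, not the paper's exact algorithm."""
    norms = np.linalg.norm(grads, axis=1)
    c = np.quantile(norms, q)  # adaptive threshold: the q-quantile of norms
    scale = np.minimum(1.0, c / np.maximum(norms, 1e-12))
    return grads * scale[:, None]
```

Because the threshold depends on the data, the clipped gradients are no longer independent across samples, which is what makes the convergence analysis non-trivial.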

Uncertainty Principle for Communication Compression in Distributed and Federated Learning and the Search for an Optimal Compressor

In order to mitigate the high communication cost in distributed and federated learning, various vector compression schemes, such as quantization, sparsification and dithering, have become very popular. In designing a compression method, one aims to …
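Of the compressor families the abstract names, random sparsification has a particularly simple unbiased instance, rand-k: keep k uniformly chosen coordinates and rescale them by d/k so the compressor is unbiased in expectation. A minimal sketch:

```python
import numpy as np

def rand_k(x, k, rng):
    """Unbiased rand-k sparsification: keep k random coordinates of x,
    rescaled by d/k so that E[rand_k(x)] = x."""
    d = x.size
    idx = rng.choice(d, size=k, replace=False)  # k coordinates, uniformly without replacement
    out = np.zeros_like(x)
    out[idx] = x[idx] * (d / k)  # rescale survivors to preserve the mean
    return out
```

Only k of d coordinates are transmitted per round, trading communication for compression variance; comparing such variance-communication trade-offs is exactly the design question the abstract raises.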

Adaptive Catalyst for Smooth Convex Optimization

We present a generic framework that accelerates almost any non-accelerated algorithm for smooth convex optimization problems.