MAST: Model-Agnostic Sparsified Training

We introduce a novel optimization formulation incorporating pre-trained models and random sketch operators, enabling sparsification-aware training with tighter convergence rates.
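
To make this concrete, such a sparsification-aware objective can be written as an expectation of the loss over random sketch operators applied to the model; the notation below (model x, sketch S drawn from a distribution D) is an illustrative sketch, not necessarily the paper's exact formulation.

```latex
% Illustrative sparsification-aware objective: the model x is trained
% under random sketches S ~ D applied before the loss is evaluated.
% Notation is assumed here, not quoted from the paper.
\min_{x \in \mathbb{R}^d} \; \mathbb{E}_{S \sim \mathcal{D}} \big[ f(Sx) \big]
```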

Towards a Better Theoretical Understanding of Independent Subnetwork Training

We provide a theoretical analysis of Independent Subnetwork Training (IST), identifying fundamental differences from alternative approaches and analyzing its optimization performance.
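
As a rough illustration of the IST pattern, the sketch below partitions the neurons of a single linear layer across workers, trains each subnetwork independently, and reassembles the result; the function name ist_round, the layer type, and the partitioning and synchronization choices are assumptions for illustration, not the exact algorithm analyzed in the paper.

```python
import numpy as np

def ist_round(W, X, y, num_workers=2, lr=0.1, local_steps=5):
    """One illustrative IST round for a linear model W (d_out x d_in):
    partition output neurons across workers, train each subnetwork
    independently, then reassemble the rows."""
    d_out = W.shape[0]
    parts = np.array_split(np.arange(d_out), num_workers)
    for rows in parts:                      # each worker owns a disjoint set of neurons
        W_sub = W[rows]                     # its independent subnetwork (a copy)
        for _ in range(local_steps):        # local training, no communication
            resid = W_sub @ X.T - y[:, rows].T
            W_sub -= lr * resid @ X / len(X)
        W[rows] = W_sub                     # reassemble the updated rows
    return W
```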

Shifted Compression Framework: Generalizations and Improvements

We develop a unified framework for studying distributed optimization methods with compression.
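
The shift idea behind such frameworks can be illustrated in a few lines: rather than compressing a vector directly, one compresses its difference from a locally maintained shift and adds the shift back on reconstruction. The top-k compressor, the shift update rule, and the names topk and shifted_compress below are illustrative assumptions rather than the paper's exact operators.

```python
import numpy as np

def topk(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def shifted_compress(g, h, k, alpha=0.5):
    """Compress g relative to the shift h: transmit C(g - h), reconstruct
    h + C(g - h), and move the shift toward g (illustrative update rule)."""
    delta = topk(g - h, k)      # only this sparse vector is communicated
    g_hat = h + delta           # receiver reconstructs using its copy of h
    h_new = h + alpha * delta   # both sides update the shift identically
    return g_hat, h_new
```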

ADOM: Accelerated Decentralized Optimization Method for Time-Varying Networks

We propose ADOM -- an accelerated method for smooth and strongly convex decentralized optimization over time-varying networks.
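
For context only, the sketch below shows a baseline (non-accelerated) decentralized gradient method over a time-varying network, i.e. the setting that ADOM accelerates; it is not ADOM itself, and the function name decentralized_gd, the mixing-matrix interface, and the step size are assumptions.

```python
import numpy as np

def decentralized_gd(grads, x0, mixing_matrices, lr=0.1):
    """Baseline decentralized gradient descent over a time-varying network:
    at each step every node averages with its current neighbors via the
    mixing matrix W_k, then takes a local gradient step. Illustrates the
    setting ADOM operates in, not the accelerated method itself."""
    X = np.tile(x0, (len(grads), 1))             # one row of X per node
    for W in mixing_matrices:                    # the network (and W) changes over time
        X = W @ X                                # gossip / consensus step
        X -= lr * np.stack([g(x) for g, x in zip(grads, X)])
    return X.mean(axis=0)                        # approximate consensus point
```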

Revisiting Stochastic Extragradient

We fix a fundamental issue in the stochastic extragradient method by proposing a new sampling strategy.
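
For reference, the stochastic extragradient step first extrapolates and then updates using an operator g evaluated at the extrapolated point; one sampling strategy of the kind discussed is to reuse the same sample in both evaluations, as sketched below (step sizes and the precise rule in the paper may differ).

```latex
% Stochastic extragradient with the same sample \xi_k reused in both steps
% (illustrative; the exact sampling rule and step sizes may differ from the paper).
\tilde{x}_k = x_k - \gamma \, g(x_k;\, \xi_k), \qquad
x_{k+1} = x_k - \gamma \, g(\tilde{x}_k;\, \xi_k).
```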

SGD: General Analysis and Improved Rates

We propose a general yet simple theorem describing the convergence of SGD under the arbitrary sampling paradigm.
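
In the arbitrary sampling framework, SGD can be written with a random sampling vector that is unbiased coordinate-wise; the notation below is a standard illustrative way to express this, not a quotation from the paper.

```latex
% SGD under arbitrary sampling (illustrative notation):
% v_k is a random sampling vector with E[(v_k)_i] = 1 for all i.
f_{v}(x) = \frac{1}{n} \sum_{i=1}^{n} v_i f_i(x), \qquad
x_{k+1} = x_k - \gamma_k \, \nabla f_{v_k}(x_k).
```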