Peter Richtárik

I'm a Professor of Machine Learning at KAUST (King Abdullah University of Science and Technology), where I develop mathematical and algorithmic foundations of machine learning. My current research focuses on optimization methods for massive-scale machine learning problems and federated learning.

I work extensively on distributed optimization and machine learning algorithms, including compressed training, variance reduction, stochastic gradient descent, and federated learning approaches. Prior to KAUST, I held research positions at the University of Edinburgh, where I helped advance core algorithms like coordinate descent and developed communication-efficient methods for distributed training.
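
To give a concrete flavor of the communication-efficient training mentioned above, here is a minimal sketch of distributed SGD with top-k gradient sparsification, one simple form of compressed training. It is illustrative Python/NumPy only, with hypothetical function names and a toy quadratic objective, and is not the method of any particular paper listed below.

    # Illustrative sketch: distributed SGD where each worker sends a top-k
    # sparsified gradient, reducing the number of communicated coordinates.
    import numpy as np

    def top_k(v, k):
        """Keep the k largest-magnitude entries of v, zero out the rest."""
        out = np.zeros_like(v)
        idx = np.argpartition(np.abs(v), -k)[-k:]
        out[idx] = v[idx]
        return out

    def distributed_sgd_step(x, worker_grads, k, lr=0.1):
        """One step: workers compress their gradients, the server averages and updates."""
        compressed = [top_k(g, k) for g in worker_grads]  # communication-reduced messages
        avg_grad = np.mean(compressed, axis=0)            # server-side aggregation
        return x - lr * avg_grad

    # Toy run: each worker holds the quadratic f_i(x) = 0.5 * ||x - b_i||^2.
    rng = np.random.default_rng(0)
    d, n_workers = 10, 4
    targets = [rng.normal(size=d) for _ in range(n_workers)]
    x = np.zeros(d)
    for _ in range(200):
        grads = [x - b for b in targets]  # exact local gradients
        x = distributed_sgd_step(x, grads, k=3)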

My work aims to bridge theory and practice in large-scale machine learning. I'm particularly interested in algorithms that train machine learning models efficiently across distributed systems while keeping communication overhead low. Recently, much of my work has focused on federated learning and on techniques that enable efficient distributed training on edge devices.
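
As a toy illustration of the federated setting described above, the sketch below runs a FedAvg-style loop: each simulated client performs a few local gradient steps on its own data before the server averages the resulting models. Again, this is illustrative Python/NumPy with made-up names and a synthetic objective, not an implementation of any specific method from the publications below.

    # Illustrative sketch: FedAvg-style training rounds on synthetic clients.
    import numpy as np

    def local_training(x_global, b_client, local_steps=5, lr=0.1):
        """Client update on its local quadratic objective 0.5 * ||x - b_client||^2."""
        x = x_global.copy()
        for _ in range(local_steps):
            x -= lr * (x - b_client)  # local gradient step
        return x

    def federated_round(x_global, client_data, **kwargs):
        """Broadcast the model, run local training on each client, average the results."""
        local_models = [local_training(x_global, b, **kwargs) for b in client_data]
        return np.mean(local_models, axis=0)

    rng = np.random.default_rng(1)
    d, n_clients = 10, 8
    client_data = [rng.normal(size=d) for _ in range(n_clients)]
    x = np.zeros(d)
    for _ in range(50):
        x = federated_round(x, client_data)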

Publications

MicroAdam: Accurate Adaptive Optimization with Low Space Overhead and Provable Convergence

Ionut-Vlad Modoranu, Mher Safaryan, Grigory Malinovsky, Eldar Kurtic, Thomas Robert, Peter Richtárik, Dan Alistarh

arXiv.org 2024

PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression

Vladimir Malinovskii, Denis Mazur, Ivan Ilin, Denis Kuznedelev, Konstantin Burlachenko, Kai Yi, Dan Alistarh, Peter Richtárik

arXiv.org 2024

Consensus-based optimisation with truncated noise

Massimo Fornasier, Peter Richtárik, Konstantin Riedl, Lukang Sun

European journal of applied mathematics 2024

Kimad: Adaptive Gradient Compression with Bandwidth Awareness

Jihao Xin, Ivan Ilin, Shunkang Zhang, Marco Canini, Peter Richtárik

DistributedML@CoNEXT 2023

Federated Learning is Better with Non-Homomorphic Encryption

Konstantin Burlachenko, Abdulmajeed Alrowithi, Fahad Albalawi, Peter Richtárik

DistributedML@CoNEXT 2023

Understanding Progressive Training Through the Framework of Randomized Coordinate Descent

Rafal Szlendak, Elnur Gasanov, Peter Richtárik

International Conference on Artificial Intelligence and Statistics 2023

A Guide Through the Zoo of Biased SGD

Y. Demidovich, Grigory Malinovsky, Igor Sokolov, Peter Richtárik

Neural Information Processing Systems 2023

Det-CGD: Compressed Gradient Descent with Matrix Stepsizes for Non-Convex Optimization

Hanmin Li, Avetik G. Karagulyan, Peter Richtárik

International Conference on Learning Representations 2023

Optimal Time Complexities of Parallel Stochastic Optimization Methods Under a Fixed Computation Model

Alexander Tyurin, Peter Richtárik

Neural Information Processing Systems 2023

High-Probability Bounds for Stochastic Optimization and Variational Inequalities: the Case of Unbounded Variance

Abdurakhmon Sadiev, Marina Danilova, Eduard A. Gorbunov, Samuel Horváth, Gauthier Gidel, P. Dvurechensky, A. Gasnikov, Peter Richtárik

International Conference on Machine Learning 2023

Convergence of First-Order Algorithms for Meta-Learning with Moreau Envelopes

Konstantin Mishchenko, Slavomír Hanzely, Peter Richtárik

arXiv.org 2023

A Damped Newton Method Achieves Global $O\left(\frac{1}{k^2}\right)$ and Local Quadratic Convergence Rate

Slavomír Hanzely, D. Kamzolov, D. Pasechnyuk, Alexander Gasnikov, Peter Richtárik, Martin Takáč

Adaptive Compression for Communication-Efficient Distributed Training

Maksim Makarenko, Elnur Gasanov, Rustem Islamov, Abdurakhmon Sadiev, Peter Richtárik

Trans. Mach. Learn. Res. 2022

Improved Stein Variational Gradient Descent with Importance Weights

Lukang Sun, Peter Richtárik

arXiv.org 2022

Stochastic distributed learning with gradient quantization and double-variance reduction

Samuel Horváth, Dmitry Kovalev, Konstantin Mishchenko, Peter Richtárik, Sebastian U. Stich

Optim. Methods Softw. 2022

Minibatch Stochastic Three Points Method for Unconstrained Smooth Minimization

Soumia Boucherouite, Grigory Malinovsky, Peter Richtárik, El Houcine Bergou

AAAI Conference on Artificial Intelligence 2022

Adaptive Learning Rates for Faster Stochastic Gradient Methods

Samuel Horváth, Konstantin Mishchenko, Peter Richtárik

arXiv.org 2022

RandProx: Primal-Dual Optimization Algorithms with Randomized Proximal Updates

Laurent Condat, Peter Richtárik

International Conference on Learning Representations 2022

Communication Acceleration of Local Gradient Methods via an Accelerated Primal-Dual Algorithm with Inexact Prox

Abdurakhmon Sadiev, Dmitry Kovalev, Peter Richtárik

Neural Information Processing Systems 2022

A Note on the Convergence of Mirrored Stein Variational Gradient Descent under (L0, L1)-Smoothness Condition

Lukang Sun, Peter Richtárik

arXiv.org 2022

Convergence of Stein Variational Gradient Descent under a Weaker Smoothness Condition

Lukang Sun, Avetik Karagulyan, Peter Richtárik

International Conference on Artificial Intelligence and Statistics 2022

Federated Random Reshuffling with Compression and Variance Reduction

Grigory Malinovsky, Peter Richtárik

arXiv.org 2022

Optimal Algorithms for Decentralized Stochastic Variational Inequalities

Dmitry Kovalev, Aleksandr Beznosikov, Abdurakhmon Sadiev, Michael Persiianov, Peter Richtárik, A. Gasnikov

Neural Information Processing Systems 2022

Accelerated Primal-Dual Gradient Method for Smooth and Convex-Concave Saddle-Point Problems with Bilinear Coupling

Dmitry Kovalev, Alexander Gasnikov, Peter Richtárik

Neural Information Processing Systems 2021

Faster Rates for Compressed Federated Learning with Client-Variance Reduction

Haoyu Zhao, Konstantin Burlachenko, Zhize Li, Peter Richtárik

SIAM Journal on Mathematics of Data Science 2021

FL_PyTorch: optimization research simulator for federated learning

Konstantin Burlachenko, Samuel Horváth, Peter Richtárik

DistributedML@CoNEXT 2021

EF21 with Bells & Whistles: Practical Algorithmic Extensions of Modern Error Feedback

Ilyas Fatkhullin, Igor Sokolov, Eduard Gorbunov, Zhize Li, Peter Richtárik

arXiv.org 2021

Distributed Methods with Compressed Communication for Solving Variational Inequalities, with Theoretical Guarantees

Aleksandr Beznosikov, Peter Richtárik, Michael Diskin, Max Ryabinin, A. Gasnikov

Neural Information Processing Systems 2021

Permutation Compressors for Provably Faster Distributed Nonconvex Optimization

Rafal Szlendak, Alexander Tyurin, Peter Richtárik

arXiv.org 2021

Error Compensated Loopless SVRG, Quartz, and SDCA for Distributed Optimization

Xun Qian, Hanze Dong, Peter Richtárik, Tong Zhang

Doubly Adaptive Scaled Algorithm for Machine Learning Using Second-Order Information

Majid Jahani, S. Rusakov, Zheng Shi, Peter Richtárik, Michael W. Mahoney, Martin Takáč

International Conference on Learning Representations 2021

FedPAGE: A Fast Local Stochastic Gradient Method for Communication-Efficient Federated Learning

Haoyu Zhao, Zhize Li, Peter Richtárik

arXiv.org 2021

CANITA: Faster Rates for Distributed Convex Optimization with Communication Compression

Zhize Li, Peter Richtárik

Neural Information Processing Systems 2021

A Field Guide to Federated Optimization

Jianyu Wang, Zachary B. Charles, Zheng Xu, Gauri Joshi, H. B. McMahan, B. A. Y. Arcas, Maruan Al-Shedivat, Galen Andrew, S. Avestimehr, Katharine Daly, Deepesh Data, S. Diggavi, Hubert Eichner, Advait Gadhikar, Zachary Garrett, Antonious M. Girgis, Filip Hanzely, Andrew Straiton Hard, Chaoyang He, Samuel Horváth, Zhouyuan Huo, A. Ingerman, Martin Jaggi, T. Javidi, P. Kairouz, Satyen Kale, Sai Praneeth Karimireddy, Jakub Konecný, Sanmi Koyejo, Tian Li, Luyang Liu, M. Mohri, H. Qi, Sashank J. Reddi, Peter Richtárik, K. Singhal, Virginia Smith, M. Soltanolkotabi, Weikang Song, A. Suresh, Sebastian U. Stich, Ameet Talwalkar, Hongyi Wang, Blake E. Woodworth, Shanshan Wu, Felix X. Yu, Honglin Yuan, M. Zaheer, Mi Zhang, Tong Zhang, Chunxiang Zheng, Chen Zhu, Wennan Zhu

arXiv.org 2021

EF21: A New, Simpler, Theoretically Better, and Practically Faster Error Feedback

Peter Richtárik, Igor Sokolov, Ilyas Fatkhullin

Neural Information Processing Systems 2021

Lower Bounds and Optimal Algorithms for Smooth and Strongly Convex Decentralized Optimization Over Time-Varying Networks

Dmitry Kovalev, Elnur Gasanov, Peter Richtárik, Alexander Gasnikov

Neural Information Processing Systems 2021

A Convergence Theory for SVGD in the Population Limit under Talagrand's Inequality T1

Adil Salim, Lukang Sun, Peter Richtárik

International Conference on Machine Learning 2021

MURANA: A Generic Framework for Stochastic Variance-Reduced Optimization

Laurent Condat, Peter Richtárik

Mathematical and Scientific Machine Learning 2021

FedNL: Making Newton-Type Methods Applicable to Federated Learning

Mher Safaryan, Rustem Islamov, Xun Qian, Peter Richtárik

International Conference on Machine Learning 2021

Random Reshuffling with Variance Reduction: New Analysis and Better Rates

Grigory Malinovsky, Alibek Sailanbayev, Peter Richtárik

Conference on Uncertainty in Artificial Intelligence 2021

ZeroSARAH: Efficient Nonconvex Finite-Sum Optimization with Zero Full Gradient Computation

Zhize Li, Peter Richtárik

arXiv.org 2021

Hyperparameter Transfer Learning with Adaptive Complexity

Samuel Horváth, Aaron Klein, Peter Richtárik, C. Archambeau

International Conference on Artificial Intelligence and Statistics 2021

An Optimal Algorithm for Strongly Convex Minimization under Affine Constraints

A. Salim, Laurent Condat, D. Kovalev, Peter Richtárik

International Conference on Artificial Intelligence and Statistics 2021

AI-SARAH: Adaptive and Implicit Stochastic Recursive Gradient Methods

Zheng Shi, Nicolas Loizou, Peter Richtárik, Martin Takáč

Trans. Mach. Learn. Res. 2021

ADOM: Accelerated Decentralized Optimization Method for Time-Varying Networks

D. Kovalev, Egor Shulgin, Peter Richtárik, A. Rogozin, A. Gasnikov

International Conference on Machine Learning 2021

IntSGD: Adaptive Floatless Compression of Stochastic Gradients

Konstantin Mishchenko, Bokun Wang, D. Kovalev, Peter Richtárik

International Conference on Learning Representations 2021

MARINA: Faster Non-Convex Distributed Learning with Compression

Eduard A. Gorbunov, Konstantin Burlachenko, Zhize Li, Peter Richtárik

International Conference on Machine Learning 2021

Smoothness Matrices Beat Smoothness Constants: Better Communication Compression Techniques for Distributed Optimization

Mher Safaryan, Filip Hanzely, Peter Richtárik

Neural Information Processing Systems 2021

Distributed Second Order Methods with Fast Rates and Compressed Communication

R. Islamov, Xun Qian, Peter Richtárik

International Conference on Machine Learning 2021

Proximal and Federated Random Reshuffling

Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik

International Conference on Machine Learning 2021

A Linearly Convergent Algorithm for Decentralized Optimization: Sending Less Bits for Free!

D. Kovalev, Anastasia Koloskova, Martin Jaggi, Peter Richtárik, Sebastian U. Stich

International Conference on Artificial Intelligence and Statistics 2020

Local SGD: Unified Theory and New Efficient Methods

Eduard A. Gorbunov, Filip Hanzely, Peter Richtárik

International Conference on Artificial Intelligence and Statistics 2020

Optimal Client Sampling for Federated Learning

Wenlin Chen, Samuel Horváth, Peter Richtárik

Trans. Mach. Learn. Res. 2020

Linearly Converging Error Compensated SGD

Eduard A. Gorbunov, D. Kovalev, Dmitry Makarenko, Peter Richtárik

Neural Information Processing Systems 2020

Optimal Gradient Compression for Distributed and Federated Learning

Alyazeed Albasyoni, Mher Safaryan, Laurent Condat, Peter Richtárik

arXiv.org 2020

Lower Bounds and Optimal Algorithms for Personalized Federated Learning

Filip Hanzely, Slavomír Hanzely, Samuel Horváth, Peter Richtárik

Neural Information Processing Systems 2020

Distributed Proximal Splitting Algorithms with Rates and Acceleration

Laurent Condat, Grigory Malinovsky, Peter Richtárik

Frontiers in Signal Processing 2020

Variance-Reduced Methods for Machine Learning

Robert Mansel Gower, Mark W. Schmidt, F. Bach, Peter Richtárik

Proceedings of the IEEE 2020

Error Compensated Distributed SGD Can Be Accelerated

Xun Qian, Peter Richtárik, Tong Zhang

Neural Information Processing Systems 2020

PAGE: A Simple and Optimal Probabilistic Gradient Estimator for Nonconvex Optimization

Zhize Li, Hongyan Bao, Xiangliang Zhang, Peter Richtárik

International Conference on Machine Learning 2020

Acceleration for Compressed Gradient Descent in Distributed Optimization

Zhize Li, D. Kovalev, Xun Qian, Peter Richtárik

International Conference on Machine Learning 2020

Optimal and Practical Algorithms for Smooth and Strongly Convex Decentralized Optimization

D. Kovalev, A. Salim, Peter Richtárik

Neural Information Processing Systems 2020

Unified Analysis of Stochastic Gradient Methods for Composite Convex and Smooth Optimization

Ahmed Khaled, Othmane Sebbouh, Nicolas Loizou, R. Gower, Peter Richtárik

Journal of Optimization Theory and Applications 2020

A Better Alternative to Error Feedback for Communication-Efficient Distributed Learning

Samuel Horváth, Peter Richtárik

International Conference on Learning Representations 2020

Primal Dual Interpretation of the Proximal Stochastic Gradient Langevin Algorithm

A. Salim, Peter Richtárik

Neural Information Processing Systems 2020

A Unified Analysis of Stochastic Gradient Methods for Nonconvex Federated Optimization

Zhize Li, Peter Richtárik

arXiv.org 2020

Random Reshuffling: Simple Analysis with Vast Improvements

Konstantin Mishchenko, Ahmed Khaled, Peter Richtárik

Neural Information Processing Systems 2020

Adaptive Learning of the Optimal Mini-Batch Size of SGD

Motasem Alfarra, Slavomír Hanzely, Alyazeed Albasyoni, Bernard Ghanem, Peter Richtárik

arXiv.org 2020

On the Convergence Analysis of Asynchronous SGD for Solving Consistent Linear Systems

Atal Narayan Sahu, Aritra Dutta, Aashutosh Tiwari, Peter Richtárik

Linear Algebra and its Applications 2020

Dualize, Split, Randomize: Fast Nonsmooth Optimization Algorithms

A. Salim, Laurent Condat, Konstantin Mishchenko, Peter Richtárik

arXiv.org 2020

From Local SGD to Local Fixed Point Methods for Federated Learning

Grigory Malinovsky, D. Kovalev, Elnur Gasanov, Laurent Condat, Peter Richtárik

International Conference on Machine Learning 2020

Dualize, Split, Randomize: Toward Fast Nonsmooth Optimization Algorithms

A. Salim, Laurent Condat, Konstantin Mishchenko, Peter Richtárik

Journal of Optimization Theory and Applications 2020

On Biased Compression for Distributed Learning

Aleksandr Beznosikov, Samuel Horvath, Peter Richtárik, Mher Safaryan

Journal of machine learning research 2020

Fast Linear Convergence of Randomized BFGS

D. Kovalev, Robert Mansel Gower, Peter Richtárik, A. Rogozin

arXiv.org 2020

Acceleration for Compressed Gradient Descent in Distributed and Federated Optimization

Zhize Li, D. Kovalev, Xun Qian, Peter Richtárik

International Conference on Machine Learning 2020

Stochastic Subspace Cubic Newton Method

Filip Hanzely, N. Doikov, Peter Richtárik, Y. Nesterov

International Conference on Machine Learning 2020

Uncertainty Principle for Communication Compression in Distributed and Federated Learning and the Search for an Optimal Compressor

Mher Safaryan, Egor Shulgin, Peter Richtárik

Information and Inference A Journal of the IMA 2020

Adaptivity of Stochastic Gradient Methods for Nonconvex Optimization

Samuel Horváth, Lihua Lei, Peter Richtárik, Michael I. Jordan

SIAM Journal on Mathematics of Data Science 2020

Variance Reduced Coordinate Descent with Acceleration: New Method With a Surprising Application to Finite-Sum Problems

Filip Hanzely, D. Kovalev, Peter Richtárik

International Conference on Machine Learning 2020

Federated Learning of a Mixture of Global and Local Models

Filip Hanzely, Peter Richtárik

arXiv.org 2020

Better Theory for SGD in the Nonconvex World

Ahmed Khaled, Peter Richtárik

Trans. Mach. Learn. Res. 2020

Distributed Fixed Point Methods with Compressed Iterates

Sélim Chraibi, Ahmed Khaled, D. Kovalev, Peter Richtárik, A. Salim, Martin Takáč

arXiv.org 2019

Stochastic Newton and Cubic Newton Methods with Simple Local Linear-Quadratic Rates

D. Kovalev, Konstantin Mishchenko, Peter Richtárik

arXiv.org 2019

Gradient Descent with Compressed Iterates

Ahmed Khaled, Peter Richtárik

arXiv.org 2019

Tighter Theory for Local SGD on Identical and Heterogeneous Data

Ahmed Khaled, Konstantin Mishchenko, Peter Richtárik

International Conference on Artificial Intelligence and Statistics 2019

First Analysis of Local GD on Heterogeneous Data

Ahmed Khaled, Konstantin Mishchenko, Peter Richtárik

arXiv.org 2019

Better Communication Complexity for Local SGD

Ahmed Khaled, Konstantin Mishchenko, Peter Richtárik

arXiv.org 2019

Stochastic Convolutional Sparse Coding

J. Xiong, Peter Richtárik, W. Heidrich

International Symposium on Vision, Modeling, and Visualization 2019

MISO is Making a Comeback With Better Proofs and Rates

Xun Qian, Alibek Sailanbayev, Konstantin Mishchenko, Peter Richtárik

L-SVRG and L-Katyusha with Arbitrary Sampling

Xun Qian, Zheng Qu, Peter Richtárik

Journal of machine learning research 2019

On Stochastic Sign Descent Methods

Mher Safaryan, Peter Richtárik

A Stochastic Derivative Free Optimization Method with Momentum

Eduard A. Gorbunov, Adel Bibi, Ozan Sener, El Houcine Bergou, Peter Richtárik

International Conference on Learning Representations 2019

Stochastic Sign Descent Methods: New Algorithms and Better Theory

Mher Safaryan, Peter Richtárik

International Conference on Machine Learning 2019

Stochastic Proximal Langevin Algorithm: Potential Splitting and Nonasymptotic Rates

A. Salim, D. Kovalev, Peter Richtárik

Neural Information Processing Systems 2019

Direct Nonlinear Acceleration

Aritra Dutta, El Houcine Bergou, Yunming Xiao, Marco Canini, Peter Richtárik

EURO Journal on Computational Optimization 2019

One Method to Rule Them All: Variance Reduction for Data, Parameters and Many New Methods

Filip Hanzely, Peter Richtárik

arXiv.org 2019

A Stochastic Decoupling Method for Minimizing the Sum of Smooth and Non-Smooth Functions

Konstantin Mishchenko, Peter Richtárik

Revisiting Stochastic Extragradient

Konstantin Mishchenko, D. Kovalev, Egor Shulgin, Peter Richtárik, Yura Malitsky

International Conference on Artificial Intelligence and Statistics 2019

Natural Compression for Distributed Deep Learning

Samuel Horváth, Chen-Yu Ho, L. Horvath, Atal Narayan Sahu, Marco Canini, Peter Richtárik

Mathematical and Scientific Machine Learning 2019

A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent

Eduard A. Gorbunov, Filip Hanzely, Peter Richtárik

International Conference on Artificial Intelligence and Statistics 2019

RSN: Randomized Subspace Newton

Robert Mansel Gower, D. Kovalev, Felix Lieder, Peter Richtárik

Neural Information Processing Systems 2019

Best Pair Formulation & Accelerated Scheme for Non-Convex Principal Component Pursuit

Aritra Dutta, Filip Hanzely, Jingwei Liang, Peter Richtárik

IEEE Transactions on Signal Processing 2019

Revisiting Randomized Gossip Algorithms: General Framework, Convergence Rates and Novel Block and Accelerated Protocols

Nicolas Loizou, Peter Richtárik

IEEE Transactions on Information Theory 2019

Stochastic Distributed Learning with Gradient Quantization and Variance Reduction

Samuel Horváth, D. Kovalev, Konstantin Mishchenko, Sebastian U. Stich, Peter Richtárik

Convergence Analysis of Inexact Randomized Iterative Methods

Nicolas Loizou, Peter Richtárik

SIAM Journal on Scientific Computing 2019

Scaling Distributed Machine Learning with In-Network Aggregation

Amedeo Sapio, Marco Canini, Chen-Yu Ho, J. Nelson, Panos Kalnis, Changhoon Kim, A. Krishnamurthy, M. Moshref, Dan R. K. Ports, Peter Richtárik

Symposium on Networked Systems Design and Implementation 2019

Stochastic Three Points Method for Unconstrained Smooth Minimization

El Houcine Bergou, Eduard A. Gorbunov, Peter Richtárik

SIAM Journal on Optimization 2019

A Stochastic Derivative-Free Optimization Method with Importance Sampling: Theory and Learning to Control

Adel Bibi, El Houcine Bergou, Ozan Sener, Bernard Ghanem, Peter Richtárik

AAAI Conference on Artificial Intelligence 2019

Quasi-Newton methods for machine learning: forget the past, just sample

A. Berahas, Majid Jahani, Peter Richtárik, Martin Takáč

Optim. Methods Softw. 2019

99% of Distributed Optimization is a Waste of Time: The Issue and How to Fix it

Konstantin Mishchenko, Filip Hanzely, Peter Richtárik

99% of Parallel Optimization is Inevitably a Waste of Time

Konstantin Mishchenko, Filip Hanzely, Peter Richtárik

arXiv.org 2019

SGD: General Analysis and Improved Rates

Robert Mansel Gower, Nicolas Loizou, Xun Qian, Alibek Sailanbayev, Egor Shulgin, Peter Richtárik

International Conference on Machine Learning 2019

A Privacy Preserving Randomized Gossip Algorithm via Controlled Noise Insertion

Filip Hanzely, Jakub Konecný, Nicolas Loizou, Peter Richtárik, Dmitry Grishchenko

arXiv.org 2019

Distributed Learning with Compressed Gradient Differences

Konstantin Mishchenko, Eduard A. Gorbunov, Martin Takác, Peter Richtárik

Optimization Methods and Software 2019

Don't Jump Through Hoops and Remove Those Loops: SVRG and Katyusha are Better Without the Outer Loop

D. Kovalev, Samuel Horváth, Peter Richtárik

International Conference on Algorithmic Learning Theory 2019

SAGA with Arbitrary Sampling

Xun Qian, Zheng Qu, Peter Richtárik

International Conference on Machine Learning 2019

Randomized Projection Methods for Convex Feasibility: Conditioning and Convergence Rates

I. Necoara, Peter Richtárik, A. Pătraşcu

SIAM Journal on Optimization 2019

New Convergence Aspects of Stochastic Gradient Algorithms

Lam M. Nguyen, Phuong Ha Nguyen, Peter Richtárik, K. Scheinberg, Martin Takác, Marten van Dijk

Journal of machine learning research 2018

A Stochastic Penalty Model for Convex and Nonconvex Optimization with Big Constraints

Konstantin Mishchenko, Peter Richtárik

Provably Accelerated Randomized Gossip Algorithms

Nicolas Loizou, M. Rabbat, Peter Richtárik

IEEE International Conference on Acoustics, Speech, and Signal Processing 2018

Accelerated Coordinate Descent with Arbitrary Sampling and Best Rates for Minibatches

Filip Hanzely, Peter Richtárik

International Conference on Artificial Intelligence and Statistics 2018

Accelerated Gossip via Stochastic Heavy Ball Method

Nicolas Loizou, Peter Richtárik

Allerton Conference on Communication, Control, and Computing 2018

Nonconvex Variance Reduced Optimization with Arbitrary Sampling

Samuel Horváth, Peter Richtárik

International Conference on Machine Learning 2018

Matrix Completion Under Interval Uncertainty: Highlights

Jakub Marecek, Peter Richtárik, Martin Takác

ECML/PKDD 2018

SEGA: Variance Reduction via Gradient Sketching

Filip Hanzely, Konstantin Mishchenko, Peter Richtárik

Neural Information Processing Systems 2018

Accelerated Bregman proximal gradient methods for relatively smooth convex optimization

Filip Hanzely, Peter Richtárik, Lin Xiao

Computational optimization and applications 2018

Improving SAGA via a Probabilistic Interpolation with Gradient Descent

Adel Bibi, Alibek Sailanbayev, Bernard Ghanem, Robert Mansel Gower, Peter Richtárik

A Nonconvex Projection Method for Robust PCA

Aritra Dutta, Filip Hanzely, Peter Richtárik

AAAI Conference on Artificial Intelligence 2018

Stochastic quasi-gradient methods: variance reduction via Jacobian sketching

R. Gower, Peter Richtárik, F. Bach

Mathematical programming 2018

Weighted Low-Rank Approximation of Matrices and Background Modeling

Aritra Dutta, Xin Li, Peter Richtárik

arXiv.org 2018

Coordinate Descent Faceoff: Primal or Dual?

Dominik Csiba, Peter Richtárik

International Conference on Algorithmic Learning Theory 2018

Fastest rates for stochastic mirror descent methods

Filip Hanzely, Peter Richtárik

Computational optimization and applications 2018

Randomized Block Cubic Newton Method

N. Doikov, Peter Richtárik

International Conference on Machine Learning 2018

Accelerated Stochastic Matrix Inversion: General Theory and Speeding up BFGS Rules for Faster Second-Order Optimization

R. Gower, Filip Hanzely, Peter Richtárik, Sebastian U. Stich

Neural Information Processing Systems 2018

Stochastic Spectral and Conjugate Descent Methods

D. Kovalev, Peter Richtárik, Eduard A. Gorbunov, Elnur Gasanov

Neural Information Processing Systems 2018

SGD and Hogwild! Convergence Without the Bounded Gradients Assumption

Lam M. Nguyen, Phuong Ha Nguyen, Marten van Dijk, Peter Richtárik, K. Scheinberg, Martin Takác

International Conference on Machine Learning 2018

The complexity of primal-dual fixed point methods for ridge regression

Ademir Alves Riberio, Peter Richtárik

Linear Algebra and its Applications 2018

A Randomized Exchange Algorithm for Computing Optimal Approximate Designs of Experiments

Radoslav Harman, Lenka Filová, Peter Richtárik

Journal of the American Statistical Association 2018

Randomized projection methods for convex feasibility problems: conditioning and convergence rates

I. Necoara, Peter Richtárik, A. Pătraşcu

Momentum and stochastic momentum for stochastic gradient, Newton, proximal point and subspace descent methods

Nicolas Loizou, Peter Richtárik

Computational optimization and applications 2017

Online and Batch Supervised Background Estimation Via L1 Regression

Aritra Dutta, Peter Richtárik

IEEE Workshop/Winter Conference on Applications of Computer Vision 2017

Linearly convergent stochastic heavy ball method for minimizing generalization error

Nicolas Loizou, Peter Richtárik

arXiv.org 2017

Global Convergence of Arbitrary-Block Gradient Methods for Generalized Polyak-Łojasiewicz Functions

Dominik Csiba, Peter Richtárik

Faster PET reconstruction with a stochastic primal-dual hybrid gradient method

Matthias Joachim Ehrhardt, P. Markiewicz, A. Chambolle, Peter Richtárik, J. Schott, C. Schönlieb

Optical Engineering + Applications 2017

A Batch-Incremental Video Background Estimation Model Using Weighted Low-Rank Approximation of Matrices

Xin Li, Aritra Dutta, Peter Richtárik

2017 IEEE International Conference on Computer Vision Workshops (ICCVW) 2017

Privacy preserving randomized gossip algorithms

Filip Hanzely, Jakub Konečný, Nicolas Loizou, Peter Richtárik, Dmitry Grishchenko

Stochastic Reformulations of Linear Systems: Algorithms and Convergence Theory

Peter Richtárik, Martin Takác

SIAM Journal on Matrix Analysis and Applications 2017

Parallel Stochastic Newton Method

Mojmír Mutný, Peter Richtárik

Journal of Computational Mathematics 2017

Extending the Reach of Big Data Optimization: Randomized Algorithms for Minimizing Relatively Smooth Functions

Filip Hanzely, Peter Richtárik

Linearly Convergent Randomized Iterative Methods for Computing the Pseudoinverse

R. Gower, Peter Richtárik

Randomized Distributed Mean Estimation: Accuracy vs. Communication

Jakub Konecný, Peter Richtárik

Frontiers in Applied Mathematics and Statistics 2016

Optimization in High Dimensions via Accelerated, Parallel, and Proximal Coordinate Descent

Olivier Fercoq, Peter Richtárik

SIAM Review 2016

Federated Learning: Strategies for Improving Communication Efficiency

Jakub Konecný, H. B. McMahan, Felix X. Yu, Peter Richtárik, A. Suresh, D. Bacon

arXiv.org 2016

A new perspective on randomized gossip algorithms

Nicolas Loizou, Peter Richtárik

IEEE Global Conference on Signal and Information Processing 2016

Federated Optimization: Distributed Machine Learning for On-Device Intelligence

Jakub Konecný, H. B. McMahan, Daniel Ramage, Peter Richtárik

arXiv.org 2016

AIDE: Fast and Communication Efficient Distributed Optimization

Sashank J. Reddi, Jakub Konecný, Peter Richtárik, B. Póczos, Alex Smola

arXiv.org 2016

Coordinate Descent Face-Off: Primal or Dual?

Dominik Csiba, Peter Richtárik

Stochastic Block BFGS: Squeezing More Curvature out of Data

R. Gower, D. Goldfarb, Peter Richtárik

International Conference on Machine Learning 2016

Importance Sampling for Minibatches

Dominik Csiba, Peter Richtárik

Journal of machine learning research 2016

Randomized Quasi-Newton Updates Are Linearly Convergent Matrix Inversion Algorithms

R. Gower, Peter Richtárik

SIAM Journal on Matrix Analysis and Applications 2016

Even Faster Accelerated Coordinate Descent Using Non-Uniform Sampling

Zeyuan Allen-Zhu, Zheng Qu, Peter Richtárik, Yang Yuan

International Conference on Machine Learning 2015

Stochastic Dual Ascent for Solving Linear Systems

Robert Mansel Gower, Peter Richtárik

arXiv.org 2015

Distributed optimization with arbitrary local solvers

Chenxin Ma, Jakub Konecný, Martin Jaggi, Virginia Smith, Michael I. Jordan, Peter Richtárik, Martin Takác

Optim. Methods Softw. 2015

Quartz: Randomized Dual Coordinate Ascent with Arbitrary Sampling

Zheng Qu, Peter Richtárik, Tong Zhang

Neural Information Processing Systems 2015

Distributed Mini-Batch SDCA

Martin Takác, Peter Richtárik, N. Srebro

arXiv.org 2015

Randomized Iterative Methods for Linear Systems

Robert Mansel Gower, Peter Richtárik

SIAM Journal on Matrix Analysis and Applications 2015

Primal Method for ERM with Flexible Mini-batching Schemes and Non-convex Losses

Dominik Csiba, Peter Richtárik

arXiv.org 2015

Mini-Batch Semi-Stochastic Gradient Descent in the Proximal Setting

Jakub Konecný, Jie Liu, Peter Richtárik, Martin Takác

IEEE Journal on Selected Topics in Signal Processing 2015

Stochastic Dual Coordinate Ascent with Adaptive Probabilities

Dominik Csiba, Zheng Qu, Peter Richtárik

International Conference on Machine Learning 2015

Adding vs. Averaging in Distributed Primal-Dual Optimization

Chenxin Ma, Virginia Smith, Martin Jaggi, Michael I. Jordan, Peter Richtárik, Martin Takác

International Conference on Machine Learning 2015

SDNA: Stochastic Dual Newton Ascent for Empirical Risk Minimization

Zheng Qu, Peter Richtárik, Martin Takác, Olivier Fercoq

International Conference on Machine Learning 2015

Coordinate descent with arbitrary sampling I: algorithms and complexity

Zheng Qu, Peter Richtárik

Optim. Methods Softw. 2014

Coordinate descent with arbitrary sampling II: expected separable overapproximation

Zheng Qu, Peter Richtárik

Optim. Methods Softw. 2014

Semi-stochastic coordinate descent

Jakub Konecný, Zheng Qu, Peter Richtárik

Optim. Methods Softw. 2014

Randomized Dual Coordinate Ascent with Arbitrary Sampling

Zheng Qu, Peter Richtárik, Tong Zhang

arXiv.org 2014

S2CD: Semi-stochastic coordinate descent

Jakub Konecný, Zheng Qu, Peter Richtárik

Simple Complexity Analysis of Direct Search

Jakub Konecný, Peter Richtárik

arXiv.org 2014

Simple Complexity Analysis of Simplified Direct Search

Jakub Konečný, Peter Richtárik

Matrix completion under interval uncertainty

Jakub Marecek, Peter Richtárik, Martin Takác

European Journal of Operational Research 2014

Distributed Block Coordinate Descent for Minimizing Partially Separable Functions

Jakub Marecek, Peter Richtárik, Martin Takác

Fast distributed coordinate descent for non-strongly convex losses

Olivier Fercoq, Zheng Qu, Peter Richtárik, Martin Takác

International Workshop on Machine Learning for Signal Processing 2014

Accelerated, Parallel, and Proximal Coordinate Descent

Olivier Fercoq, Peter Richtárik

SIAM Journal on Optimization 2013

Semi-Stochastic Gradient Descent Methods

Jakub Konecný, Peter Richtárik

Frontiers in Applied Mathematics and Statistics 2013

TOP-SPIN: TOPic discovery via Sparse Principal component INterference

M. Takác, S. Ahipaşaoğlu, Ngai-Man Cheung, Peter Richtárik

Modeling and Optimization: Theory and Applications 2013

On optimal probabilities in stochastic coordinate descent methods

Peter Richtárik, Martin Takác

Optimization Letters 2013

Distributed Coordinate Descent Method for Learning with Big Data

Peter Richtárik, Martin Takác

Journal of machine learning research 2013

Smooth minimization of nonsmooth functions with parallel coordinate descent methods

Olivier Fercoq, Peter Richtárik

Modeling and Optimization: Theory and Applications 2013

Separable approximations and decomposition methods for the augmented Lagrangian

R. Tappenden, Peter Richtárik, Burak Büke

Optim. Methods Softw. 2013

Inexact Coordinate Descent: Complexity and Preconditioning

R. Tappenden, Peter Richtárik, J. Gondzio

Journal of Optimization Theory and Applications 2013

Mini-Batch Primal and Dual Methods for SVMs

Martin Takác, A. Bijral, Peter Richtárik, N. Srebro

International Conference on Machine Learning 2013

Alternating maximization: unifying framework for 8 sparse PCA formulations and efficient parallel codes

Peter Richtárik, Martin Takác, S. Ahipaşaoğlu

Optimization and Engineering 2012

Optimal diagnostic tests for sporadic Creutzfeldt-Jakob disease based on support vector machine classification of RT-QuIC data

W. Hulme, Peter Richtárik, L. McGuire, A. Green

arXiv.org 2012

Parallel coordinate descent methods for big data optimization

Peter Richtárik, Martin Takác

Mathematical programming 2012

Approximate Level Method for Nonsmooth Convex Minimization

Peter Richtárik

Journal of Optimization Theory and Applications 2011

Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function

Peter Richtárik, Martin Takác

Mathematical programming 2011

Improved Algorithms for Convex Minimization in Relative Scale

Peter Richtárik

SIAM Journal on Optimization 2011

Approximate level method

Peter Richtárik

Generalized Power Method for Sparse Principal Component Analysis

M. Journée, Y. Nesterov, Peter Richtárik, R. Sepulchre

Journal of machine learning research 2008

Federated Sampling with Langevin Algorithm under Isoperimetry

Lukang Sun, Adil Salim, Peter Richtárik

Trans. Mach. Learn. Res. 2024

DASHA: Distributed Nonconvex Optimization with Communication Compression and Optimal Oracle Complexity

A. Tyurin, Peter Richtárik

International Conference on Learning Representations 2023

A Damped Newton Method Achieves Global $\mathcal O \left(\frac{1}{k^2}\right)$ and Local Quadratic Convergence Rate

Slavomír Hanzely, Dmitry Kamzolov, D. Pasechnyuk, A. Gasnikov, Peter Richtárik, Martin Takác

Neural Information Processing Systems 2022

Smoothness-Aware Quantization Techniques

Bokun Wang, Mher Safaryan, Peter Richtárik

arXiv.org 2021

On Server-Side Stepsizes in Federated Optimization: Theory Explaining the Heuristics

Grigory Malinovsky, Konstantin Mishchenko, Peter Richtárik

Complexity Analysis of Stein Variational Gradient Descent Under Talagrand's Inequality T1

A. Salim, Lukang Sun, Peter Richtárik

arXiv.org 2021

IntSGD: Floatless Compression of Stochastic Gradients

Konstantin Mishchenko, Bokun Wang, D. Kovalev, Peter Richtárik

arXiv.org 2021

Error Compensated Loopless SVRG for Distributed Optimization

Xun Qian, Hanze Dong, Peter Richtárik, Tong Zhang

Error Compensated Proximal SGD and RDA

Xun Qian, Hanze Dong, Peter Richtárik, Tong Zhang

99% of Worker-Master Communication in Distributed Optimization Is Not Needed

Konstantin Mishchenko, Filip Hanzely, Peter Richtárik

Conference on Uncertainty in Artificial Intelligence 2020

IntML: Natural Compression for Distributed Deep Learning

Samuel Horváth, Chen-Yu Ho, L. Horvath, Atal Narayan Sahu, Marco Canini, Peter Richtárik

SGD with Arbitrary Sampling: General Analysis and Improved Rates

Xun Qian, Peter Richtárik, R. Gower, Alibek Sailanbayev, Nicolas Loizou, Egor Shulgin

International Conference on Machine Learning 2019

On Optimal Solutions to Planetesimal Growth Models

D. Forgan, Peter Richtárik

Finding sparse approximations to extreme eigenvectors: generalized power method for sparse PCA and extensions

Peter Richtárik

Efficiency of randomized coordinate descent methods on minimization problems with a composite objective function

Peter Richtárik

Efficient Serial and Parallel Coordinate Descent Methods for Huge-Scale Truss Topology Design

Peter Richtárik, Martin Takác

OR 2011

Simultaneously solving seven optimization problems in relative scale

Peter Richtárik

Some algorithms for large-scale linear and convex minimization in relative scale

M. Todd, Peter Richtárik
