What is it about?

AI models learn through training algorithms. Stochastic Gradient Descent (SGD), one of the most popular choices, depends on two main hyperparameters: the number of data examples shown to the model at each learning step, called the "batch size", and the magnitude of the learning steps, called the "learning rate". We have shown that the choice of learning rate and batch size identifies three different regimes in which the SGD algorithm operates. In the first regime, corresponding to small batch sizes and large learning rates, the learning process takes small, random steps. In this case, the process is noisy and allows the AI model to explore solutions that it would not have found otherwise. In the second regime, corresponding to large learning rates and large batch sizes, the process takes large initial steps that strongly affect the final solution. In the third regime, corresponding to large batches and smaller learning rates, the learning process is more predictable and less prone to random exploration. Depending on the application, each of these regimes has different benefits and drawbacks in terms of training speed and final performance of the AI model.
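
To make the role of the two hyperparameters concrete, here is a minimal sketch of a single mini-batch SGD update, written in Python with NumPy. This is not the setup studied in the paper; the least-squares problem and all names (sgd_step, learning_rate, batch_size) are hypothetical, chosen only to show where batch size and learning rate enter the algorithm.

    import numpy as np

    def sgd_step(w, X, y, learning_rate, batch_size, rng):
        """One mini-batch SGD step on a toy least-squares loss (illustrative only)."""
        # Sample a mini-batch: small batches give a noisier gradient estimate,
        # large batches approach the full (deterministic) gradient.
        idx = rng.choice(len(X), size=batch_size, replace=False)
        Xb, yb = X[idx], y[idx]
        # Gradient of the mean squared error on the mini-batch.
        grad = 2.0 / batch_size * Xb.T @ (Xb @ w - yb)
        # The learning rate sets the magnitude of each learning step.
        return w - learning_rate * grad

    # Toy usage: small batch + large learning rate -> noisy, exploratory updates;
    # large batch + small learning rate -> smoother, more predictable updates.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))
    y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=1000)
    w = np.zeros(10)
    for _ in range(500):
        w = sgd_step(w, X, y, learning_rate=0.05, batch_size=32, rng=rng)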


Why is it important?

How the hyperparameters of training algorithms affect the AI learning process is not well understood, so their choice usually relies on expensive grid searches. Our work makes a significant step toward solving this problem by identifying the distinct regimes in which the SGD algorithm operates. This result is important because state-of-the-art AI models are usually trained with the SGD algorithm or its variations. Understanding it is therefore a fundamental step toward understanding the solutions found by deep networks and toward choosing the hyperparameters in a principled way.

Perspectives

I enjoyed working on this project because it gave me the opportunity to use my background as a physicist to tackle open questions in AI. In fact, I think that important results often come from the cross-pollination of different disciplines and mindsets.

Antonio Sclocchi
École Polytechnique Fédérale de Lausanne

Read the Original

This page is a summary of: On the different regimes of stochastic gradient descent, Proceedings of the National Academy of Sciences, February 2024.
DOI: 10.1073/pnas.2316301121.
