---
title: "Don't Decay the Learning Rate, Increase the Batch Size"
type: source
tags: [deep-learning, optimisation, hyperparameters, supporting]
authors: [Smith Samuel L.; Kindermans Pieter-Jan; Ying Chris; Le Quoc V.]
year: 2018
venue: "ICLR 2018, arXiv:1711.00489v2"
kind: paper
raw_path: "raw/Predictive process monitoring/learningrate vs batch size.pdf"
created: 2026-04-13
updated: 2026-04-13
key_claims:
  - Decaying the learning rate can be replaced by increasing the batch size during training, giving the same learning curve on both training and test sets.
  - The technique reaches equivalent test accuracy after the same number of epochs with fewer parameter updates, which allows greater parallelism and shorter training times.
  - Enables re-use of existing schedules without hyperparameter re-tuning.
---

# Smith, Kindermans, Ying, Le 2018 — Don't Decay the Learning Rate, Increase the Batch Size

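General deep-learning optimisation paper, not specific to predictive process monitoring (PPM). Filed here because it sits in the PPM folder of the raw sources, so it is likely reference material for hyperparameter tuning of neural PPM models. A well-known Google Brain result on the SGD noise scale: g ≈ εN/B for learning rate ε, training-set size N and batch size B ≪ N, so raising B reduces gradient noise in the same way as lowering ε.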
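
The core idea can be sketched as a schedule conversion: wherever a step schedule would divide the learning rate by some factor, multiply the batch size by that factor instead, which leaves the approximate noise scale g ≈ εN / (B(1 − m)) on the same trajectory. The snippet below is a minimal illustration, not the authors' code. The function names, the `max_batch` cap (after which it falls back to ordinary learning-rate decay) and all numeric values in the usage note are assumptions chosen for the example.

```python
def noise_scale(lr, batch_size, dataset_size, momentum=0.0):
    """Approximate SGD noise scale g ~ lr * N / (B * (1 - m)).

    Only a reasonable approximation when batch_size << dataset_size.
    """
    return lr * dataset_size / (batch_size * (1.0 - momentum))


def batch_size_schedule(base_lr, base_batch, factor, milestones,
                        num_epochs, max_batch=None):
    """Return a list of (epoch, lr, batch_size) tuples.

    At each milestone epoch the batch size is multiplied by `factor`
    instead of dividing the learning rate by it. If the enlarged batch
    would exceed `max_batch` (a hypothetical hardware or dataset limit),
    the learning rate is divided by `factor` instead.
    """
    lr, batch = base_lr, base_batch
    schedule = []
    for epoch in range(num_epochs):
        if epoch in milestones:
            bigger = batch * factor
            if max_batch is None or bigger <= max_batch:
                batch = bigger          # reduce noise by growing the batch
            else:
                lr = lr / factor        # cap reached: decay the learning rate
        schedule.append((epoch, lr, batch))
    return schedule
```

For example, `batch_size_schedule(0.1, 128, 5, {60, 120, 160}, 200, max_batch=5000)` grows the batch at the first two milestones and decays the learning rate at the third. Calling `noise_scale` on each entry shows the noise scale dropping by the same factor at every milestone, exactly as it would under plain learning-rate decay.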

## Why it's in this wiki
Supporting reference for neural PPM training (cf. [[sources/2018-difrancescomarino-genetic-hpo-ppm]] for HPO in a PPM context).

## Connections
**Concepts:** [[concepts/lstm-ppm]] · [[concepts/transformer-ppm]]