---
title: "Don't Decay the Learning Rate, Increase the Batch Size"
type: source
tags: [deep-learning, optimisation, hyperparameters, supporting]
authors: [Smith Samuel L.; Kindermans Pieter-Jan; Ying Chris; Le Quoc V.]
year: 2018
venue: "ICLR 2018, arXiv:1711.00489v2"
kind: paper
raw_path: "raw/Predictive process monitoring/learningrate vs batch size.pdf"
created: 2026-04-13
updated: 2026-04-13
key_claims:
  - Decaying the learning rate can be replaced by increasing the batch size during training, with equivalent learning-curve shape.
  - The technique reduces parameter-update count and parallelises better for large-batch training.
  - Enables re-use of existing schedules without hyperparameter re-tuning.
---

# Smith, Kindermans, Ying, Le 2018: Don't Decay the Learning Rate, Increase the Batch Size

General deep-learning optimisation paper, not PPM-specific. Filed because it sits in the PPM folder; likely reference material for hyperparameter tuning of PPM neural models. Classic Google Brain result on SGD noise scale.

## Why it's in this wiki

Supporting reference for neural PPM training (cf. [[sources/2018-difrancescomarino-genetic-hpo-ppm]] for HPO in a PPM context).

## Connections

**Concepts:** [[concepts/lstm-ppm]] · [[concepts/transformer-ppm]]
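
## Illustrative sketch

The first key claim can be made concrete with a small schedule conversion. This is a minimal sketch, not the authors' code. It assumes SGD noise scales roughly with learning rate divided by batch size, so multiplying the batch size by `k` at a milestone stands in for dividing the learning rate by `k`. The names `Phase`, `lr_decay_to_batch_growth`, and the `max_batch` cap (falling back to ordinary learning-rate decay once the batch can no longer grow) are hypothetical and introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Phase:
    """One constant segment of a training schedule (hypothetical helper type)."""
    start_epoch: int
    learning_rate: float
    batch_size: int


def lr_decay_to_batch_growth(
    base_lr: float,
    base_batch: int,
    decay_factor: int,
    milestones: Sequence[int],
    max_batch: Optional[int] = None,
) -> List[Phase]:
    """Turn a step learning-rate decay schedule into a batch-size growth schedule.

    At each milestone the original schedule would divide the learning rate by
    `decay_factor`. Here the batch size is multiplied by `decay_factor` instead,
    which changes the ratio learning_rate / batch_size by the same amount.
    Assumption: if growing the batch would exceed `max_batch`, the learning
    rate is decayed instead and the batch size is left unchanged.
    """
    lr, batch = base_lr, base_batch
    phases = [Phase(0, lr, batch)]
    for epoch in milestones:
        grown = batch * decay_factor
        if max_batch is None or grown <= max_batch:
            batch = grown                 # grow the batch, keep the learning rate
        else:
            lr = lr / decay_factor        # batch is capped, decay the learning rate
        phases.append(Phase(epoch, lr, batch))
    return phases


if __name__ == "__main__":
    # Illustrative values only, not results reported in the paper.
    for phase in lr_decay_to_batch_growth(0.1, 128, 5, [60, 120, 160], max_batch=4000):
        print(phase)
```

Because each milestone grows the batch instead of shrinking the step size, later phases need fewer parameter updates per epoch, which is the property the second key claim refers to. An existing step-decay schedule can be passed in unchanged, in line with the third key claim.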