TL;DR Bayesian PBTs optimize the greedy objective more effectively than non-Bayesian PBTs; this can be good or bad, depending on the task and the hyperparameters | Paper | Code

Population Based Training (PBT) optimizes a hyperparameter schedule by evolving a population of solutions (weights + hyperparameters). It is general, parallel, and scalable. It has been extended to leverage Bayesian optimization (PB2, PB2-Mix, BG-PBT) or to be less greedy (FIRE-PBT).
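For readers new to PBT, here is a minimal sketch of the exploit-and-explore loop in Python. It is illustrative only: `train_step`, `evaluate`, and the truncation/perturbation settings are assumptions of this sketch, not the paper's implementation (see the released code for the actual variants).

```python
import copy
import random

# Minimal, generic PBT sketch (not the paper's code). `train_step` and
# `evaluate` are hypothetical task-specific callbacks; a real PBT run
# trains the population members in parallel.
def pbt(train_step, evaluate, init_hyperparams, population_size=8,
        num_updates=20, truncation=0.25, perturb_factors=(0.8, 1.2)):
    # Each member carries model state (weights) plus its current hyperparameters.
    population = [{"weights": None, "hyperparams": dict(init_hyperparams)}
                  for _ in range(population_size)]

    for _ in range(num_updates):
        # Train every member for one interval with its own hyperparameters.
        for member in population:
            member["weights"] = train_step(member["weights"], member["hyperparams"])
            member["score"] = evaluate(member["weights"])

        # Exploit: bottom-quantile members copy weights and hyperparameters
        # from a randomly chosen top-quantile member.
        ranked = sorted(population, key=lambda m: m["score"])
        cutoff = max(1, int(truncation * population_size))
        bottom, top = ranked[:cutoff], ranked[-cutoff:]
        for member in bottom:
            donor = random.choice(top)
            member["weights"] = copy.deepcopy(donor["weights"])
            member["hyperparams"] = dict(donor["hyperparams"])
            # Explore: perturb the copied hyperparameters. Bayesian variants
            # (PB2, PB2-Mix, BG-PBT) replace this random perturbation with a
            # suggestion from a Bayesian optimization model.
            for name, value in member["hyperparams"].items():
                member["hyperparams"][name] = value * random.choice(perturb_factors)

    # Return the best member found (weights + final hyperparameters).
    return max(population, key=lambda m: m["score"])
```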

Greedier Bayesian PBTs can hurt final performance

Theoretically, we find that Bayesian PBTs are guaranteed to asymptotically approach the returns of the greedy hyperparameter schedule, rather than the optimal one as claimed in prior work.
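To make the greedy-vs-optimal distinction concrete (the notation below is our own illustrative shorthand, not the paper's): let $\theta_t = \mathrm{Train}(\theta_{t-1}, x_t)$ be the weights after training for interval $t$ with hyperparameters $x_t$, and let $\mathrm{Eval}(\theta)$ be the resulting return. A greedy schedule maximizes each interval's immediate return, whereas the optimal schedule maximizes the final return over the whole sequence:

$$
x_t^{\text{greedy}} = \arg\max_{x}\ \mathrm{Eval}\big(\mathrm{Train}(\theta_{t-1}, x)\big),
\qquad
(x_1^{\star}, \dots, x_T^{\star}) = \arg\max_{x_1, \dots, x_T}\ \mathrm{Eval}(\theta_T).
$$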

Mechanistically, the number of hyperparameter update steps can influence the greediness, as well as the absolute and relative performance of PBT variants, despite constant total compute. The trends are clear for image classification where only the learning rate is optimized…

Figure: Accuracy vs. update steps on Fashion-MNIST and CIFAR-10

… but not so clear for reinforcement learning (or image classification with larger search spaces)

Figure: RL performance vs. update steps on Hopper and Humanoid
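To unpack the constant-total-compute point: holding a member's training budget fixed, more hyperparameter update steps simply means shorter training intervals between exploit/explore decisions, so selection is driven by shorter-horizon progress. A rough illustration (the budget below is a hypothetical number, not one from the paper):

```python
# Hypothetical fixed per-member training budget; only the interval length changes.
TOTAL_TRAIN_STEPS = 40_000

for num_updates in (5, 10, 20, 40):
    interval = TOTAL_TRAIN_STEPS // num_updates
    print(f"{num_updates:>2} update steps -> exploit/explore every {interval:>5} training steps")
```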

Our impartial evaluation showed that no PBT variant is substantially better than the others across tasks and settings (one limitation of our work is that we did not fully explore all hyperparameters of the PBT variants themselves).

Figure: Variant ranking across steps, search space, and population size

Check out the paper for details!

We also release our code with task-agnostic implementations of five PBT variants, which we hope will make future research on and comparison of PBT variants easier!

P.S. Thank you to my supervisors and coauthors, Tanja Alderliesten and Peter Bosman.