This cheatsheet lists a few things I regularly need when using Ray that are a bit hard to find in the documentation.
How to restart failed runs in Ray Tune
From the docs:
# Rerun ONLY failed trials after an experiment is finished.
tune.run(my_trainable, config=space,
         local_dir=<path/to/dir>, resume="ERRORED_ONLY")
How to resume an experiment with no name with Ray Tune
You can resume a checkpointed experiment with tune.run(name=name, resume=True, ...), where name is the name of the experiment you want to resume.
If you didn’t specify a name for the experiment you want to resume, Ray assigned one based on the date and time you launched the experiment, such as run_2021-08-18_02-39.
You can find this name in the logs or by looking for the most recent directory in ~/ray_results/.
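If you'd rather not look through the directory by hand, here is a minimal sketch that picks the most recent directory in the results folder. It uses only the standard library. The helper name latest_experiment_name is my own (it is not part of Ray), and it assumes that "most recent" means the subdirectory with the latest modification time, which may not hold if you have touched older experiment folders since.

```python
import os
from pathlib import Path
from typing import Optional


def latest_experiment_name(results_dir: str = "~/ray_results") -> Optional[str]:
    """Return the name of the most recently modified subdirectory, or None.

    Hypothetical helper (not a Ray API). Assumes modification time is a
    good proxy for "the experiment I launched last".
    """
    root = Path(os.path.expanduser(results_dir))
    if not root.is_dir():
        return None
    # Only directories count as experiments; stray files are ignored.
    subdirs = [p for p in root.iterdir() if p.is_dir()]
    if not subdirs:
        return None
    return max(subdirs, key=lambda p: p.stat().st_mtime).name


# Usage, assuming my_trainable and space are defined as in the snippet above:
# name = latest_experiment_name()
# tune.run(my_trainable, config=space, name=name, resume=True)
```

It's worth printing the returned name and checking it against the logs before resuming, since resuming the wrong experiment is easy to do by accident.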