[tune] Improve user guides and API docs #7716

Merged Apr 6, 2020 (10 commits)
Changes from 1 commit
fix
richardliaw committed Apr 4, 2020
commit 66373963f0314bfcd097ee5679a2ad91d4b994cd
4 changes: 4 additions & 0 deletions doc/source/_static/css/custom.css
@@ -14,3 +14,7 @@
color: #2980B9;
text-transform: uppercase
}

.rst-content .section ol p, .rst-content .section ul p {
margin-bottom: 0px;
}
2 changes: 1 addition & 1 deletion doc/source/index.rst
@@ -248,7 +248,7 @@ Getting Involved
:caption: Tune

tune.rst
Tune Guides/Tutorials <tune/generated_guides/overview.rst>
Tune Guides and Tutorials <tune/generated_guides/overview.rst>
tune-usage.rst
tune-schedulers.rst
tune-searchalg.rst
5 changes: 4 additions & 1 deletion doc/source/tune-schedulers.rst
@@ -1,3 +1,5 @@
.. _tune-schedulers:

Tune Trial Schedulers
=====================

@@ -15,6 +17,7 @@ Current Available Trial Schedulers:
:local:
:backlinks: none

.. _tune-scheduler-pbt:

Population Based Training (PBT)
-------------------------------
@@ -31,7 +34,7 @@ Tune includes a distributed implementation of `Population Based Training (PBT) <
hyperparam_mutations={
"lr": [1e-3, 5e-4, 1e-4, 5e-5, 1e-5],
"alpha": lambda: random.uniform(0.0, 1.0),
...
...
})
tune.run( ... , scheduler=pbt_scheduler)
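
As a hedged aside from this diff: a fuller sketch of constructing the scheduler above and passing it to ``tune.run`` might look like the following. ``MyTrainable``, the metric name, and the perturbation interval are illustrative placeholders, not part of this PR.

.. code-block:: python

   import random

   from ray import tune
   from ray.tune.schedulers import PopulationBasedTraining

   pbt_scheduler = PopulationBasedTraining(
       time_attr="training_iteration",
       metric="mean_accuracy",      # assumed to be reported by the trainable
       mode="max",
       perturbation_interval=5,     # perturb hyperparameters every 5 iterations
       hyperparam_mutations={
           "lr": [1e-3, 5e-4, 1e-4, 5e-5, 1e-5],
           "alpha": lambda: random.uniform(0.0, 1.0),
       })

   # MyTrainable is a placeholder for a checkpointable tune.Trainable subclass.
   tune.run(MyTrainable, num_samples=8, scheduler=pbt_scheduler)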

2 changes: 2 additions & 0 deletions doc/source/tune-searchalg.rst
@@ -1,3 +1,5 @@
.. _tune-search-alg:

Tune Search Algorithms
======================

4 changes: 4 additions & 0 deletions doc/source/tune-usage.rst
@@ -1,3 +1,5 @@
.. _tune-user-guide:

Tune User Guide
===============

@@ -367,6 +369,8 @@ The checkpoint will be saved at a path that looks like ``local_dir/exp_name/tria
config={"env": "CartPole-v0"},
)
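
As a hedged illustration (not part of the diff): periodic checkpointing in this version of the API is typically requested through ``tune.run`` arguments such as ``checkpoint_freq`` and ``checkpoint_at_end``; ``MyTrainable`` below is a placeholder for a Trainable that implements ``_save``/``_restore``.

.. code-block:: python

   from ray import tune

   tune.run(
       MyTrainable,                  # placeholder checkpointable Trainable
       local_dir="~/ray_results",    # checkpoints land under local_dir/exp_name/trial_name/
       checkpoint_freq=10,           # checkpoint every 10 training iterations
       checkpoint_at_end=True)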

.. _tune-fault-tol:

Fault Tolerance
---------------

4 changes: 2 additions & 2 deletions doc/source/tune.rst
@@ -7,7 +7,7 @@ Tune: Scalable Hyperparameter Tuning

Tune is a Python library for experiment execution and hyperparameter tuning at any scale. Core features:

* Launch a multi-node `distributed hyperparameter sweep <tune-distributed.html>`_ in less than 10 lines of code.
* Launch a multi-node :ref:`distributed hyperparameter sweep <tune-distributed>` in less than 10 lines of code.
* Supports any machine learning framework, including PyTorch, XGBoost, MXNet, and Keras. See `examples here <tune-examples.html>`_.
* Natively `integrates with optimization libraries <tune-searchalg.html>`_ such as `HyperOpt <https://github.com/hyperopt/hyperopt>`_, `Bayesian Optimization <https://github.com/fmfn/BayesianOptimization>`_, and `Facebook Ax <http://ax.dev>`_.
* Choose among `scalable algorithms <tune-schedulers.html>`_ such as `Population Based Training (PBT)`_, `Vizier's Median Stopping Rule`_, `HyperBand/ASHA`_.
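
To ground the "less than 10 lines of code" bullet above, here is a minimal hypothetical sweep using the function-based API; the objective and search space are invented for illustration and are not part of this PR.

.. code-block:: python

   from ray import tune

   def objective(config):
       # toy objective: the score peaks at lr == 0.1
       score = -(config["lr"] - 0.1) ** 2
       tune.track.log(mean_score=score)

   analysis = tune.run(
       objective,
       config={"lr": tune.grid_search([0.001, 0.01, 0.1, 1.0])})
   print("Best config:", analysis.get_best_config(metric="mean_score"))
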
@@ -61,7 +61,7 @@ If using TF2 and TensorBoard, Tune will also automatically generate TensorBoard
:scale: 20%
:align: center

Take a look at the `Distributed Experiments <tune-distributed.html>`_ documentation for:
Take a look at the :ref:`Distributed Experiments <tune-distributed>` documentation for:

1. Setting up distributed experiments on your local cluster
2. Using AWS and GCP
12 changes: 7 additions & 5 deletions doc/source/tune/guides/overview.rst
@@ -8,31 +8,33 @@ Each hyperparameter configuration evaluation is called a *trial*, and multiple t

.. image:: /images/tune-api.svg

More information about Tune's `search algorithms can be found here <tune-searchalg.html>`__. More information about Tune's `trial schedulers can be found here <tune-schedulers.html>`__. You can check out our `examples page <tune-examples.html>`__ for more code examples.


.. customgalleryitem::
:tooltip: Getting started with Tune.
:figure: /images/tune.png
:description: :doc:`plot_tune-tutorial`

.. customgalleryitem::
:tooltip: Using the Tune Trainable API
:tooltip: A guide to the Tune Trainable API.
:figure: /images/tune.png
:description: :doc:`plot_tune-trainable`

.. customgalleryitem::
:tooltip: Getting started with Tune.
:tooltip: A simple guide to Population-based Training
:figure: /images/tune.png
:description: :doc:`plot_tune-advanced-tutorial`

.. customgalleryitem::
:tooltip: Getting started with Tune.
:tooltip: Distributed Tuning
:figure: /images/tune.png
:description: :doc:`plot_tune-distributed`


.. toctree::
:hidden:

plot_tune-tutorial.rst
plot_tune-trainable.rst
plot_tune-advanced-tutorial.rst
plot_tune-distributed.rst

30 changes: 13 additions & 17 deletions doc/source/tune/guides/plot_tune-advanced-tutorial.rst
@@ -1,28 +1,24 @@
Tune Advanced Tutorials
=======================
Guide to Population Based Training (PBT)
========================================

In this page, we will explore some advanced functionality in Tune with more examples.
Tune includes a distributed implementation of `Population Based Training (PBT) <https://deepmind.com/blog/population-based-training-neural-networks>`__ as
a :ref:`scheduler <tune-scheduler-pbt>`.

.. image:: /images/tune_advanced_paper1.png

On this page:

PBT starts by training many neural networks in parallel with random hyperparameters, using information from the rest of the population to refine the
hyperparameters and allocate resources to promising models. Let's walk through how to use this algorithm.

.. contents::
:local:
:backlinks: none


Trainable with Population Based Training (PBT)
----------------------------------------------

Tune includes a distributed implementation of `Population Based Training (PBT) <https://deepmind.com/blog/population-based-training-neural-networks>`__ as
a scheduler `PopulationBasedTraining <tune-schedulers.html#Population Based Training (PBT)>`__ .

PBT starts by training many neural networks in parallel with random hyperparameters. But instead of the
networks training independently, it uses information from the rest of the population to refine the
hyperparameters and direct computational resources to models which show promise.

.. image:: images/tune_advanced_paper1.png
Trainable API with Population Based Training
--------------------------------------------

This takes its inspiration from genetic algorithms where each member of the population
PBT takes its inspiration from genetic algorithms where each member of the population
can exploit information from the remainder of the population. For example, a worker might
copy the model parameters from a better performing worker. It can also explore new hyperparameters by
changing the current values randomly.
@@ -35,7 +31,7 @@ This means that PBT can quickly exploit good hyperparameters, can dedicate more
promising models and, crucially, can adapt the hyperparameter values throughout training,
leading to automatic learning of the best configurations.

First we define a Trainable that wraps a ConvNet model.
First, we define a Trainable that wraps a ConvNet model.

.. literalinclude:: /../../python/ray/tune/examples/pbt_convnet_example.py
:language: python
29 changes: 19 additions & 10 deletions doc/source/tune/guides/plot_tune-distributed.rst
@@ -1,21 +1,27 @@
.. _tune-distributed:

Tune Distributed Experiments
============================

Tune is commonly used for large-scale distributed hyperparameter optimization. This page will overview:

1. How to setup and launch a distributed experiment,
2. `commonly used commands <tune-distributed.html#common-commands>`_, including fast file mounting, one-line cluster launching, and result uploading to cloud storage.
2. :ref:`Commonly used commands <tune-distributed-common>`, including fast file mounting, one-line cluster launching, and result uploading to cloud storage.

**Quick Summary**: To run a distributed experiment with Tune, you need to:

1. Make sure your script has ``ray.init(address=...)`` to connect to the existing Ray cluster.
2. If a ray cluster does not exist, start a Ray cluster (instructions for `local machines <tune-distributed.html#local-cluster-setup>`_, `cloud <tune-distributed.html#launching-a-cloud-cluster>`_).
2. If a Ray cluster does not exist, start a Ray cluster.
3. Run the script on the head node (or use ``ray submit``).

.. contents::
:local:
:backlinks: none

Running a distributed experiment
--------------------------------

Running a distributed (multi-node) experiment requires Ray to be started already. You can do this on local machines or on the cloud (instructions for `local machines <tune-distributed.html#local-cluster-setup>`_, `cloud <tune-distributed.html#launching-a-cloud-cluster>`_).
Running a distributed (multi-node) experiment requires Ray to be started already. You can do this on local machines or on the cloud.

Across your machines, Tune will automatically detect the number of GPUs and CPUs without you needing to manage ``CUDA_VISIBLE_DEVICES``.

@@ -29,9 +35,9 @@ One common approach to modifying an existing Tune experiment to go distributed i
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--ray-address")
parser.add_argument("--address")
args = parser.parse_args()
ray.init(address=args.ray_address)
ray.init(address=args.address)

tune.run(...)
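
For readability, here is a fuller version of the snippet above with its imports and a placeholder trainable filled in (the trainable is hypothetical, not from this PR):

.. code-block:: python

   import argparse

   import ray
   from ray import tune

   def train_fn(config):
       # placeholder trainable; replace with your own training function
       tune.track.log(mean_accuracy=0.5)

   parser = argparse.ArgumentParser()
   parser.add_argument("--address", help="Address of the running Ray cluster")
   args = parser.parse_args()

   ray.init(address=args.address)   # connect to the cluster started via `ray up` / `ray start`
   tune.run(train_fn, num_samples=10)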

@@ -51,6 +57,8 @@ If you used a cluster configuration (starting a cluster with ``ray up`` or ``ray
1. In the examples, the Ray redis address commonly used is ``localhost:6379``.
2. If the Ray cluster is already started, you should not need to run anything on the worker nodes.

.. _tune-distributed-local:

Local Cluster Setup
-------------------

@@ -98,12 +106,14 @@ Then, you can run your Tune Python script on the head node like:
# On the head node, execute using existing ray cluster
$ python tune_script.py --ray-address=<address>

.. _tune-distributed-cloud:

Launching a cloud cluster
-------------------------

.. tip::

If you have already have a list of nodes, go to the `Local Cluster Setup`_ section.
If you already have a list of nodes, go to :ref:`tune-distributed-local`.

Ray currently supports AWS and GCP. Follow the instructions below to launch nodes on AWS (using the Deep Learning AMI). See the `cluster setup documentation <autoscaling.html>`_. Save the below cluster configuration (``tune-default.yaml``):

@@ -230,10 +240,12 @@ To summarize, here are the commands to run:
# wait a while until after all nodes have started
ray kill-random-node tune-default.yaml --hard

You should see Tune eventually continue the trials on a different worker node. See the `Save and Restore <tune-usage.html#save-and-restore>`__ section for more details.
You should see Tune eventually continue the trials on a different worker node. See the :ref:`Fault Tolerance <tune-fault-tol>` section for more details.

You can also specify ``tune.run(upload_dir=...)`` to sync results with a cloud storage like S3, persisting results in case you want to start and stop your cluster automatically.
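
A short sketch of the ``upload_dir`` usage mentioned above (the bucket path and trainable are placeholders, and the exact sync behavior is an assumption about this version of the API):

.. code-block:: python

   from ray import tune

   # train_fn is a placeholder trainable; results sync from local_dir to the bucket.
   tune.run(train_fn, upload_dir="s3://my-tune-results/experiment-1")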

.. _tune-distributed-common:

Common Commands
---------------

@@ -284,6 +296,3 @@ Sometimes, your program may freeze. Run this to restart the Ray cluster without
.. code-block:: bash

$ ray up CLUSTER.YAML --restart-only


.. Local Cluster Setup: tune-distributed.html#local-cluster-setup
8 changes: 4 additions & 4 deletions doc/source/tune/guides/plot_tune-trainable.rst
@@ -1,8 +1,8 @@
Trainable API Guide
===================
The Trainable API
=================

As mentioned in `Tune User Guide <tune-usage.html#Tune Training API>`_, Training can be done
with either the `Trainable <tune-usage.html#trainable-api>`__ **Class API** or **function-based API**.
As mentioned in the :ref:`Tune User Guide <tune-user-guide>`, training can be done
with either a **Class API** (``tune.Trainable``) or a **function-based API** (``track.log``).

Comparably, ``Trainable`` is stateful, supports checkpoint/restore functionality, and provides more control to advanced algorithms.
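
As a hedged illustration of the two APIs contrasted above (the class and function names are invented for this sketch, not taken from the PR):

.. code-block:: python

   from ray import tune

   # Function-based API: report metrics with track.log.
   def train_fn(config):
       for step in range(10):
           tune.track.log(mean_accuracy=config["lr"] * step)

   # Class API: a stateful Trainable with checkpoint/restore hooks.
   class MyTrainable(tune.Trainable):
       def _setup(self, config):
           self.lr = config["lr"]
           self.step_count = 0

       def _train(self):
           self.step_count += 1
           return {"mean_accuracy": self.lr * self.step_count}

       def _save(self, checkpoint_dir):
           return {"step_count": self.step_count}

       def _restore(self, checkpoint):
           self.step_count = checkpoint["step_count"]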

6 changes: 3 additions & 3 deletions doc/source/tune/guides/plot_tune-tutorial.rst
@@ -71,7 +71,7 @@ Let's integrate an early stopping algorithm to our search - ASHA, a scalable alg
How does it work? On a high level, it terminates trials that are less promising and
allocates more time and resources to more promising trials. See `this blog post <https://blog.ml.cmu.edu/2018/12/12/massively-parallel-hyperparameter-optimization/>`__ for more details.

We can afford to **increase the search space by 5x**, by adjusting the parameter ``num_samples``. See the `Trial Scheduler section <tune-schedulers.html>`__ for more details of available schedulers and library integrations.
We can afford to **increase the search space by 5x**, by adjusting the parameter ``num_samples``. See :ref:`tune-schedulers` for more details of available schedulers and library integrations.

.. literalinclude:: /../../python/ray/tune/tests/tutorial.py
:language: python
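
Not part of the diff, but a sketch of what wiring in ASHA typically looks like; the trainable, metric name, and grid are placeholders.

.. code-block:: python

   from ray import tune
   from ray.tune.schedulers import ASHAScheduler

   sched = ASHAScheduler(metric="mean_accuracy", mode="max", grace_period=1)
   analysis = tune.run(
       train_fn,              # placeholder trainable reporting "mean_accuracy"
       scheduler=sched,
       num_samples=5,         # the "5x" increase mentioned above
       config={"lr": tune.grid_search([1e-4, 1e-3, 1e-2, 1e-1])})
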
@@ -99,7 +99,7 @@ You can also use Tensorboard for visualizing results.
Search Algorithms in Tune
~~~~~~~~~~~~~~~~~~~~~~~~~

With Tune you can combine powerful hyperparameter search libraries such as `HyperOpt <https://github.com/hyperopt/hyperopt>`_ and `Ax <https://ax.dev>`_ with state-of-the-art algorithms such as HyperBand without modifying any model training code. Tune allows you to use different search algorithms in combination with different trial schedulers. See the `Search Algorithm section <tune-searchalg.html>`__ for more details of available algorithms and library integrations.
With Tune you can combine powerful hyperparameter search libraries such as `HyperOpt <https://github.com/hyperopt/hyperopt>`_ and `Ax <https://ax.dev>`_ with state-of-the-art algorithms such as HyperBand without modifying any model training code. Tune allows you to use different search algorithms in combination with different trial schedulers. See :ref:`tune-search-alg` for more details of available algorithms and library integrations.

.. literalinclude:: /../../python/ray/tune/tests/tutorial.py
:language: python
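
As a hedged sketch of the HyperOpt integration described above (the search space, metric name, and trainable are placeholders):

.. code-block:: python

   from hyperopt import hp
   from ray import tune
   from ray.tune.suggest.hyperopt import HyperOptSearch

   space = {"lr": hp.loguniform("lr", -10, -1)}
   hyperopt_search = HyperOptSearch(space, metric="mean_accuracy", mode="max")

   # train_fn is a placeholder trainable that reports "mean_accuracy".
   analysis = tune.run(train_fn, search_alg=hyperopt_search, num_samples=10)
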
@@ -120,4 +120,4 @@ You can evaluate best trained model using the Analysis object to retrieve the be

Next Steps
----------
Take a look at the `Usage Guide <tune-usage.html>`__ for more comprehensive overview of Tune features.
Take a look at the :ref:`tune-user-guide` for a more comprehensive overview of Tune features.