forked from microsoft/qlib
-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Add docs for qlib.rl (microsoft#1322)
* Add docs for qlib.rl * Update docs for qlib.rl * Add homepage introduct to RL framework * Update index Link * Fix Icon * typo * Update catelog * Update docs for qlib.rl * Update docs for qlib.rl * Update figure * Update docs for qlib.rl * Update setup.py * FIx setup.py * Update docs and fix some typos * Fix the reference to RL docs * Update framework.svg * Update framework.svg * Update framework.svg * Update docs for qlibrl. * Update docs for qlibrl. * Update docs for Qlibrl. * Update docs for qlibrl. * Update docs for qlibrl. * Update docs for qlibrl. * Add new framework * Update jpg * Update framework.svg * Update framework.svg * Update Qlib framework and description * Update grammar * Update README.md * Update README.md * Update docs/component/rl.rst Co-authored-by: you-n-g <you-n-g@users.noreply.github.com> * Update docs/component/rl.rst Co-authored-by: you-n-g <you-n-g@users.noreply.github.com> * Update docs for qlib.rl * Change theme for docs. * Update docs for qlib.rl * Update docs for qlib.rl * Update docs for qlib.rl * Update docs for qlib.rl. * Update docs for qlib.rl * Update docs for qlib.rl * Update docs for qlib.rl Co-authored-by: Young <afe.young@gmail.com> Co-authored-by: you-n-g <you-n-g@users.noreply.github.com>
- Loading branch information
1 parent
3579484
commit e182124
Showing
22 changed files
with
494 additions
and
136 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,45 @@ | ||
The Framework of QlibRL | ||
======================= | ||
|
||
QlibRL contains a full set of components that cover the entire lifecycle of an RL pipeline, including building the simulator of the market, shaping states & actions, training policies (strategies), and backtesting strategies in the simulated environment. | ||
|
||
QlibRL is basically implemented with the support of Tianshou and Gym frameworks. The high-level structure of QlibRL is demonstrated below: | ||
|
||
.. image:: ../../_static/img/QlibRL_framework.png | ||
:width: 600 | ||
:align: center | ||
|
||
Here, we briefly introduce each component in the figure. | ||
|
||
EnvWrapper | ||
------------ | ||
EnvWrapper is the complete capsulation of the simulated environment. It receives actions from outside (policy/strategy/agent), simulates the changes in the market, and then replies rewards and updated states, thus forming an interaction loop. | ||
|
||
In QlibRL, EnvWrapper is a subclass of gym.Env, so it implements all necessary interfaces of gym.Env. Any classes or pipelines that accept gym.Env should also accept EnvWrapper. Developers do not need to implement their own EnvWrapper to build their own environment. Instead, they only need to implement 4 components of the EnvWrapper: | ||
|
||
- `Simulator` | ||
The simulator is the core component responsible for the environment simulation. Developers could implement all the logic that is directly related to the environment simulation in the Simulator in any way they like. In QlibRL, there are already two implementations of Simulator for single asset trading: 1) ``SingleAssetOrderExecution``, which is built based on Qlib's backtest toolkits and hence considers a lot of practical trading details but is slow. 2) ``SimpleSingleAssetOrderExecution``, which is built based on a simplified trading simulator, which ignores a lot of details (e.g. trading limitations, rounding) but is quite fast. | ||
- `State interpreter` | ||
The state interpreter is responsible for "interpret" states in the original format (format provided by the simulator) into states in a format that the policy could understand. For example, transform unstructured raw features into numerical tensors. | ||
- `Action interpreter` | ||
The action interpreter is similar to the state interpreter. But instead of states, it interprets actions generated by the policy, from the format provided by the policy to the format that is acceptable to the simulator. | ||
- `Reward function` | ||
The reward function returns a numerical reward to the policy after each time the policy takes an action. | ||
|
||
EnvWrapper will organically organize these components. Such decomposition allows for better flexibility in development. For example, if the developers want to train multiple types of policies in the same environment, they only need to design one simulator and design different state interpreters/action interpreters/reward functions for different types of policies. | ||
|
||
QlibRL has well-defined base classes for all these 4 components. All the developers need to do is define their own components by inheriting the base classes and then implementing all interfaces required by the base classes. The API for the above base components can be found `here <../../reference/api.html#module-qlib.rl>`_. | ||
|
||
Policy | ||
------------ | ||
QlibRL directly uses Tianshou's policy. Developers could use policies provided by Tianshou off the shelf, or implement their own policies by inheriting Tianshou's policies. | ||
|
||
Training Vessel & Trainer | ||
------------------------- | ||
As stated by their names, training vessels and trainers are helper classes used in training. A training vessel is a ship that contains a simulator/interpreters/reward function/policy, and it controls algorithm-related parts of training. Correspondingly, the trainer is responsible for controlling the runtime parts of training. | ||
|
||
As you may have noticed, a training vessel itself holds all the required components to build an EnvWrapper rather than holding an instance of EnvWrapper directly. This allows the training vessel to create duplicates of EnvWrapper dynamically when necessary (for example, under parallel training). | ||
|
||
With a training vessel, the trainer could finally launch the training pipeline by simple, Scikit-learn-like interfaces (i.e., ``trainer.fit()``). | ||
|
||
The API for Trainer and TrainingVessel and can be found `here <../../reference/api.html#module-qlib.rl.trainer>`_. |
Oops, something went wrong.