New
v2.1.0
- Cleaning up repo structure: Previously we used `rl-tools/rl-tools` as a monorepo to version everything in the RLtools universe together. This is not great if someone just wants the header-only library (i.e. just `./include`): a single `git clone --recursive` or `git submodule update --init --recursive` could trigger gigabytes of downloads. Also, jumping around the history is cumbersome with all the submodules (e.g. for bisecting). Hence, we moved the versioning of adjacent projects to `rl-tools/mono`, and `rl-tools/rl-tools` is now the submodule-free, lightweight core (~7 MB download for the full history).
- Memory: The main work between `v2.0` and `v2.1` has been on maturing the memory implementation (i.e. using RNNs in off-policy algorithms). For more information, see the RNN and Memory chapters in the documentation.
- Flag Environment: We introduce a basic environment to test the recurrent RL algorithms, where the positions of two flags are revealed in the initial step and the policy has to memorize them to visit the positions in order. The second position is required because with only one position the agent could cheat by just accelerating in the right direction, thereby storing the direction in the state instead of memorizing it internally. You can see the Flag environment and a baseline policy at https://zoo.rl.tools
- Adding inference utils: We have added some common inference utils in `include/rl_tools/inference` that can, e.g., expose a pure C interface (e.g. for microcontroller integrations), among other things.
- Full CUDA training: We have revived the full on-GPU training; it is tracked here. It supports full CUDA graph capture, which means 1 loop step = 1 graph execution.
- L2F: The L2F simulator has been modularized and is now better structured.
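
The idea behind the Flag environment described above can be illustrated with a small toy in C++. This is only a sketch under stated assumptions, not the RLtools implementation: the names `FlagState`, `FlagObservation`, `flag_reset`, `flag_observe` and `flag_step`, the point-mass dynamics, the default time step, the reward of 1 per flag and the visit radius `FLAG_RADIUS` are all hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical toy version of a "flag" task. All names and constants here are
// assumptions for illustration and are not taken from the RLtools code base.
struct FlagState {
    double x, y;         // agent position
    double vx, vy;       // agent velocity (the action is an acceleration)
    double flags[2][2];  // positions of the two flags
    int next_flag;       // index of the flag to visit next (2 = both visited)
    int step;            // number of steps since reset
};

struct FlagObservation {
    double x, y, vx, vy;
    double flags[2][2];  // only filled in the initial step, zero afterwards
};

constexpr double FLAG_RADIUS = 0.5; // assumed distance at which a flag counts as visited

inline FlagState flag_reset(double f0x, double f0y, double f1x, double f1y) {
    FlagState s{}; // zero-initialize position, velocity and counters
    s.flags[0][0] = f0x; s.flags[0][1] = f0y;
    s.flags[1][0] = f1x; s.flags[1][1] = f1y;
    return s;
}

inline FlagObservation flag_observe(const FlagState& s) {
    FlagObservation o{}; // flag entries default to zero (hidden)
    o.x = s.x; o.y = s.y; o.vx = s.vx; o.vy = s.vy;
    if (s.step == 0) { // the flag positions are revealed only in the initial step
        for (int i = 0; i < 2; i++) {
            o.flags[i][0] = s.flags[i][0];
            o.flags[i][1] = s.flags[i][1];
        }
    }
    return o;
}

// Advances the point mass by one step and returns the reward:
// 1 if the next flag in the required order was reached, 0 otherwise.
inline double flag_step(FlagState& s, double ax, double ay, double dt = 0.1) {
    assert(dt > 0.0);
    s.vx += ax * dt;
    s.vy += ay * dt;
    s.x += s.vx * dt;
    s.y += s.vy * dt;
    s.step++;
    double reward = 0.0;
    if (s.next_flag < 2) {
        const double dx = s.x - s.flags[s.next_flag][0];
        const double dy = s.y - s.flags[s.next_flag][1];
        if (std::sqrt(dx * dx + dy * dy) < FLAG_RADIUS) {
            s.next_flag++; // flags only count when visited in order
            reward = 1.0;
        }
    }
    return reward;
}
```

In this sketch the velocity is part of the observation, so a memoryless policy could still encode the direction to the first flag by accelerating towards it in the initial step. The position of the second flag has no such outlet once it disappears from the observation, which is why reaching it requires internal memory.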
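
To illustrate what exposing a policy through a pure C interface can look like, here is a minimal sketch. It does not use or reproduce the actual utilities in `include/rl_tools/inference`: the `sketch::LinearPolicy` type, its fixed weights and the `sketch_policy_*` functions are hypothetical stand-ins chosen only to show the pattern of wrapping C++ code behind `extern "C"` functions that take plain pointers and return an error code.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a trained policy. The type, its weights and all
// function names below are assumptions for illustration, not the RLtools API.
namespace sketch {
    constexpr std::size_t INPUT_DIM = 2;
    constexpr std::size_t OUTPUT_DIM = 1;

    struct LinearPolicy {
        float weights[OUTPUT_DIM][INPUT_DIM];
        float bias[OUTPUT_DIM];

        // Computes out = weights * in + bias.
        void evaluate(const float* in, float* out) const {
            assert(in != nullptr && out != nullptr);
            for (std::size_t o = 0; o < OUTPUT_DIM; o++) {
                float acc = bias[o];
                for (std::size_t i = 0; i < INPUT_DIM; i++) {
                    acc += weights[o][i] * in[i];
                }
                out[o] = acc;
            }
        }
    };

    // Parameters baked into the binary, as is common on microcontrollers.
    static const LinearPolicy policy = {{{0.5f, -0.25f}}, {0.5f}};
}

// The C-facing surface: only plain C types cross this boundary, so the
// functions can be declared in a header that a C compiler can consume.
extern "C" {
    int sketch_policy_input_dim(void) {
        return static_cast<int>(sketch::INPUT_DIM);
    }

    int sketch_policy_output_dim(void) {
        return static_cast<int>(sketch::OUTPUT_DIM);
    }

    // Returns 0 on success and -1 if a null pointer is passed.
    int sketch_policy_evaluate(const float* input, float* output) {
        if (input == nullptr || output == nullptr) {
            return -1;
        }
        sketch::policy.evaluate(input, output);
        return 0;
    }
}
```

A C caller would only need the three function declarations and could call `sketch_policy_evaluate` with caller-owned buffers, which avoids dynamic allocation and keeps templates and other C++ details out of the firmware code.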