To prepare RLtools for mixed-precision training we introduce a type policy numeric_types::Policy. Until now it was easy to switch the floating-point type in RLtools because everything depended on a single type parameter T. For modern deep learning this is not sufficient because we would like to configure different types for different parts of the models/algorithms (e.g. bf16 for parameters, fp32 f...