- Implemented batching for GPUs, increasing speed.
- Added support for Tensor Cores on NVIDIA cards.
- Optimized Winograd code for CPU and GPU.
- Improved UCT formula.
- Improved root move selection by using Lower Confidence Bounds.
- Improved passing heuristics (disabled in self-play).
- Improved autotuner behavior in edge cases.
- The training code now supports mixed precision fp32/fp16 t...
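
To illustrate the root move selection change, here is a minimal sketch of choosing a move by Lower Confidence Bound instead of by raw visit count. It is not the engine's actual code. The `RootChild` type, the normal-approximation bound with a fixed `z`, and the `min_visit_ratio` cutoff are all assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass
class RootChild:
    # Hypothetical container for per-move search statistics at the root.
    move: str
    visits: int
    mean_winrate: float      # average evaluation, in [0, 1]
    winrate_variance: float  # sample variance of the evaluations


def lower_confidence_bound(child, z=1.96):
    """Mean winrate minus z standard errors (normal approximation, assumed)."""
    if child.visits < 2:
        # Too few samples to estimate a variance; never prefer this move.
        return float("-inf")
    stderr = math.sqrt(child.winrate_variance / child.visits)
    return child.mean_winrate - z * stderr


def select_root_move(children, min_visit_ratio=0.1, z=1.96):
    """Pick the eligible child with the highest lower confidence bound.

    A child is eligible if it has at least `min_visit_ratio` times the
    visits of the most-visited child (an assumed cutoff). Ties are broken
    in favor of the child with more visits.
    """
    max_visits = max(c.visits for c in children)
    eligible = [c for c in children if c.visits >= min_visit_ratio * max_visits]
    return max(eligible, key=lambda c: (lower_confidence_bound(c, z), c.visits))


# Made-up statistics: D4 has the most visits, Q16 has a better bound,
# and K10 looks strong but has too few visits to be trusted.
children = [
    RootChild("D4", visits=1000, mean_winrate=0.55, winrate_variance=0.04),
    RootChild("Q16", visits=400, mean_winrate=0.58, winrate_variance=0.04),
    RootChild("K10", visits=20, mean_winrate=0.70, winrate_variance=0.09),
]
best = select_root_move(children)
print(best.move)
```

With these made-up numbers, picking by visit count alone would choose D4. The bound favors Q16, whose higher mean winrate is already backed by enough visits to keep its standard error small.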