🚀 Core Highlights
- Optimized CPU-GPU Expert Scheduling: Introduces a flexible GPU expert mask that determines which MoE experts are placed on the GPU and which remain on the CPU. The scheduler supports multiple placement strategies (frequency-based, uniform, front-loading, random) as well as dynamic expert updates during inference, improving throughput by up to 30% at lower G...
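The note does not describe the mask's actual format or API. The following is a hypothetical sketch of what the four placement strategies could mean. It assumes the mask is a per-layer boolean table (`mask[layer][expert]`) and that the GPU holds a fixed total number of experts (`gpu_budget`). The names `build_gpu_expert_mask`, `mask_diff`, and the strategy strings are invented for illustration, and the semantics given to "uniform" and "front-loading" are guesses.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical representation: mask[layer][expert] is True if that expert lives on the GPU.
Mask = List[List[bool]]


def build_gpu_expert_mask(
    num_layers: int,
    num_experts: int,
    gpu_budget: int,
    strategy: str,
    freq: Optional[List[List[int]]] = None,
    seed: int = 0,
) -> Mask:
    """Hypothetical helper: pick up to `gpu_budget` (layer, expert) slots to keep on the GPU."""
    total = num_layers * num_experts
    budget = max(0, min(gpu_budget, total))
    slots = [(l, e) for l in range(num_layers) for e in range(num_experts)]

    if strategy == "frequency":
        # Assumed semantics: keep the most frequently activated experts, model-wide.
        if freq is None:
            raise ValueError("frequency strategy needs activation counts")
        # sorted() is stable, so ties keep (layer, expert) order.
        chosen = sorted(slots, key=lambda s: -freq[s[0]][s[1]])[:budget]
    elif strategy == "uniform":
        # Assumed semantics: same count per layer; leftover slots go to the earliest layers.
        per_layer, extra = divmod(budget, num_layers)
        chosen = [
            (l, e)
            for l in range(num_layers)
            for e in range(per_layer + (1 if l < extra else 0))
        ]
    elif strategy == "front":
        # Assumed semantics of "front-loading": fill whole layers from the first one onward.
        chosen = slots[:budget]
    elif strategy == "random":
        chosen = random.Random(seed).sample(slots, budget)
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    mask = [[False] * num_experts for _ in range(num_layers)]
    for l, e in chosen:
        mask[l][e] = True
    return mask


def mask_diff(old: Mask, new: Mask) -> Tuple[List[Tuple[int, int]], List[Tuple[int, int]]]:
    """Return (to_gpu, to_cpu): experts that must be uploaded or evicted to go from old to new."""
    to_gpu, to_cpu = [], []
    for l, (o_row, n_row) in enumerate(zip(old, new)):
        for e, (o, n) in enumerate(zip(o_row, n_row)):
            if n and not o:
                to_gpu.append((l, e))
            elif o and not n:
                to_cpu.append((l, e))
    return to_gpu, to_cpu


# Usage: 2 layers x 4 experts, room for 3 experts on the GPU.
freq = [[5, 1, 9, 0], [2, 8, 3, 7]]
old = build_gpu_expert_mask(2, 4, 3, "frequency", freq)
# A dynamic update could recompute from fresh counts and move only the experts that changed.
new = build_gpu_expert_mask(2, 4, 3, "frequency", [[5, 1, 2, 0], [2, 8, 3, 7]])
print(mask_diff(old, new))
```

Computing a diff rather than rebuilding the placement from scratch reflects one plausible design for dynamic updates. Only the experts whose placement changed would need to cross the PCIe bus. Whether the actual scheduler works this way is not stated in the note.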