Name: ktransformers

Jan 22, 2026

🚀 Core Highlights

Optimized CPU-GPU Expert Scheduling: Introducing a flexible GPU expert mask system that enables intelligent placement of MoE experts across CPU and GPU. The new scheduling system supports multiple placement strategies (frequency-based, uniform, front-loading, random) and dynamic expert updates during inference, significantly improving throughput by up to 30% at lower G...
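The placement strategies named above can be illustrated with a small, hypothetical helper that builds a per-expert GPU mask. This is a sketch under stated assumptions, not the KTransformers API: the function name `build_gpu_expert_mask`, the strategy strings, and the interpretation of each strategy (frequency = most-activated experts first, uniform = evenly spaced expert indices, front-loading = the first N experts, random = a seeded random subset) are all assumptions made for illustration.

```python
import random
from typing import List, Optional, Sequence


def build_gpu_expert_mask(
    num_experts: int,
    num_gpu_experts: int,
    strategy: str,
    freqs: Optional[Sequence[float]] = None,
    seed: int = 0,
) -> List[bool]:
    """Hypothetical helper: True at index e means expert e is placed on GPU."""
    if not 0 <= num_gpu_experts <= num_experts:
        raise ValueError("num_gpu_experts must be in [0, num_experts]")

    if strategy == "frequency":
        # Assumed meaning: keep the most frequently activated experts on GPU.
        # Ties are broken in favor of the lower expert index.
        if freqs is None or len(freqs) != num_experts:
            raise ValueError("frequency strategy needs one count per expert")
        order = sorted(range(num_experts), key=lambda e: (-freqs[e], e))
        chosen = order[:num_gpu_experts]
    elif strategy == "uniform":
        # Assumed meaning: spread GPU slots evenly over the expert index range.
        chosen = [i * num_experts // num_gpu_experts for i in range(num_gpu_experts)]
    elif strategy == "front":
        # Assumed meaning of front-loading: the first N experts go to GPU.
        chosen = list(range(num_gpu_experts))
    elif strategy == "random":
        # A seeded random subset, so placement is reproducible across runs.
        chosen = random.Random(seed).sample(range(num_experts), num_gpu_experts)
    else:
        raise ValueError(f"unknown strategy: {strategy}")

    mask = [False] * num_experts
    for e in chosen:
        mask[e] = True
    return mask


if __name__ == "__main__":
    counts = [5, 1, 9, 3]  # hypothetical per-expert activation counts
    print(build_gpu_expert_mask(4, 2, "frequency", freqs=counts))  # [True, False, True, False]
```

In this sketch, re-running the frequency strategy with freshly accumulated counts is one simple way to model the dynamic expert updates mentioned in the release note; how KTransformers actually schedules such updates is not described here.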

Dec 24, 2025

🚀 Core Highlights

Native FP8 MoE Kernel: Introducing native FP8 precision support for MoE inference with a new AVX-based kernel. Run FP8 models directly without precision conversion overhead, preserving the original model accuracy while maximizing hardware efficiency.
kt-cli for Effortless Local Inference: A new CLI tool designed for simplicity and ease of use. Model manage...

Dec 22, 2025

Add RL-DPO training support to kt-sft, enabling preference-based reinforcement learning fine-tuning on top of KTransformers’ MoE stack.
- Includes critical PEFT adaptations and bug fixes for RL workflows.
- Example configurations and end-to-end usage can be found in:
  https://github.com/kvcache-ai/ktransformers/blob/main/doc/en/DPO_tutorial.md
**...

Dec 5, 2025

🚀 Core Highlights

Native Kimi-K2-Thinking support with the RAWINT4 method, enabling CPU and GPU to share the same INT4 weights without separate conversion.
AMD BLIS backend for INT8 MoE inference, expanding hardware support beyond Intel AMX.
AVX-based Kimi-K2 support for CPUs without AMX instructions.

📌 Models, Hardware & Tooling

Nov 21, 2025

🚀 Core Highlights

Add Qwen3-MoE models and AMX SFT rules to kt-sft, including new attention/operators, enabling Qwen3-MoE fine-tuning via LLaMA-Factory.
Restructure the repo around kt-kernel and kt-sft.
Unified KTMoEWrapper MoE backend with expert deferral for CPU-side MoE inference.
Use Docker for installation and deployment via the pre-built Docker images, greatly reducing envir...
