KTransformers v0.4.4
🚀 Core Highlights
- Add RL-DPO training support to kt-sft, enabling preference-based reinforcement learning fine-tuning on top of KTransformers' MoE stack.
  - Includes critical PEFT adaptations and bug fixes for RL workflows.
  - Example configurations and end-to-end usage: https://github.com/kvcache-ai/ktransformers/blob/main/doc/en/DPO_tutorial.md
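For context, the DPO objective that this kind of preference-based fine-tuning optimizes can be sketched in a few lines. The snippet below is an illustrative stand-in, not the kt-sft implementation: the function name, argument names, and the default `beta` are assumptions for demonstration only.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * reward margin).

    Inputs are sequence log-probabilities of the chosen/rejected
    responses under the trainable policy and the frozen reference model.
    All names and beta are illustrative, not KTransformers APIs.
    """
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # Numerically stable -log(sigmoid(margin)) == log(1 + exp(-margin))
    return math.log1p(math.exp(-margin))

# The loss shrinks as the policy prefers the chosen response more
# strongly than the reference model does.
loss_small_margin = dpo_loss(-10.0, -14.0, -11.0, -13.0)
loss_large_margin = dpo_loss(-10.0, -20.0, -11.0, -13.0)
```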
- Improve large-scale MoE stability and efficiency.
  - Significantly reduce CPU memory usage during large-chunk prefill.
  - Fix Kimi-K2 MoE decode bugs related to buffer management.
  - Refine NUMA-aware buffer writing and memory-handling paths.
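To illustrate why chunked prefill bounds memory: the prompt is processed in fixed-size chunks, so transient working buffers scale with the chunk size rather than the full prompt length (only the KV cache grows with the prompt). The sketch below is a toy model of this idea; the function name, chunk size, and list-based "cache" are assumptions, not KTransformers internals.

```python
def chunked_prefill(tokens, chunk_size=512):
    """Toy model of chunked prefill: the transient per-chunk work
    buffer is bounded by chunk_size, not the full prompt length.
    Names and chunk_size are illustrative, not KTransformers APIs."""
    kv_cache = []      # grows with the prompt, as in a real KV cache
    peak_buffer = 0    # tracks the largest transient chunk processed
    for start in range(0, len(tokens), chunk_size):
        chunk = tokens[start:start + chunk_size]
        peak_buffer = max(peak_buffer, len(chunk))
        kv_cache.extend(chunk)  # stand-in for appending real K/V tensors
    return kv_cache, peak_buffer

# A 1000-token prompt processed in 128-token chunks: the cache holds
# everything, but transient buffers never exceed one chunk.
cache, peak = chunked_prefill(list(range(1000)), chunk_size=128)
```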
📌 Models, Hardware & Tooling
- Model support updates
  - Add GLM-4.6V support via refactored CPU weight-conversion utilities.
  - Extend and stabilize Qwen3 / Qwen3-MoE support on NPU (Ascend), including attention, LayerNorm, MLP, cache, and expert operators.