[1.7.0] 2026-01-12
In v1.7.0, we address the core pain points of model evaluation: test sets are hard to construct, quantitative metrics are scarce, and automated and manual testing remain disconnected. This release introduces a brand-new Evaluation Module that closes the loop from Test Set Generation through Automated Scoring to Human Blind Testing.
Here are the detailed updates:
🎉 Core Highlights
- One-Stop Evaluation Loop: Supports automatic test-question generation from raw documents, one-click automated scoring across multiple models, and visualized comparison reports.
- LMArena Mode Integration: Built-in "Chatbot Arena"-style human blind testing turns subjective impressions into actionable, quantitative data (see the rating sketch after this list).
- Multi-Dimensional Question Support: Covers 5 core question types, spanning evaluation needs from fact-checking to logical reasoning.
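How does blind testing become "actionable data"? Arena-style evaluations typically convert anonymized pairwise votes into a leaderboard with a rating system such as Elo. These notes do not specify the exact algorithm used here, so the snippet below is only a minimal, hypothetical Elo sketch in Python; the model names and the `K` constant are illustrative.

```python
from collections import defaultdict

K = 32             # update step size; larger values react faster to new votes
BASE_RATING = 1000 # every model starts from the same baseline

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def record_vote(ratings: dict, winner: str, loser: str) -> None:
    """Apply one blind-test vote: the winner gains what the loser loses."""
    delta = K * (1 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta

# Hypothetical votes collected from anonymized side-by-side comparisons.
votes = [("model-a", "model-b"), ("model-a", "model-c"), ("model-b", "model-c")]
ratings = defaultdict(lambda: BASE_RATING)
for winner, loser in votes:
    record_vote(ratings, winner, loser)

print({m: round(r, 1) for m, r in ratings.items()})
```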
🚀 New Features
1. Intelligent Test Set Generation
Stop scrambling for QA pairs. You can now build high-quality evaluation datasets quickly through several methods: