[1.7.0] 2026-01-12
In v1.7.0, we address the core pain points of model evaluation: test sets are hard to construct, quantitative metrics are scarce, and automated and manual testing remain disconnected. This release introduces a brand-new Evaluation Module that closes the loop from Test Set Generation through Automated Scoring to Human Blind Testing.
Here are the detailed updates:
🎉 Core Highlights
- One-Stop Evaluation Loop: Supports automatic test-question generation from raw documents, one-click automated scoring across multiple models, and visualized comparison reports.
- LMArena Mode Integration: Built-in "Chatbot Arena"-style human blind testing turns subjective impressions into actionable, quantitative data (see the rating sketch after this list).
- Multi-Dimensional Question Support: Covers 5 core question types, spanning evaluation needs from fact-checking to logical reasoning.
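How does blind testing become "actionable data"? Arena-style evaluations typically convert anonymized pairwise votes into a leaderboard with a rating system such as Elo. These notes do not specify the exact algorithm used here, so the snippet below is only a minimal, hypothetical Elo sketch in Python; the model names and the `K` constant are illustrative.

```python
from collections import defaultdict

K = 32             # update step size; larger values react faster to new votes
BASE_RATING = 1000 # every model starts from the same baseline

def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def record_vote(ratings: dict, winner: str, loser: str) -> None:
    """Apply one blind-test vote: the winner gains what the loser loses."""
    delta = K * (1 - expected_score(ratings[winner], ratings[loser]))
    ratings[winner] += delta
    ratings[loser] -= delta

# Hypothetical votes collected from anonymized side-by-side comparisons.
votes = [("model-a", "model-b"), ("model-a", "model-c"), ("model-b", "model-c")]
ratings = defaultdict(lambda: BASE_RATING)
for winner, loser in votes:
    record_vote(ratings, winner, loser)

print({m: round(r, 1) for m, r in ratings.items()})
```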
🚀 New Features
1. Intelligent Test Set Generation
Stop scrambling for QA pairs. You can now build high-quality evaluation datasets quickly through several methods: