
Changelog

kreuzberg

A polyglot document intelligence framework with a Rust core. Extract text, metadata, and structured information from PDFs, Office documents, images, and 75+ other formats. Available for Rust, Python, Ruby, Java, Go, PHP, Elixir, C#, and TypeScript (Node/Bun/Wasm/Deno), or usable via the CLI, REST API, or MCP server.

kreuzberg-dev/kreuzberg

7.1k stars, 344 forks, Rust, MIT license


Topics: bun, csharp, document-intelligence, elixir, ffi, golang, +14 more

Last updated 2 days ago

Feb 9, 2026

Comparative benchmark results from workflow run 21820985382.

Commit: 60c099c8c44c1190410bc8e1a4253ae2ba2d9021 Date: 2026-02-09


Feb 8, 2026

Comparative benchmark results from workflow run 21800982176.

Commit: 74fd21c4c44f9b15b0e676223d085bd92cc50908 Date: 2026-02-08


Fixed

ODT List and Section Extraction

Fixed the ODT extractor ignoring text:list and text:section elements, which caused documents containing bulleted/numbered lists or sections to return empty content.
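In ODF content.xml, list items and sections wrap ordinary text:p paragraphs, so an extractor that reads only paragraphs and headings directly under office:text sees nothing inside them. The sketch below is not kreuzberg's code: it assumes a hypothetical, already-parsed element tree (OdtNode, el, txt are illustrative names, not a real XML library) and shows the general fix of descending into list and section containers.

```rust
/// Hypothetical, pre-parsed view of an ODT content.xml element tree.
/// A real extractor would build this from an XML parser.
#[derive(Debug)]
enum OdtNode {
    /// Character data.
    Text(String),
    /// An element identified by its qualified name, e.g. "text:p".
    Element { name: String, children: Vec<OdtNode> },
}

/// Convenience constructors for building trees by hand (illustrative only).
fn el(name: &str, children: Vec<OdtNode>) -> OdtNode {
    OdtNode::Element { name: name.to_string(), children }
}
fn txt(s: &str) -> OdtNode {
    OdtNode::Text(s.to_string())
}

/// Appends one line per paragraph or heading, in document order.
fn extract_text(node: &OdtNode, out: &mut Vec<String>) {
    let OdtNode::Element { name, children } = node else {
        return; // stray character data between blocks is ignored
    };
    match name.as_str() {
        // Block-level text: flatten everything inside (including text:span) into one line.
        "text:p" | "text:h" => {
            let mut line = String::new();
            collect_chars(children, &mut line);
            if !line.is_empty() {
                out.push(line);
            }
        }
        // Containers: without these arms, paragraphs nested in lists and
        // sections would never be visited.
        "office:text" | "text:list" | "text:list-item" | "text:list-header" | "text:section" => {
            for child in children {
                extract_text(child, out);
            }
        }
        // Other elements (tables, frames, ...) are out of scope for this sketch.
        _ => {}
    }
}

/// Concatenates all character data below `nodes`.
fn collect_chars(nodes: &[OdtNode], buf: &mut String) {
    for n in nodes {
        match n {
            OdtNode::Text(s) => buf.push_str(s),
            OdtNode::Element { children, .. } => collect_chars(children, buf),
        }
    }
}
```

Because text:list-item is itself handled by recursion, nested lists are covered without extra code.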

UTF-16 EML Parsing

Fixed EML files encoded in UTF-16 (LE/BE, with or without BOM) returning empty content. UTF-16 encoding is detected via BOM markers and heuristic byte-pattern analysis, transc...
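A minimal sketch of BOM-plus-heuristic UTF-16 detection followed by transcoding to UTF-8, using only the Rust standard library. The 512-byte sample, the 70%/10% NUL-byte thresholds, and the function names (detect_utf16, decode_utf16, eml_to_utf8) are assumptions for illustration, not kreuzberg's implementation.

```rust
/// Byte order of a UTF-16 payload.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Utf16Order {
    Le,
    Be,
}

/// Detects UTF-16 from a BOM, or heuristically from where NUL bytes fall.
/// Returns the byte order and the number of BOM bytes to skip.
fn detect_utf16(bytes: &[u8]) -> Option<(Utf16Order, usize)> {
    match bytes {
        [0xFF, 0xFE, ..] => return Some((Utf16Order::Le, 2)),
        [0xFE, 0xFF, ..] => return Some((Utf16Order::Be, 2)),
        _ => {}
    }
    // No BOM: email headers are mostly ASCII, and ASCII encoded as UTF-16
    // puts 0x00 in every other byte (odd offsets for LE, even for BE).
    // The thresholds below are illustrative assumptions.
    let sample = &bytes[..bytes.len().min(512)];
    let pairs = sample.len() / 2;
    if pairs < 4 {
        return None;
    }
    let even_nuls = sample.iter().step_by(2).filter(|&&b| b == 0).count();
    let odd_nuls = sample.iter().skip(1).step_by(2).filter(|&&b| b == 0).count();
    if odd_nuls * 10 >= pairs * 7 && even_nuls * 10 <= pairs {
        Some((Utf16Order::Le, 0))
    } else if even_nuls * 10 >= pairs * 7 && odd_nuls * 10 <= pairs {
        Some((Utf16Order::Be, 0))
    } else {
        None
    }
}

/// Transcodes UTF-16 bytes to a Rust String (UTF-8), replacing invalid units.
fn decode_utf16(bytes: &[u8], order: Utf16Order) -> String {
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|pair| match order {
            Utf16Order::Le => u16::from_le_bytes([pair[0], pair[1]]),
            Utf16Order::Be => u16::from_be_bytes([pair[0], pair[1]]),
        })
        .collect();
    String::from_utf16_lossy(&units)
}

/// Hypothetical normalization step: convert raw EML bytes to UTF-8 before MIME parsing.
fn eml_to_utf8(bytes: &[u8]) -> String {
    match detect_utf16(bytes) {
        Some((order, bom_len)) => decode_utf16(&bytes[bom_len..], order),
        None => String::from_utf8_lossy(bytes).into_owned(),
    }
}
```

Checking the BOM first keeps the heuristic as a fallback only; a BOM-less message whose text is mostly non-ASCII would not trip the NUL-byte test and is treated as UTF-8.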


Comparative benchmark results from workflow run 21794482055.

Commit: e3b01f40c30609ccc734192bb33b81d6c3eb7239 Date: 2026-02-08


Feb 7, 2026

What's Changed

Fixed

Excel file-path extraction: the graceful fallback for .xla (legacy add-in) and .xlsb (binary spreadsheet) files was applied only to byte-based extraction; file-path-based extraction still propagated parse errors.
PDF test flakiness: Fixed flaky PDF tests caused by concurrent pdfium access during parallel test execution by marking them #[serial].
Benchmark fixtures: Replace...
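One way to keep byte-based and file-path-based extraction consistent is to have the file-path entry point read the file and delegate to the byte-based one, so the fallback lives in a single place. In this sketch, parse_workbook is a hypothetical stand-in parser that fails on non-UTF-8 input, and the graceful fallback is modeled as returning empty text; kreuzberg's actual function names and fallback behavior may differ.

```rust
use std::fs;
use std::path::Path;

/// Extensions whose parse errors are downgraded to an empty result (assumed behavior).
const FALLBACK_EXTS: [&str; 2] = ["xla", "xlsb"];

/// Hypothetical stand-in parser: treats any non-UTF-8 input as unparseable.
fn parse_workbook(bytes: &[u8]) -> Result<String, String> {
    std::str::from_utf8(bytes)
        .map(str::to_owned)
        .map_err(|e| e.to_string())
}

/// Byte-based entry point: the only place the fallback is implemented.
fn extract_bytes(bytes: &[u8], ext: &str) -> Result<String, String> {
    match parse_workbook(bytes) {
        Ok(text) => Ok(text),
        Err(_) if FALLBACK_EXTS.contains(&ext) => Ok(String::new()),
        Err(e) => Err(e),
    }
}

/// File-path entry point: reads the file, then delegates to extract_bytes
/// instead of calling parse_workbook directly, so it inherits the same fallback.
fn extract_file(path: &Path) -> Result<String, String> {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .unwrap_or("")
        .to_ascii_lowercase();
    let bytes = fs::read(path).map_err(|e| e.to_string())?;
    extract_bytes(&bytes, &ext)
}
```

Routing both entry points through one function trades a full read into memory for a guarantee that the two paths cannot drift apart again.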

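The #[serial] attribute presumably comes from the serial_test crate, which runs annotated tests one at a time. The sketch below shows the underlying idea with only the standard library: a process-wide Mutex that every pdfium-touching test holds for its whole body. PDFIUM_LOCK and render_page_under_lock are hypothetical names, and the atomic counters exist only to demonstrate that at most one holder runs at a time.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, MutexGuard};
use std::thread;
use std::time::Duration;

/// Process-wide lock: every test that touches pdfium holds it for its whole body.
static PDFIUM_LOCK: Mutex<()> = Mutex::new(());

/// Instrumentation for this sketch only.
static ACTIVE: AtomicUsize = AtomicUsize::new(0);
static MAX_ACTIVE: AtomicUsize = AtomicUsize::new(0);

/// Acquires the lock, recovering it if a previous holder panicked (poisoned).
fn pdfium_guard() -> MutexGuard<'static, ()> {
    PDFIUM_LOCK.lock().unwrap_or_else(|poisoned| poisoned.into_inner())
}

/// Hypothetical test body that would call into pdfium.
fn render_page_under_lock() {
    let _guard = pdfium_guard();
    let now = ACTIVE.fetch_add(1, Ordering::SeqCst) + 1;
    MAX_ACTIVE.fetch_max(now, Ordering::SeqCst);
    thread::sleep(Duration::from_millis(2)); // simulated pdfium work
    ACTIVE.fetch_sub(1, Ordering::SeqCst);
}
```

Recovering from a poisoned lock matters in tests: without it, one panicking test would make every later lock() call fail and cascade into unrelated failures.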