Qwen3 support for 0.6B, 4B and 8B on CPU, MPS, and FlashQwen3 on CUDA and Intel HPUs
NomicBert MoE support
JinaAI Re-Rankers V1 support
Matryoshka Representation Learning (MRL)
Dense layer module support (after pooling)
[!NOTE]
Some of the changes listed above were already released in patch versions on top of v1.7.0, whereas Matryoshka Representation Learning (MRL) and Dense layer module support were added recently and had not been released before this version.
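As a rough illustration of what MRL-style embeddings allow on the client side, the sketch below keeps only the first few dimensions of an embedding and re-normalizes it. It is not taken from the text-embeddings-inference code base: the function name `truncate_embedding` and the sample vector are hypothetical, and it assumes the model was trained with Matryoshka Representation Learning, so that a prefix of the vector is still a usable embedding.

```python
import math


def truncate_embedding(embedding, dimensions):
    """Keep the first `dimensions` values and L2-normalize the result.

    Hypothetical helper for illustration only; not part of any TEI API.
    """
    if not 0 < dimensions <= len(embedding):
        raise ValueError("dimensions must be between 1 and len(embedding)")
    head = embedding[:dimensions]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return head
    return [x / norm for x in head]


# Made-up 4-dimensional embedding, truncated to its first 2 dimensions.
full = [3.0, 4.0, 12.0, 0.5]
short = truncate_embedding(full, 2)
print(short)
```

Re-normalizing after truncation keeps cosine similarity and dot product interchangeable for the shortened vectors, which is why the sketch does both steps together.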
What's Changed
[Docs] Update quick tour by @NielsRogge in https://github.com/huggingface/text-embeddings-inference/pull/574
Update README.md and supported_models.md by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/572
Back with linting. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/577
[Docs] Add cloud run example by @NielsRogge in https://github.com/huggingface/text-embeddings-inference/pull/573
Fixup by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/578
Fixing the tokenization routes token (offsets are in bytes, not in by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/576
Removing requirements file. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/585
Removing candle-extensions to live on crates.io by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/583
Bump sccache to 0.10.0 and sccache-action to 0.0.9 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/586
optimize the performance of FlashBert Path for HPU by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/575
Revert "Removing requirements file. (#585)" by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/588
Get opentelemetry trace id from request headers by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/425
Add argument for configuring Prometheus port by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/589
Adding missing head. prefix in the weight name in ModernBertClassificationHead by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/591
Fixing the CI (grpc path). by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/593
fix xpu env issue that cannot find right libur_loader.so.0 by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/595
enable flash mistral model for HPU device by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/594
remove optimum-habana dependency by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/599
Support NomicBert MoE by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/596
Remove duplicate short option '-p' to fix router executable by @cebtenzzre in https://github.com/huggingface/text-embeddings-inference/pull/602
Update text-embeddings-router --help output by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/603
Warmup padded models too. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/592
Add support for JinaAI Re-Rankers V1 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/582
Gte diffs by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/604
Fix the weight name in GTEClassificationHead by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/606
upgrade pytorch and ipex to 2.7 version by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/607
upgrade HPU FW to 1.21; upgrade transformers to 4.51.3 by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/608
Patch DistilBERT variants with different weight keys by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/614
add offline modeling for model jinaai/jina-embeddings-v2-base-code to avoid auto_map to other repository by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/612
Add mean pooling strategy for Modernbert classifier by @kwnath in https://github.com/huggingface/text-embeddings-inference/pull/616
Using serde for pool validation. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/620
Preparing the update to 1.7.1 by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/623
Adding suggestions to fixing missing ONNX files. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/624
Add Qwen3Model by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/627
Add HiddenAct::Silu (remove serde alias) by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/631
Add CPU support for Qwen3-Embedding models by @randomm in https://github.com/huggingface/text-embeddings-inference/pull/632
refactor the code and add wrap_in_hpu_graph to corner case by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/625
Support Qwen3 w/ fp32 on GPU by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/634
Preparing the release. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/639
Default to Qwen3 in README.md and docs/ examples by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/641
Fix Qwen3 by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/646
Add integration tests for Gaudi by @baptistecolle in https://github.com/huggingface/text-embeddings-inference/pull/598
Fix Qwen3-Embedding batch vs single inference inconsistency by @lance-miles in https://github.com/huggingface/text-embeddings-inference/pull/648
Fix FlashQwen3 by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/650
Make flake work on metal by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/654
Fixing metal backend. by @Narsil in https://github.com/huggingface/text-embeddings-inference/pull/655
Qwen3 hpu support by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/656
change HPU warmup logic: seq length should be with exponential growth by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/659
Update version to 1.7.3 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/666
Add last token pooling support for ORT. by @tpendragon in https://github.com/huggingface/text-embeddings-inference/pull/664
Fix Qwen3 Embedding Float16 DType by @tpendragon in https://github.com/huggingface/text-embeddings-inference/pull/663
Fix fmt by re-running pre-commit by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/671
Update version to 1.7.4 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/677
Support MRL (Matryoshka Representation Learning) by @kozistr in https://github.com/huggingface/text-embeddings-inference/pull/676
Add Dense layer for 2_Dense/ modules by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/660
Update version to 1.8.0 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/686
New Contributors
@NielsRogge made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/574
@cebtenzzre made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/602
@kwnath made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/616
@randomm made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/632
@lance-miles made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/648
@tpendragon made their first contribution in https://github.com/huggingface/text-embeddings-inference/pull/664
Full Changelog: https://github.com/huggingface/text-embeddings-inference/compare/v1.7.0...v1.8.0