v1.8.1
Today, Google releases EmbeddingGemma, a state-of-the-art multilingual embedding model designed for on-device use cases. Built for speed and efficiency, the model features a compact size of 308M parameters and a 2K-token context window, unlocking new possibilities for mobile RAG pipelines, agents, and more. EmbeddingGemma is trained to support over 100 languages and, at the time of writing, is the highest-ranking text-only multilingual embedding model under 500M parameters on the Massive Text Embedding Benchmark (MTEB). With this release, Text Embeddings Inference supports it out of the box:
- CPU:
docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.8.1 \
--model-id google/embeddinggemma-300m --dtype float32
- CPU with ONNX Runtime:
docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.8.1 \
--model-id onnx-community/embeddinggemma-300m-ONNX --dtype float32 --pooling mean
- NVIDIA CUDA:
docker run --gpus all --shm-size 1g -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cuda-1.8.1 \
--model-id google/embeddinggemma-300m --dtype float32
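Once one of the containers above is running, you can request embeddings over TEI's HTTP API. A minimal sketch using only the standard library, assuming the server is reachable on `localhost:8080` (the `embed` helper name is illustrative):

```python
import json
import urllib.request

def embed(texts, url="http://localhost:8080/embed"):
    """Send a batch of texts to TEI's /embed endpoint and return the vectors."""
    payload = json.dumps({"inputs": texts}).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # The response body is a JSON array with one embedding per input text.
        return json.loads(resp.read())

# Example call (requires a running TEI container):
# vectors = embed(["What is deep learning?", "¿Qué es el aprendizaje profundo?"])
```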
Notable Changes
- Add support for Gemma3 (text-only) architecture
- Intel updates to Synapse 1.21.3 and IPEX 2.8
- Extend ONNX Runtime support in `OrtBackend`:
  - Support `position_ids` and `past_key_values` as inputs
  - Handle `padding_side` and `pad_token_id`
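The ONNX deployment above is launched with `--pooling mean`; together with the new `padding_side` / `pad_token_id` handling, this means padded positions must be excluded when averaging token embeddings. A minimal NumPy sketch of masked mean pooling (the `mean_pool` helper name is illustrative, not TEI's internal API):

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token embeddings over real (non-padding) tokens only.

    token_embeddings: (seq_len, hidden) array of per-token vectors.
    attention_mask:   (seq_len,) array of 1s for real tokens, 0s for padding.
    """
    mask = attention_mask[:, None].astype(token_embeddings.dtype)
    # Zero out padded positions, then divide by the count of real tokens.
    summed = (token_embeddings * mask).sum(axis=0)
    return summed / np.clip(mask.sum(), 1e-9, None)
```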
What's Changed
- Adjust HPU warmup: use dummy inputs with shape more close to real scenario by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/689
- Add `extra_args` to `trufflehog` to exclude unverified results by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/696
- Update GitHub templates & fix mentions to Text Embeddings Inference by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/697
- Disable Flash Attention with `USE_FLASH_ATTENTION` by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/692
- Add support for `position_ids` and `past_key_values` in `OrtBackend` by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/700
- HPU upgrade to Synapse 1.21.3 by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/703
- Upgrade to IPEX 2.8 by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/702
- Parse `modules.json` to identify default `Dense` modules by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/701
- Add `padding_side` and `pad_token_id` in `OrtBackend` by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/705
- Update `docs/openapi.json` for v1.8.0 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/708
- Add Gemma3 architecture (text-only) by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/711
- Update `version` to 1.8.1 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/712
Full Changelog: https://github.com/huggingface/text-embeddings-inference/compare/v1.8.0...v1.8.1