v1.8.1
Today, Google releases EmbeddingGemma, a state-of-the-art multilingual embedding model designed for on-device use cases. Built for speed and efficiency, the model features a compact size of 308M parameters and a 2K-token context window, unlocking new possibilities for mobile RAG pipelines, agents, and more. EmbeddingGemma is trained to support over 100 languages and, at the time of writing, is the highest-ranking text-only multilingual embedding model under 500M parameters on the Massive Text Embedding Benchmark (MTEB). With this release, Text Embeddings Inference supports it out of the box:
- CPU:
docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.8.1 \
--model-id google/embeddinggemma-300m --dtype float32
- CPU with ONNX Runtime:
docker run -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cpu-1.8.1 \
--model-id onnx-community/embeddinggemma-300m-ONNX --dtype float32 --pooling mean
- NVIDIA CUDA:
docker run --gpus all --shm-size 1g -p 8080:80 ghcr.io/huggingface/text-embeddings-inference:cuda-1.8.1 \
--model-id google/embeddinggemma-300m --dtype float32
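Once one of the containers above is running, you can request embeddings over TEI's HTTP API. A minimal sketch using only the standard library, assuming the server is reachable on `localhost:8080` (the `embed` helper name is illustrative):

```python
import json
import urllib.request

def embed(texts, url="http://localhost:8080/embed"):
    """Send a batch of texts to TEI's /embed endpoint and return the vectors."""
    payload = json.dumps({"inputs": texts}).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # The response body is a JSON array with one embedding per input text.
        return json.loads(resp.read())

# Example call (requires a running TEI container):
# vectors = embed(["What is deep learning?", "¿Qué es el aprendizaje profundo?"])
```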
Notable Changes
- Add support for Gemma3 (text-only) architecture
- Intel updates to Synapse 1.21.3 and IPEX 2.8
- Extend ONNX Runtime support in `OrtBackend`:
  - Support `position_ids` and `past_key_values` as inputs
  - Handle `padding_side` and `pad_token_id`
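The ONNX deployment above is launched with `--pooling mean`; together with the new `padding_side` / `pad_token_id` handling, this means padded positions must be excluded when averaging token embeddings. A minimal NumPy sketch of masked mean pooling (the `mean_pool` helper name is illustrative, not TEI's internal API):

```python
import numpy as np

def mean_pool(token_embeddings, attention_mask):
    """Average token embeddings over real (non-padding) tokens only.

    token_embeddings: (seq_len, hidden) array of per-token vectors.
    attention_mask:   (seq_len,) array of 1s for real tokens, 0s for padding.
    """
    mask = attention_mask[:, None].astype(token_embeddings.dtype)
    # Zero out padded positions, then divide by the count of real tokens.
    summed = (token_embeddings * mask).sum(axis=0)
    return summed / np.clip(mask.sum(), 1e-9, None)
```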
What's Changed
- Adjust HPU warmup: use dummy inputs with shape more close to real scenario by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/689
- Add `extra_args` to `trufflehog` to exclude unverified results by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/696
- Update GitHub templates & fix mentions to Text Embeddings Inference by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/697
- Disable Flash Attention with `USE_FLASH_ATTENTION` by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/692
- Add support for `position_ids` and `past_key_values` in `OrtBackend` by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/700
- HPU upgrade to Synapse 1.21.3 by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/703
- Upgrade to IPEX 2.8 by @kaixuanliu in https://github.com/huggingface/text-embeddings-inference/pull/702
- Parse `modules.json` to identify default `Dense` modules by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/701
- Add `padding_side` and `pad_token_id` in `OrtBackend` by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/705
- Update `docs/openapi.json` for v1.8.0 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/708
- Add Gemma3 architecture (text-only) by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/711
- Update `version` to 1.8.1 by @alvarobartt in https://github.com/huggingface/text-embeddings-inference/pull/712
Full Changelog: https://github.com/huggingface/text-embeddings-inference/compare/v1.8.0...v1.8.1