v3.10.0
## LocalAI 3.10.0 Release!
LocalAI 3.10.0 is big on agent capabilities, multi-modal support, and cross-platform reliability.
We've added native Anthropic API support, launched a new Video Generation UI, introduced Open Responses API compatibility, and enhanced performance with a unified GPU backend system.
For a full tour, see below!
## TL;DR
| Feature | Summary |
|--------|--------|
| Anthropic API Support | Fully compatible /v1/messages endpoint for seamless drop-in replacement of Claude. |
| Open Responses API | Native support for stateful agents with tool calling, streaming, background mode, and multi-turn conversations, passing all official acceptance tests. |
| Video & Image Generation Suite | New video gen UI + LTX-2 support for text-to-video and image-to-video. |
| Unified GPU Backends | GPU libraries (CUDA, ROCm, Vulkan) packaged inside backend containers; works out of the box on Nvidia, AMD, and ARM64 (experimental). |
| Tool Streaming & XML Parsing | Full support for streaming tool calls and XML-formatted tool outputs. |
| System-Aware Backend Gallery | Only see backends your system can run (e.g., hide MLX on Linux). |
| Crash Fixes | Prevents crashes on AVX-only CPUs (Intel Sandy/Ivy Bridge) and fixes VRAM reporting on AMD GPUs. |
| Request Tracing | Debug agents & fine-tuning with memory-based request/response logging. |
| Moonshine Backend | Ultra-fast transcription engine for low-end devices. |
| Pocket-TTS | Lightweight, high-fidelity text-to-speech with voice cloning. |
| Vulkan arm64 Builds | Backends and images are now built for Vulkan on arm64 as well. |
## New Features & Major Enhancements
### Open Responses API: Build Smarter, Autonomous Agents
LocalAI now supports the OpenAI Responses API, enabling powerful agentic workflows locally.
- Stateful conversations via `response_id`: resume and manage long-running agent sessions.
- Background mode: run agents asynchronously and fetch results later.
- Streaming support for tools, images, and audio.
- Built-in tools: Web search, file search, and computer use (via MCP integrations).
- Multi-turn interaction with dynamic context and tool use.
Ideal for developers building agents that can browse, analyze files, or interact with systems, all on your local machine.
**How to Use:**
- Set `response_id` in your request to maintain session state across calls.
- Use `background: true` to run agents asynchronously.
- Retrieve results via `GET /api/v1/responses/{response_id}`.
- Enable streaming with `stream: true` to receive partial responses and tool calls in real time.
Tip: use `response_id` to build agent orchestration systems that persist context and avoid redundant computation.
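The steps above can be sketched in Python. This is a minimal, illustrative payload builder: the base URL, model name, and helper function are assumptions, and the `response_id`/`background`/`stream` fields simply mirror the list above (check the docs for the exact payload shape):

```python
"""Sketch of building Responses API payloads for a local agent session.
The model name, helper, and localhost URL below are placeholder assumptions."""

def build_response_request(prompt, model="local-model", response_id=None,
                           background=False, stream=False):
    # Assemble a Responses API payload; the response id chains turns together.
    payload = {"model": model, "input": prompt}
    if response_id:
        payload["response_id"] = response_id
    if background:
        payload["background"] = True
    if stream:
        payload["stream"] = True
    return payload

# First turn: run the agent asynchronously in background mode.
first = build_response_request("Summarize ./report.txt", background=True)

# Follow-up turn: resume the session with the id returned by the server,
# streaming partial output and tool calls.
followup = build_response_request("Now list the top 3 risks",
                                  response_id="resp_123", stream=True)

# With a running LocalAI instance you would send these, e.g.:
#   requests.post("http://localhost:8080/v1/responses", json=first)
#   requests.get("http://localhost:8080/api/v1/responses/resp_123")
```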
Our support passes all the official acceptance tests.
### Anthropic Messages API: Clone Claude Locally
LocalAI now fully supports the Anthropic Messages API.
- Use `https://api.localai.host/v1/messages` as a drop-in replacement for Claude.
- Full tool/function calling support, just like OpenAI.
- Streaming and non-streaming responses.
- Compatible with `anthropic-sdk-go`, LangChain, and other tooling.
Perfect for teams migrating from Anthropic to local inference with full feature parity.
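As a quick sketch, here is what an Anthropic-style request body looks like when pointed at a local instance. The helper function, model name, and localhost URL are illustrative assumptions; the required fields (`model`, `max_tokens`, `messages`) follow the upstream Messages API shape:

```python
"""Minimal sketch of an Anthropic-style /v1/messages request body.
Model name and base URL are placeholder assumptions."""

def build_messages_request(user_text, model="claude-compatible-model",
                           max_tokens=256, stream=False, tools=None):
    # The Messages API requires model, max_tokens, and a messages list.
    payload = {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": user_text}],
    }
    if stream:
        payload["stream"] = True
    if tools:
        payload["tools"] = tools  # tool/function calling, as with OpenAI
    return payload

req = build_messages_request("Hello from LocalAI!", stream=True)

# With a running instance you would POST it, e.g.:
#   requests.post("http://localhost:8080/v1/messages",
#                 headers={"anthropic-version": "2023-06-01"}, json=req)
```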
### Video Generation: From Text to Video in the Web UI
- New dedicated video generation page with intuitive controls.
- LTX-2 is supported.
- Supports text-to-video and image-to-video workflows.
- Built on top of `diffusers` with full compatibility.
**How to Use:**
- Go to `/video` in the web UI.
- Enter a prompt (e.g., "A cat walking on a moonlit rooftop").
- Optionally upload an image for image-to-video generation.
- Adjust parameters like `fps`, `num_frames`, and `guidance_scale`.
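If you drive generation programmatically instead of through the UI, the parameters above bundle into a request body along these lines. This is only a sketch: the helper and its default values are assumptions, not LocalAI's documented video API:

```python
"""Illustrative parameter set for a text-to-video or image-to-video job.
The helper and its defaults are assumptions, not a documented API."""

def build_video_params(prompt, image=None, fps=24, num_frames=121,
                       guidance_scale=3.0):
    # Core knobs exposed in the web UI: frame rate, clip length, and
    # how strongly generation follows the prompt.
    params = {
        "prompt": prompt,
        "fps": fps,
        "num_frames": num_frames,
        "guidance_scale": guidance_scale,
    }
    if image is not None:
        params["image"] = image  # switches to image-to-video mode
    return params

job = build_video_params("A cat walking on a moonlit rooftop",
                         image="cat.png", fps=30)
```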
### Unified GPU Backends: Acceleration Works Out of the Box
A major architectural upgrade: GPU libraries (CUDA, ROCm, Vulkan) are now packaged inside backend containers.
- Single image: you no longer need to pull a GPU-specific image. Any image works whether or not you have a GPU.
- No more manual GPU driver setup: just run the image and get acceleration.
- Works on Nvidia (CUDA), AMD (ROCm), and ARM64 (Vulkan).
- Vulkan arm64 builds enabled
- Reduced image complexity, faster builds, and consistent performance.
This means `latest`/`master` images now support GPU acceleration on all platforms, with no extra config!
Note: this is experimental; please help us by filing an issue if something doesn't work!
### Tool Streaming & Advanced Parsing
Enhance your agent workflows with richer tool interaction.
- Streaming tool calls: receive partial tool arguments in real time (e.g., `input_json_delta`).
- XML-style tool call parsing: models that return tools in XML format (`<function>...</function>`) are now properly parsed alongside text.
- Works across all backends (llama.cpp, vLLM, diffusers, etc.).
Enables more natural, real-time interaction with agents that use structured tool outputs.
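To illustrate what XML-style tool parsing has to do, here is a simplified standalone parser. It is not LocalAI's actual implementation; it just assumes a JSON body inside `<function>...</function>` tags, as in the example format above:

```python
import json
import re

# Illustrative parser that separates plain model text from XML-style tool
# calls. The <function> tag format mirrors the release notes; real model
# output may wrap the name/arguments differently.
TOOL_RE = re.compile(r"<function>(.*?)</function>", re.DOTALL)

def split_tools(text):
    tools = []
    for raw in TOOL_RE.findall(text):
        try:
            tools.append(json.loads(raw))  # JSON body inside the XML tag
        except json.JSONDecodeError:
            tools.append({"raw": raw.strip()})  # keep unparseable bodies
    plain = TOOL_RE.sub("", text).strip()  # text with tool tags removed
    return plain, tools

text = ('Checking weather. <function>{"name": "get_weather", '
        '"arguments": {"city": "Rome"}}</function>')
answer, calls = split_tools(text)
```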
### System-Aware Backend Gallery: Only Compatible Backends Show
The backend gallery now shows only backends your system can run.
- Auto-detects system capabilities (CPU, GPU, MLX, etc.).
- Hides unsupported backends (e.g., MLX on Linux, CUDA on AMD).
- Shows detected capabilities in the hero section.
### New TTS Backends: Pocket-TTS
Add expressive voice generation to your apps with Pocket-TTS.
- Real-time text-to-speech with voice cloning support (requires HF login).
- Lightweight, fast, and open-source.
- Available in the model gallery.
Perfect for voice agents, narrators, or interactive assistants. Note: voice cloning requires HF authentication and a registered voice model.
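A speech request against a Pocket-TTS model might be built like this. Treat it as a sketch: the model name, voice identifier, and endpoint in the comment are assumptions; check the model gallery for the real identifiers:

```python
"""Sketch of an OpenAI-style speech request body for a Pocket-TTS model.
Model name and voice identifier are placeholder assumptions."""

def build_tts_request(text, model="pocket-tts", voice=None):
    payload = {"model": model, "input": text}
    if voice:
        # Cloned voices must be registered first and require HF authentication.
        payload["voice"] = voice
    return payload

req = build_tts_request("Welcome to LocalAI 3.10!", voice="my-cloned-voice")

# With a running instance, POST to the speech endpoint, e.g.:
#   requests.post("http://localhost:8080/v1/audio/speech", json=req)
```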
### Request Tracing: Debug Your Agents
Trace requests and responses in memory, great for fine-tuning and agent debugging.
- Enable via runtime setting or API.
- Logs are stored in memory and dropped once a maximum size is reached.
- Fetch logs via `GET /api/v1/trace`.
- Export to JSON for analysis.
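A small helper can turn the in-memory trace log into a JSON file for later analysis. The `/api/v1/trace` path comes from the notes above, but the response shape assumed here (a list of request/response entries) and the helper itself are illustrative:

```python
import json

# Illustrative export of trace entries to JSON. The entry shape
# (dicts with "request"/"response" keys) is an assumption.
def export_traces(entries, path):
    # Keep only entries with both sides of the exchange recorded.
    complete = [e for e in entries if "request" in e and "response" in e]
    with open(path, "w") as f:
        json.dump(complete, f, indent=2)
    return len(complete)

# With a running instance you would fetch the log first, e.g.:
#   entries = requests.get("http://localhost:8080/api/v1/trace").json()
sample = [
    {"request": {"path": "/v1/chat/completions"}, "response": {"status": 200}},
    {"request": {"path": "/v1/responses"}},  # still in flight, no response yet
]
exported = export_traces(sample, "traces.json")
```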
### New 'Reasoning' Field: Extract Thinking Steps
LocalAI now automatically detects and extracts thinking tags from model output.
- Supports both SSE and non-SSE modes.
- Displays reasoning steps in the chat UI (under "Thinking" tab).
- Fixes an issue where thinking content appeared as part of the final answer.
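Conceptually, the extraction looks like the sketch below. LocalAI detects thinking tags automatically; the `<think>` tag name used here is an assumption, since different models use different markers:

```python
import re

# Illustrative split of model output into reasoning and final content.
# The <think> tag name is an assumption; models vary in their markers.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text):
    reasoning = "\n".join(m.strip() for m in THINK_RE.findall(text))
    answer = THINK_RE.sub("", text).strip()  # final answer without the tags
    return {"reasoning": reasoning, "content": answer}

out = split_reasoning(
    "<think>The user wants a haiku.</think>Autumn moonlight..."
)
```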
### Moonshine Backend: Faster Transcription for Low-End Devices
Add Moonshine, an ONNX-based transcription engine, for fast, lightweight speech-to-text.
- Optimized for low-end devices (Raspberry Pi, older laptops).
- One of the fastest transcription engines available.
- Supports live transcription.
## Fixes & Stability Improvements
### Prevent BMI2 Crashes on AVX-Only CPUs
Fixed crashes on older Intel CPUs (Ivy Bridge, Sandy Bridge) that lack BMI2 instructions.
- Now safely falls back to `llama-cpp-fallback` (SSE2 only).
- No more `EOF` errors during model warmup.
Ensures LocalAI runs smoothly on older hardware.
### Fix Swapped VRAM Usage on AMD GPUs
`rocm-smi` output is now parsed correctly, so used and total VRAM are displayed accurately.
- Fixes misreported memory usage on dual-Radeon setups.
- Handles `HIP_VISIBLE_DEVICES` properly (e.g., when using only the discrete GPU).
## The Complete Local Stack for Privacy-First AI
| Project | Description |
|--------|--------|
| LocalAI | The free, Open Source OpenAI alternative. Drop-in replacement REST API compatible with OpenAI specifications for local AI inferencing. No GPU required. |
| LocalAGI | Local AI agent management platform. Drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI. |
## Thank You
LocalAI is a true FOSS movement, built by contributors and powered by community.
If you believe in privacy-first AI:
- Star the repo
- Contribute code, docs, or feedback
- Share with others
Your support keeps this stack alive.
## Full Changelog
Click to expand full changelog
What's Changed
Bug fixes :bug:
- fix(ui): correctly parse import errors by @mudler in https://github.com/mudler/LocalAI/pull/7726
- fix(cli): import via CLI needs system state by @mudler in https://github.com/mudler/LocalAI/pull/7746
- fix(amd-gpu): correctly show total and used vram by @mudler in https://github.com/mudler/LocalAI/pull/7761
- fix: add nil checks before mergo.Merge to prevent panic in gallery model installation by @majiayu000 in https://github.com/mudler/LocalAI/pull/7785
- fix: Usage for image generation is incorrect (and causes error in LiteLLM) by @majiayu000 in https://github.com/mudler/LocalAI/pull/7786
- fix: propagate validation errors by @majiayu000 in https://github.com/mudler/LocalAI/pull/7787
- fix: Failed to download checksums.txt when using launch to install localai by @majiayu000 in https://github.com/mudler/LocalAI/pull/7788
- fix(image-gen): fix scrolling issues by @mudler in https://github.com/mudler/LocalAI/pull/7829
- fix(llama.cpp/mmproj): fix loading mmproj in nested sub-dirs different from model path by @mudler in https://github.com/mudler/LocalAI/pull/7832
- fix: Prevent BMI2 instruction crash on AVX-only CPUs by @coffeerunhobby in https://github.com/mudler/LocalAI/pull/7817
- fix: Highly inconsistent agent response to cogito agent calling MCP server - Body "Invalid http method" by @majiayu000 in https://github.com/mudler/LocalAI/pull/7790
- fix(chat/ui): record model name in history for consistency by @mudler in https://github.com/mudler/LocalAI/pull/7845
- fix(ui): fix 404 on API menu link by pointing to index.html by @DEVMANISHOFFL in https://github.com/mudler/LocalAI/pull/7878
- fix: BMI2 crash on AVX-only CPUs (Intel Ivy Bridge/Sandy Bridge) by @coffeerunhobby in https://github.com/mudler/LocalAI/pull/7864
- fix(model): do not assume success when deleting a model process by @jroeber in https://github.com/mudler/LocalAI/pull/7963
- fix(functions): do not duplicate function when valid JSON is inside XML tags by @mudler in https://github.com/mudler/LocalAI/pull/8043
Exciting New Features
- feat: disable force eviction by @mudler in https://github.com/mudler/LocalAI/pull/7725
- feat(api): Allow tracing of requests and responses by @richiejp in https://github.com/mudler/LocalAI/pull/7609
- feat(UI): image generation improvements by @mudler in https://github.com/mudler/LocalAI/pull/7804
- feat(image-gen/UI): move controls to the left, make the page more compact by @mudler in https://github.com/mudler/LocalAI/pull/7823
- feat(function): Add tool streaming, XML Tool Call Parsing Support by @mudler in https://github.com/mudler/LocalAI/pull/7865
New Contributors
- @majiayu000 made their first contribution in https://github.com/mudler/LocalAI/pull/7785
- @coffeerunhobby made their first contribution in https://github.com/mudler/LocalAI/pull/7817
- @DEVMANISHOFFL made their first contribution in https://github.com/mudler/LocalAI/pull/7878
- @jroeber made their first contribution in https://github.com/mudler/LocalAI/pull/7963
- @Divyanshupandey007 made their first contribution in https://github.com/mudler/LocalAI/pull/8050
Full Changelog: https://github.com/mudler/LocalAI/compare/v3.9.0...v3.10.0