v3.9.0
Xmas release :santa: LocalAI 3.9.0!
LocalAI 3.9.0 is focused on stability, resource efficiency, and smarter agent workflows. We've addressed critical issues with model loading, improved system resource management, and introduced a new Agent Jobs panel for scheduling and managing background agentic tasks. Whether you're running models locally or orchestrating complex agent workflows, this release makes it faster, more reliable, and easier to manage.
TL;DR
| Feature | Summary |
|---------|---------|
| Agent Jobs Panel | Schedule and run background tasks with cron or via the API, perfect for automated workflows. |
| Smart Memory Reclaimer | Automatically frees GPU/VRAM by evicting least recently used models when memory runs low. |
| LRU Model Eviction | Models are automatically unloaded from memory based on usage patterns to prevent crashes. |
| MLX & CUDA 13 Support | New model backends and enhanced GPU compatibility for modern hardware. |
| UI Polish & Fixes | Cleaned-up navigation, fixed layout overflow, and various improvements. |
| VibeVoice | Added support for the VibeVoice TTS backend! |
New Features
Agent Jobs Panel: Schedule & Automate Tasks
LocalAI 3.9.0 introduces a new Agent Jobs panel in the web UI and API, letting you create, schedule, and run agentic tasks in the background, started either programmatically via the API or from the web interface.
- Run agent prompts on a schedule using cron syntax, or via API.
- Agents are defined via the model settings, supporting MCP.
- Trigger jobs via API for integration into CI/CD or external tools.
- Optionally send results to a webhook for post-processing.
- Templates and prompts can be dynamically populated with variables.
Use cases: Daily reports, CI integration, automated data processing, scheduled model evaluations.
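To picture how variable-populated prompts work, here is a minimal sketch using Python's standard `string.Template`. The placeholder names and templating syntax are illustrative assumptions, not LocalAI's actual job configuration format:

```python
from string import Template

# Hypothetical job prompt with placeholders, filled in when the job triggers.
job_prompt = Template(
    "Summarize yesterday's $pipeline build failures and "
    "post the result for the $team team."
)

def render_prompt(template: Template, **variables) -> str:
    """Substitute the job's variables into its prompt template."""
    return template.substitute(**variables)

print(render_prompt(job_prompt, pipeline="CI", team="platform"))
```

A scheduler would render the prompt like this at each cron tick, send it to the configured model, and optionally POST the response to a webhook.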
Smart Memory Reclaimer: Auto-Optimize GPU Resources
We've introduced a new Memory Reclaimer that monitors system memory usage and automatically frees GPU/VRAM when needed.
- Tracks memory consumption across all backends.
- When usage exceeds a configured threshold, it evicts the least recently used (LRU) models.
- Prevents out-of-memory crashes and keeps your system stable during high load.
This is a first step toward adaptive resource management; future versions will expand it with more advanced policies and finer-grained control.
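The reclaim loop can be sketched roughly as follows. This is an illustrative Python model of the policy, not LocalAI's actual Go implementation; the class name, threshold, and sizes are made up:

```python
from collections import OrderedDict

class MemoryReclaimer:
    """Evict least recently used models when usage crosses a threshold."""

    def __init__(self, threshold_mb: int):
        self.threshold_mb = threshold_mb
        # model name -> memory footprint (MB); insertion order = recency order
        self.loaded = OrderedDict()

    def touch(self, name: str, size_mb: int) -> None:
        """Record that a model was just used, moving it to the 'recent' end."""
        self.loaded[name] = size_mb
        self.loaded.move_to_end(name)

    def used_mb(self) -> int:
        return sum(self.loaded.values())

    def reclaim(self) -> list[str]:
        """Unload LRU models until usage drops back under the threshold."""
        evicted = []
        while self.used_mb() > self.threshold_mb and self.loaded:
            name, _ = self.loaded.popitem(last=False)  # least recently used
            evicted.append(name)  # a real backend would free the VRAM here
        return evicted
```

Each inference request "touches" its model; a background monitor calls `reclaim()` whenever measured usage exceeds the configured threshold.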
LRU Model Eviction: Intelligent Model Management
Building on the new reclaimer, LocalAI now supports LRU (Least Recently Used) eviction for loaded models.
- Set a maximum number of models to keep in memory (e.g., limit to 3).
- When a new model is loaded and the limit is reached, the oldest unused model is automatically unloaded.
- Fully compatible with `single_active_backend` mode (now defaults to LRU=1 for backward compatibility).
Ideal for servers with limited VRAM or when running multiple models in parallel.
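Count-based eviction follows the classic LRU cache pattern. A minimal sketch, with made-up names (LocalAI's real logic lives in its Go model loader):

```python
from collections import OrderedDict

def load_model(cache: OrderedDict, name: str, max_models: int) -> list[str]:
    """Load `name`, evicting least recently used models beyond `max_models`.

    With max_models=1 this behaves like single_active_backend mode:
    loading a new model unloads the previous one.
    """
    evicted = []
    if name in cache:
        cache.move_to_end(name)  # already loaded: just mark as recently used
    else:
        cache[name] = True       # a real loader would start the backend here
        while len(cache) > max_models:
            old, _ = cache.popitem(last=False)  # oldest unused model
            evicted.append(old)  # a real loader would unload this backend
    return evicted
```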
UI & UX Polish
- Fixed navbar ordering and login icon: clearer navigation and better visual flow.
- Prevented tool call overflow in chat view: no more clipped or misaligned content.
- Unified link paths (e.g., `/browse/` instead of `browse`) for consistency.
- Fixed model selection toggle: the header now updates correctly when switching models.
- Consistent button styling: uniform colors, hover effects, and accessibility.
Backward Compatibility & Architecture
- Dropped x86_64 Mac support: no longer maintained in GitHub Actions; ARM64 (M1/M2/M3/M4) is now the recommended architecture.
- Updated data storage path from `/usr/share` to `/var/lib`: follows Linux conventions for mutable data.
- Added CUDA 13 support: now available in Docker images and L4T builds.
- New VibeVoice TTS backend: real-time text-to-speech with voice cloning support. You can install it from the model gallery!
- StableDiffusion-GGML now supports LoRA: expand your image-generation capabilities.
Fixes & Improvements
- Fixed: since v3.8.0, the `/readyz` and `/healthz` endpoints required authentication, breaking Docker health checks and monitoring tools; they are now accessible without auth.
- Fixed crashes when importing models from Hugging Face URLs with subfolders (e.g., `huggingface://user/model/GGUF/model.gguf`).
The Complete Local Stack for Privacy-First AI
| Project | Description |
|---------|-------------|
| LocalAI | The free, Open Source OpenAI alternative. Drop-in replacement REST API compatible with OpenAI specifications for local AI inferencing. No GPU required. |
| LocalAGI | Local AI agent management platform. Drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI. |
Thank You
LocalAI is a true FOSS movement: built by contributors, powered by community.
If you believe in privacy-first AI:
- Star the repo
- Contribute code, docs, or feedback
- Share with others
Your support keeps this stack alive.
Full Changelog
What's Changed
Breaking Changes
- chore: switch from /usr/share to /var/lib for data storage by @poretsky in https://github.com/mudler/LocalAI/pull/7361
- chore: drop drawin-x86_64 support by @mudler in https://github.com/mudler/LocalAI/pull/7616
Bug fixes :bug:
- fix: do not require auth for readyz/healthz endpoints by @mudler in https://github.com/mudler/LocalAI/pull/7403
- fix(ui): navbar ordering and login icon by @mudler in https://github.com/mudler/LocalAI/pull/7407
- fix: configure sbsa packages for arm64 by @mudler in https://github.com/mudler/LocalAI/pull/7413
- fix(ui): prevent box overflow in chat view by @mudler in https://github.com/mudler/LocalAI/pull/7430
- fix(ui): Update few links in web UI from 'browse' to '/browse/' by @rampa3 in https://github.com/mudler/LocalAI/pull/7445
- fix(paths): remove trailing slash from requests by @mudler in https://github.com/mudler/LocalAI/pull/7451
- fix(downloader): do not download model files if not necessary by @mudler in https://github.com/mudler/LocalAI/pull/7492
- fix(config): make syncKnownUsecasesFromString idempotent by @mudler in https://github.com/mudler/LocalAI/pull/7493
- fix: make sure to close on errors by @mudler in https://github.com/mudler/LocalAI/pull/7521
- fix(llama.cpp): handle corner cases with tool array content by @mudler in https://github.com/mudler/LocalAI/pull/7528
- fix(7355): Update llama-cpp grpc for v3 interface by @sredman in https://github.com/mudler/LocalAI/pull/7566
- fix(chat-ui): model selection toggle and new chat by @mudler in https://github.com/mudler/LocalAI/pull/7574
- fix: improve ram estimation by @mudler in https://github.com/mudler/LocalAI/pull/7603
- fix(ram): do not read from cgroup by @mudler in https://github.com/mudler/LocalAI/pull/7606
- fix: correctly propagate error during model load by @mudler in https://github.com/mudler/LocalAI/pull/7610
- fix(ci): remove specific version for grpcio packages by @mudler in https://github.com/mudler/LocalAI/pull/7627
- fix(uri): consider subfolders when expanding huggingface URLs by @mintyleaf in https://github.com/mudler/LocalAI/pull/7634
Exciting New Features
- feat: agent jobs panel by @mudler in https://github.com/mudler/LocalAI/pull/7390
- chore: refactor css, restyle to be slightly minimalistic by @mudler in https://github.com/mudler/LocalAI/pull/7397
- feat(hf-api): return files in nested directories by @mudler in https://github.com/mudler/LocalAI/pull/7396
- feat(agent-jobs): add multimedia support by @mudler in https://github.com/mudler/LocalAI/pull/7398
New Contributors
- @rampa3 made their first contribution in https://github.com/mudler/LocalAI/pull/7445
- @blightbow made their first contribution in https://github.com/mudler/LocalAI/pull/7556
Full Changelog: https://github.com/mudler/LocalAI/compare/v3.8.0...v3.9.0