v3.9.0
Xmas release :santa: LocalAI 3.9.0!
LocalAI 3.9.0 is focused on stability, resource efficiency, and smarter agent workflows. We've addressed critical issues with model loading, improved system resource management, and introduced a new Agent Jobs panel for scheduling and managing background agentic tasks. Whether you're running models locally or orchestrating complex agent workflows, this release makes it faster, more reliable, and easier to manage.
TL;DR
| Feature | Summary |
|---------|---------|
| Agent Jobs Panel | Schedule and run background tasks with cron or via the API, perfect for automated workflows. |
| Smart Memory Reclaimer | Automatically frees GPU/VRAM by evicting least recently used models when memory runs low. |
| LRU Model Eviction | Models are automatically unloaded from memory based on usage patterns to prevent crashes. |
| MLX & CUDA 13 Support | New model backends and enhanced GPU compatibility for modern hardware. |
| UI Polish & Fixes | Cleaned-up navigation, fixed layout overflow, and various improvements. |
| VibeVoice | Added support for the VibeVoice TTS backend! |
New Features
Agent Jobs Panel: Schedule & Automate Tasks
LocalAI 3.9.0 introduces a new Agent Jobs panel in the web UI and API, letting you create, schedule, and run agentic tasks in the background, started either programmatically via the API or from the web interface.
- Run agent prompts on a schedule using cron syntax, or via API.
- Agents are defined via the model settings, supporting MCP.
- Trigger jobs via API for integration into CI/CD or external tools.
- Optionally send results to a webhook for post-processing.
- Templates and prompts can be dynamically populated with variables.
Use cases: Daily reports, CI integration, automated data processing, scheduled model evaluations.
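To picture how variable-populated prompts work, here is a minimal sketch using Python's standard `string.Template`. The placeholder names and templating syntax are illustrative assumptions, not LocalAI's actual job configuration format:

```python
from string import Template

# Hypothetical job prompt with placeholders, filled in when the job triggers.
job_prompt = Template(
    "Summarize yesterday's $pipeline build failures and "
    "post the result for the $team team."
)

def render_prompt(template: Template, **variables) -> str:
    """Substitute the job's variables into its prompt template."""
    return template.substitute(**variables)

print(render_prompt(job_prompt, pipeline="CI", team="platform"))
```

A scheduler would render the prompt like this at each cron tick, send it to the configured model, and optionally POST the response to a webhook.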
Smart Memory Reclaimer: Auto-Optimize GPU Resources
We've introduced a new Memory Reclaimer that monitors system memory usage and automatically frees GPU/VRAM when needed.
- Tracks memory consumption across all backends.
- When usage exceeds a configured threshold, it evicts the least recently used (LRU) models.
- Prevents out-of-memory crashes and keeps your system stable during high load.
This is a first step toward adaptive resource management; future versions will expand it with more advanced policies and finer-grained control.
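The reclaim loop can be sketched roughly as follows. This is an illustrative Python model of the policy, not LocalAI's actual Go implementation; the class name, threshold, and sizes are made up:

```python
from collections import OrderedDict

class MemoryReclaimer:
    """Evict least recently used models when usage crosses a threshold."""

    def __init__(self, threshold_mb: int):
        self.threshold_mb = threshold_mb
        # model name -> memory footprint (MB); insertion order = recency order
        self.loaded = OrderedDict()

    def touch(self, name: str, size_mb: int) -> None:
        """Record that a model was just used, moving it to the 'recent' end."""
        self.loaded[name] = size_mb
        self.loaded.move_to_end(name)

    def used_mb(self) -> int:
        return sum(self.loaded.values())

    def reclaim(self) -> list[str]:
        """Unload LRU models until usage drops back under the threshold."""
        evicted = []
        while self.used_mb() > self.threshold_mb and self.loaded:
            name, _ = self.loaded.popitem(last=False)  # least recently used
            evicted.append(name)  # a real backend would free the VRAM here
        return evicted
```

Each inference request "touches" its model; a background monitor calls `reclaim()` whenever measured usage exceeds the configured threshold.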
LRU Model Eviction: Intelligent Model Management
Building on the new reclaimer, LocalAI now supports LRU (Least Recently Used) eviction for loaded models.
- Set a maximum number of models to keep in memory (e.g., limit to 3).
- When a new model is loaded and the limit is reached, the oldest unused model is automatically unloaded.
- Fully compatible with `single_active_backend` mode (now defaults to LRU=1 for backward compatibility).
Ideal for servers with limited VRAM or when running multiple models in parallel.
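Count-based eviction follows the classic LRU cache pattern. A minimal sketch, with made-up names (LocalAI's real logic lives in its Go model loader):

```python
from collections import OrderedDict

def load_model(cache: OrderedDict, name: str, max_models: int) -> list[str]:
    """Load `name`, evicting least recently used models beyond `max_models`.

    With max_models=1 this behaves like single_active_backend mode:
    loading a new model unloads the previous one.
    """
    evicted = []
    if name in cache:
        cache.move_to_end(name)  # already loaded: just mark as recently used
    else:
        cache[name] = True       # a real loader would start the backend here
        while len(cache) > max_models:
            old, _ = cache.popitem(last=False)  # oldest unused model
            evicted.append(old)  # a real loader would unload this backend
    return evicted
```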
UI & UX Polish
- Fixed navbar ordering and login icon: clearer navigation and better visual flow.
- Prevented tool call overflow in chat view: no more clipped or misaligned content.
- Unified link paths (e.g., `/browse/` instead of `browse`) for consistency.
- Fixed model selection toggle: the header now updates correctly when switching models.
- Consistent button styling: uniform colors, hover effects, and accessibility.
Backward Compatibility & Architecture
- Dropped x86_64 Mac support: no longer maintained in GitHub Actions; ARM64 (M1/M2/M3/M4) is now the recommended architecture.
- Updated data storage path from `/usr/share` to `/var/lib`: follows Linux conventions for mutable data.
- Added CUDA 13 support: now available in Docker images and L4T builds.
- New VibeVoice TTS backend: real-time text-to-speech with voice cloning support. You can install it from the model gallery!
- StableDiffusion-GGML now supports LoRA: expand your image-generation capabilities.
Fixes & Improvements
- Fixed: since v3.8.0, the `/readyz` and `/healthz` endpoints required authentication, breaking Docker health checks and monitoring tools; they are now accessible without auth.
- Fixed crashes when importing models from Hugging Face URLs with subfolders (e.g., `huggingface://user/model/GGUF/model.gguf`).
The Complete Local Stack for Privacy-First AI
| Project | Description |
|---------|-------------|
| LocalAI | The free, Open Source OpenAI alternative. Drop-in replacement REST API compatible with OpenAI specifications for local AI inferencing. No GPU required. |
| LocalAGI | Local AI agent management platform. Drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI. |
Thank You
LocalAI is a true FOSS movement: built by contributors, powered by community.
If you believe in privacy-first AI:
- Star the repo
- Contribute code, docs, or feedback
- Share with others
Your support keeps this stack alive.
Full Changelog
What's Changed
Breaking Changes
- chore: switch from /usr/share to /var/lib for data storage by @poretsky in https://github.com/mudler/LocalAI/pull/7361
- chore: drop drawin-x86_64 support by @mudler in https://github.com/mudler/LocalAI/pull/7616
Bug fixes :bug:
- fix: do not require auth for readyz/healthz endpoints by @mudler in https://github.com/mudler/LocalAI/pull/7403
- fix(ui): navbar ordering and login icon by @mudler in https://github.com/mudler/LocalAI/pull/7407
- fix: configure sbsa packages for arm64 by @mudler in https://github.com/mudler/LocalAI/pull/7413
- fix(ui): prevent box overflow in chat view by @mudler in https://github.com/mudler/LocalAI/pull/7430
- fix(ui): Update few links in web UI from 'browse' to '/browse/' by @rampa3 in https://github.com/mudler/LocalAI/pull/7445
- fix(paths): remove trailing slash from requests by @mudler in https://github.com/mudler/LocalAI/pull/7451
- fix(downloader): do not download model files if not necessary by @mudler in https://github.com/mudler/LocalAI/pull/7492
- fix(config): make syncKnownUsecasesFromString idempotent by @mudler in https://github.com/mudler/LocalAI/pull/7493
- fix: make sure to close on errors by @mudler in https://github.com/mudler/LocalAI/pull/7521
- fix(llama.cpp): handle corner cases with tool array content by @mudler in https://github.com/mudler/LocalAI/pull/7528
- fix(7355): Update llama-cpp grpc for v3 interface by @sredman in https://github.com/mudler/LocalAI/pull/7566
- fix(chat-ui): model selection toggle and new chat by @mudler in https://github.com/mudler/LocalAI/pull/7574
- fix: improve ram estimation by @mudler in https://github.com/mudler/LocalAI/pull/7603
- fix(ram): do not read from cgroup by @mudler in https://github.com/mudler/LocalAI/pull/7606
- fix: correctly propagate error during model load by @mudler in https://github.com/mudler/LocalAI/pull/7610
- fix(ci): remove specific version for grpcio packages by @mudler in https://github.com/mudler/LocalAI/pull/7627
- fix(uri): consider subfolders when expanding huggingface URLs by @mintyleaf in https://github.com/mudler/LocalAI/pull/7634
Exciting New Features
- feat: agent jobs panel by @mudler in https://github.com/mudler/LocalAI/pull/7390
- chore: refactor css, restyle to be slightly minimalistic by @mudler in https://github.com/mudler/LocalAI/pull/7397
- feat(hf-api): return files in nested directories by @mudler in https://github.com/mudler/LocalAI/pull/7396
- feat(agent-jobs): add multimedia support by @mudler in https://github.com/mudler/LocalAI/pull/7398
New Contributors
- @rampa3 made their first contribution in https://github.com/mudler/LocalAI/pull/7445
- @blightbow made their first contribution in https://github.com/mudler/LocalAI/pull/7556
Full Changelog: https://github.com/mudler/LocalAI/compare/v3.8.0...v3.9.0