v0.8.0-rc.0 - 2025-12-08
Changelog
Breaking Changes
- 57feef8b7c1261de0ffc0ef41c3cc8826350ee39 chore: [BREAKING] deprecate phi-2 model (#1667)
- 78a76deeafecfa92eda6b5deff1d0249c970d152 feat: [BREAKING] remove /query api call and adding FastAPI info and Tags (#1621)
Features
- 475f94e3424f466ed23a9463b3e737855f1578c3 feat: support minor release version format (x.y.z-rc.w) (#1675)
- f1cba23d82f5fbfe7b563623eb06a74f9d44d662 feat: add mistral3 series models (#1668)
- 295068f48866c6741484c1cd8e78729c43d165bb feat: add PV support to RAG service (#1660)
- 5a8b5804dd0ae8a92453249e2b8c9b3d2bd9c99e feat: leverage AIKit for preset image packing (#1649)
- a97f28d09246502958acf1d420160d60d14a7730 feat: add support for generic BYO nodes using NVIDIA GPU feature discovery (#1536)
- 0a6ee7761c9de27333835cc505f6a96b3236ab7d feat: use skopeo mcr images (#1630)
- 9b1f6bd69e6674afe2075c9e034c18d4a9b37cf9 feat: RAG benchmarking based on documents (#1615)
- 38f6846046892b9689da36415967aa19eeeb1090 feat: add version info to ua and cmd (#1633)
- 7933a2cb8519050da042b7344a9540138ae02cb3 feat: add user-agent header for RAG oai client (#1622)
- f73f9b748a8f0c23d47071eb3978833dc5de0b4e feat: add webhook validation for BYO nodes using GPU feature discovery (#1587)
- a68bb82525655a851d75661729ddab14ebb93f30 feat: Provide token usage in rag service (#1605)
- c96327283000368950645bfefca0983d8cba37b8 feat: update gpu-provisioner version to v0.3.7 for kaito (#1604)
- 251d9a2b4e87c686bee6017042fe3545b738872c Revert "feat: support arm64 container images" (#1603)
- 198408ad8cf681214fdf1eabd6e3079bd3ab061b feat: support arm64 container images (#1585)
- 22afc334e21f5162e25f5fd5c254acbda9bd944a feat: add a new InferenceSet CRD and Controller for scaling inference workloads automatically (#1522)
- ff7dd8d39be1c27b0f1acac67286dd5044b33d52 feat: add NVIDIA GPU feature discovery Helm chart (#1586)
- 45f85fd112ff93f6f8a77a76f53ef9dd61efc99e feat: add gemma-3 4B and 27B models (#1572)
- 42331a4c70bb30d8b876a3356437d4b27e1e394a feat: adding total_items to RAG list documents response (#1578)
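
For readers unfamiliar with the `x.y.z-rc.w` format mentioned in #1675, the following is a minimal Go sketch of how such a version string could be validated. The pattern and the `IsReleaseVersion` helper are illustrative assumptions, not the project's actual implementation; the optional leading `v` is assumed from the tag name of this release.

```go
package main

import (
	"fmt"
	"regexp"
)

// versionPattern is a hypothetical pattern for x.y.z and x.y.z-rc.w,
// with an optional leading "v". It is not taken from the project source.
var versionPattern = regexp.MustCompile(`^v?(\d+)\.(\d+)\.(\d+)(?:-rc\.(\d+))?$`)

// IsReleaseVersion reports whether s matches x.y.z or x.y.z-rc.w.
func IsReleaseVersion(s string) bool {
	return versionPattern.MatchString(s)
}

func main() {
	// Print each candidate next to the validation result.
	for _, v := range []string{"v0.8.0-rc.0", "0.8.0", "0.8.0-rc", "0.8"} {
		fmt.Printf("%-12s %v\n", v, IsReleaseVersion(v))
	}
}
```

Under this assumed pattern, `v0.8.0-rc.0` and `0.8.0` are accepted, while `0.8.0-rc` (no candidate number) and `0.8` (no patch component) are rejected.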
Bug Fixes
- fe21280d0aba9f6bfe56b219c8bdd91902bbb25c fix: correct csi-local-node ds label (#1672)
- cef240d4ded8828f258b635d103785964904c005 fix: move GatewayAPIInferenceExtension into InferenceSet Controller (#1656)
- 670d30bbfa2161b9be9c9390fd52160802e90667 fix: add enableInferenceSetController in helm chart config (#1651)
- 6437e6c4fef3e642a8ccdf957f77e1403b040669 fix: add new label to inference pods generated by InferenceSet (#1645)
- 45d68fd4e86607a9e56233160d29b18837b11d10 fix: add missing inferenceset CRD in charts (#1643)
- cd30f25b14a9372fb593e44bafdfd736ba7921ef fix: add findutils as runtime dependency of skopeo image (#1632)
- 3eeb3725965c4bc2144c816b5f2233cb8f607c18 fix: bump pip to 25.3 in kaito-base image (#1629)
- fe73c4b6dbbe587aa46c6ebad2db1130c0114ac3 fix: add missing steps to skopeo workflow (#1614)
- e8d798baa125f44c9706f3a4f86c3567fdf1a14e fix: use crypto/rand package to generate random string (#1600)
- 173bd5a0c3ec0af29f553d87e096a4218b233cfc fix: switch e2e test usage of Standard_NC6s_v3 to Standard_NV36ads_A10_v5 as NC6 no longer has quota (#1581)
- bfd6a5340b6b650a5a60bcb1ffce8a7aea17a6d2 fix: handle no context found with passthrough to LLM (#1542)
- 712b2ed094cedfd7f8f25fdb4db7ac7398c3f4dd fix: ResourceReady condition is never set to true for BYO (#1547)
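
As background for the `crypto/rand` fix in #1600, here is a minimal sketch of generating a random string from a cryptographically secure source in Go. The `RandomString` name and the lowercase alphanumeric alphabet are assumptions chosen for illustration; the actual change in the repository may differ.

```go
package main

import (
	"crypto/rand"
	"fmt"
	"math/big"
)

// alphabet is an assumed character set; the project may use a different one.
const alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"

// RandomString returns n characters drawn uniformly from alphabet
// using crypto/rand rather than the predictable math/rand generator.
func RandomString(n int) (string, error) {
	out := make([]byte, n)
	limit := big.NewInt(int64(len(alphabet)))
	for i := range out {
		// rand.Int returns a uniform value in [0, limit).
		idx, err := rand.Int(rand.Reader, limit)
		if err != nil {
			return "", err
		}
		out[i] = alphabet[idx.Int64()]
	}
	return string(out), nil
}

func main() {
	s, err := RandomString(8)
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```

Drawing each index with `rand.Int` avoids the modulo bias that can appear when raw random bytes are reduced with `%`.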
Code Refactoring
- c46be63bec7eac1b5214df8efdf94f378994f577 refactor: move the Inferenceset controller to a top-level pkg (#1670)
Continuous Integration
- 379117536b32b8604d5e34a8457b98163b0f18b4 ci: reuse pip package cache when building image (#1516)
Documentation
- dced19613c4ccf50effe4e2241c4b3667f2f8a76 docs: fix enableInferenceSetController install method (#1669)
- 9fb40875a8661adb0c3bcea1aeff1ebf0617ee50 docs: fix gateway-api-inference-extension doc (#1662)
- 19c147649332d1d45484cc97044eaba6dc59f6cf docs: update preset list (#1597)
- 0150971efe62146ffbba16c387f1fae76beaeefb docs: Update the custom model deployment guide (#1592)
- 37d928ce5fe2abb9d525ae23ff02b03b6dc755b9 docs: update docs for using generic BYO nodes (#1588)
- 90c10314c63d74e8a6d5ae357eebc30779494400 docs: update gateway-api-inference-extension setup (#1528)
- c41057ab96a68000c890cd103fbe178042ded270 docs: Adding proposal for AutoIndexer CRD (#1538)
- 1a0ae1c466949c78e94319631767c362cf4862af docs: Add proposal for using NVIDIA GPU feature discovery to support generic cloud provider nodes (#1548)
- 37a96221dca2ec5ec0d7f2460f7357afe65aae12 docs: fix typo in model-as-oci-artifacts.md (#1557)
- 9bb19c5af4ec6144d69dbadd2f1cb1aaea181bd3 docs: add proposal for gemma 3 models (#1540)
- 6f14e256655f9f32cb76bab85dd6836f0f781c32 docs: Update rag.md (#1534)
- 0d75ff6b1ef8ef74d532002a503be7418aa2b4ab docs: Introduce a new InferenceSet CRD and Controller for scaling inference workloads automatically (#1503)
- f41784cf1d20a633e84da88f83f84f3e5d7c4c9a docs: add versioned documentation for v0.7.x (#1521)
Maintenance
- 819c093fe3836dc0fad60428140915ec3697cbcc chore: bump node-forge from 1.3.1 to 1.3.2 in /website (#1654)
- cfb8e7fed282230915881abae15597d543585681 chore: bump vllm to 0.12.0 (#1663)
- 68b58590eb924ad142bd780c765d0ba9e2b0ee72 chore: migrate unit-test to self-hosted runner (#1650)
- 93a668613753753fee5c1ccaee1a08751fe82245 chore: use built-in GenerateName to generate random workspace name by InferenceSet (#1637)
- 04229d03636ec3e789c8e3aee61d5497ec7cc813 chore: bump actions/github-script from 7 to 8 (#1639)
- e5df64e0fdb14e7867885b7732eb2994633f8462 chore: make local-csi-driver a helm dependency (#1483)
- ca164c165c903e6fb88aa6888ec3537580f23285 chore: bump docker/login-action from 3.5.0 to 3.6.0 (#1618)
- ae8b6bfa49aa4c42872d5b0959b31222bf72d1c3 chore: add workflow for building and pushing skopeo image (#1613)
- 1fe9a8fbee6365ed05b86f687ec08e9b3b7a685d chore: bump @docusaurus/core from 3.9.1 to 3.9.2 in /website (#1601)
- 51182cdcaa68cb33ece0260c0e9b06d4df75d1dd chore: bump peter-evans/repository-dispatch from 3 to 4 (#1606)
- b1795b69d007b77360b7049c4c14ebaf61c89548 chore: bump sigs.k8s.io/controller-runtime from 0.21.0 to 0.22.2 and k8s.io/* to 0.34.1 (#1571)
- ec162d83b4a6fbf5bbd1156730f009825be2ae17 chore: disable InferenceSetController by default (#1599)
- f6b5140471b92a0a5bbe8edf758091302545cc6c chore: bump gateway-api-inference-extension to v1.0.1 (#1566)
- 2b8e5656eb3ff55f78a6690735e41fe67e1dfb55 Revert "chore: bump python from 3.12-slim to 3.13-slim in /docker/presets/models/tfs" (#1584)
- f9fe6d389fbac0a0aa94f5df38a6346166962ee7 chore: bump step-security/harden-runner from 2.12.0 to 2.13.1 (#1574)
- c70caf1cf2151b58c6d92307f8fe77c5b5b64b45 chore: rename GB to GiB in GPUConfig (#1565)
- a7b09eeb595c1ac7f2c904dd9fb17ee5f53703ea chore: bump azurerm provider in terraform and update example (#1564)
- b27f41c8866ec4ca63ab576098ba0a71f600ff82 chore: bump azure/CLI from 2.1.0 to 2.2.0 (#1552)
- a458d8721448d9ccb23c9fb82ca87f50a0405202 chore: bump react from 19.1.1 to 19.2.0 in /website (#1541)
- 7560f07e3c229cfb7a6aa2bf25d7c2468bc59107 chore: bump @docusaurus/module-type-aliases from 3.9.0 to 3.9.1 in /website (#1535)
- 16e3032c107fcc5ea7bfb58f32c5138cae052e02 chore: bump python from 3.12-slim to 3.13-slim in /docker/presets/models/tfs (#1451)
- ab1f640d583cca878ebf200cc68ad37fddc8a01e chore: bump python from 3.12-slim to 3.13-slim in /docker/ragengine/service (#1456)
- 046b6785186c2914823fc54b1ae386d35c104bca chore: bump actions/cache from 4.2.2 to 4.3.0 (#1530)
- 9e7d73136153882c6cc72a69ebf14e93d1228bb8 chore: bump @docusaurus/core from 3.8.1 to 3.9.1 in /website (#1527)
- 94a1fe4222fd120972a2b8f800794027e27a3b3a chore: bump @docusaurus/types from 3.8.1 to 3.9.1 in /website (#1526)
Testing
- 71c6747b57788d2240f9b9d76c71210f37c87c9a test: refine inferenceset example and AIKit test (#1674)
- 79383f8aacad188623330cd39b0cd064442034ad test: ignore more paths in e2e test (#1664)
- 2552a54bb59870d17a276ab8e8a9ac009574d23f test: add ListWorkspaces unit test (#1655)
- 23ea347d70be5df0c269d3ebb2299e41f119a343 test: add InferenceSet e2e test (#1642)
- 28f1872643bf17dfc88aea2dd8f34e9b3f718344 test: add keda-kaito-scaler test in AIKit test suite (#1652)
- 83e116403a1b2633a4e6bec33610cfbffa9dc3ac test: fix unstable ut failure (#1646)
- 1da9e3490da860f35e9368a3e4f67a6737e658ed test: add codespell github action (#1506)