Changelog

ScaleLLM

A high-performance inference system for large language models, designed for production environments.

vectorch-ai/ScaleLLM

C++ · Apache-2.0

Topics: cuda, efficiency, gpu, inference, llama, llama3, and 8 more

Sep 13, 2025

What's Changed

fix: choose cuda architectures based on cuda version by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/463
kernel: add grouped gemm support for moe by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/458
kernel: added oob handling for grouped gemm kernel by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/465
refactor: add _1 into stride for c...
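
PRs #458 and #465 add a grouped GEMM for MoE and out-of-bounds handling for it, but the changelog does not show the kernel's interface. The sketch below is a plain-Python reference for what a grouped GEMM computes in an MoE layer: tokens are grouped by the expert they are routed to, and each group is multiplied by that expert's weight matrix. All names (`grouped_gemm`, `tile_m`, `matmul`) are hypothetical, and the `min()` clamp only illustrates the kind of out-of-bounds guard a tiled kernel needs when a group's last tile is partial; we assume, without confirmation from the source, that this is the role of PR #465.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive reference GEMM: a is [m, k], b is [k, n]."""
    n = len(b[0])
    return [[sum(row[p] * b[p][j] for p in range(len(b))) for j in range(n)] for row in a]


def grouped_gemm(tokens: Matrix, expert_ids: List[int], weights: List[Matrix], tile_m: int = 2) -> Matrix:
    """Multiply each token row by the weight matrix of the expert it is routed to.

    Tokens are grouped by expert, and each group is processed in tiles of
    tile_m rows, the way a tiled GPU kernel would walk it (hypothetical sketch).
    """
    num_experts = len(weights)
    # Stable sort of token indices by expert id gives one contiguous group per expert.
    order = sorted(range(len(tokens)), key=lambda i: expert_ids[i])
    # Prefix sum of per-expert token counts: group g spans order[offsets[g]:offsets[g + 1]].
    counts = [0] * num_experts
    for e in expert_ids:
        counts[e] += 1
    offsets = [0]
    for c in counts:
        offsets.append(offsets[-1] + c)

    out: Matrix = [[] for _ in tokens]
    for g in range(num_experts):
        start, end = offsets[g], offsets[g + 1]
        for tile_start in range(start, end, tile_m):
            # Out-of-bounds guard: the last tile of a group is usually partial, so
            # clamp it to the group's end instead of reading the next group's rows.
            rows = order[tile_start:min(tile_start + tile_m, end)]
            tile_out = matmul([tokens[i] for i in rows], weights[g])
            # Scatter results back to the original token order.
            for i, r in zip(rows, tile_out):
                out[i] = r
    return out
```

For example, three tokens `[[1, 0], [0, 1], [1, 1]]` routed to experts `[1, 0, 1]`, with expert 0 as the identity and expert 1 as `[[2, 0], [0, 2]]`, give `[[2, 0], [0, 1], [2, 2]]` for any `tile_m`. Without the clamp, a tile of size 2 starting at the last token of expert 0 would also pick up the first token of expert 1.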

May 27, 2025

What's Changed

ci: fix wheel build script by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/418
kernel: added attention combine kernel to support split kv by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/419
kernel: refactor and added more unittests for attn combine kernel by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/420
moe: added token dispatcher...
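
PR #419 adds an attention combine kernel for split KV. With split KV, the keys of one query are divided into chunks, each chunk produces a locally normalized output plus the log-sum-exp (LSE) of its scores, and a combine step rescales and sums the partial outputs. The sketch below shows that standard LSE merge in plain Python for a single query; the function names are hypothetical and it is not the project's CUDA kernel.

```python
import math
from typing import List, Sequence, Tuple

Partial = Tuple[List[float], float]  # (normalized output, log-sum-exp of scores)


def attend(scores: Sequence[float], values: Sequence[Sequence[float]]) -> Partial:
    """Softmax attention of one query over a chunk of keys (scores already computed)."""
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]  # shifted by the max for stability
    total = sum(weights)
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]
    return out, m + math.log(total)


def combine(partials: Sequence[Partial]) -> Partial:
    """Merge per-split results into the result of attending over all keys at once."""
    max_lse = max(lse for _, lse in partials)
    # exp(lse_s - max_lse) is split s's softmax denominator, up to a common factor.
    scales = [math.exp(lse - max_lse) for _, lse in partials]
    total = sum(scales)
    dim = len(partials[0][0])
    out = [sum(s * o[d] for s, (o, _) in zip(scales, partials)) / total for d in range(dim)]
    return out, max_lse + math.log(total)
```

Because each split's weight is its share of the global softmax denominator, `combine([attend(s[:2], v[:2]), attend(s[2:], v[2:])])` matches `attend(s, v)` up to floating-point rounding, for any way of splitting the keys.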

Mar 2, 2025

What's Changed

ci: add option to skip nvbench build by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/390
ci: build devel image with cuda 12.8 for blackwell by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/391
kernel: added query packing support for attention by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/392
refactor: rename attention to mha to diff...
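
The changelog does not say what "query packing" in PR #392 means. We assume it refers to the common grouped-query attention (GQA) technique of folding the query heads that share one KV head into the row dimension, so a single tile of K/V serves the whole group. The sketch below shows only that index remapping, with hypothetical names (`pack_queries`, `unpack_outputs`) and nested lists standing in for tensors.

```python
from typing import List

Tensor3 = List[List[List[float]]]


def pack_queries(q: Tensor3, n_kv_heads: int) -> Tensor3:
    """Regroup q from [seq_len][n_heads][dim] to [n_kv_heads][seq_len * group][dim]."""
    n_heads = len(q[0])
    if n_heads % n_kv_heads:
        raise ValueError("n_heads must be a multiple of n_kv_heads")
    group = n_heads // n_kv_heads
    # Row t * group + g of packed[h] holds query head h * group + g at position t.
    return [
        [q[t][h * group + g] for t in range(len(q)) for g in range(group)]
        for h in range(n_kv_heads)
    ]


def unpack_outputs(packed: Tensor3, seq_len: int) -> Tensor3:
    """Invert pack_queries: [n_kv_heads][seq_len * group][dim] -> [seq_len][n_heads][dim]."""
    n_kv_heads = len(packed)
    group = len(packed[0]) // seq_len
    return [
        [packed[h][t * group + g] for h in range(n_kv_heads) for g in range(group)]
        for t in range(seq_len)
    ]
```

With 4 query heads and 2 KV heads, each packed KV head sees `seq_len * 2` query rows, and `unpack_outputs` restores the original `[seq_len][n_heads][dim]` layout of the attention output.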

Jan 26, 2025

What's Changed

misc: remove legacy logic to support quantization for other types. by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/350
upgrade pytorch to 2.5.1 by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/351
added cuda 12.6 build image by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/353
fix cmake version issue for manylinux image by @guocuimi in...

Oct 26, 2024

What's Changed

kernel: added flash infer attention impl by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/327
refactor: flatten block tables to 1d tensor by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/328
kernel: added script to generate instantiation for flashinfer kernels by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/329
refactor: move flash att...
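
PR #328 flattens block tables into a 1D tensor. A common way to do this for a paged KV cache is a CSR-style layout: concatenate every sequence's list of physical block ids and keep a prefix-sum offsets array so sequence `i` owns `flat[offsets[i]:offsets[i + 1]]`. The sketch below assumes that layout; the function names and the slot formula `block * block_size + pos % block_size` are illustrative, not taken from the project.

```python
from itertools import accumulate
from typing import List, Tuple


def flatten_block_tables(block_tables: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Concatenate per-sequence block tables into one flat list plus offsets."""
    flat = [block for table in block_tables for block in table]
    # Sequence i owns flat[offsets[i]:offsets[i + 1]] (CSR-style layout).
    offsets = [0, *accumulate(len(table) for table in block_tables)]
    return flat, offsets


def slot_id(flat: List[int], offsets: List[int], seq: int, pos: int, block_size: int) -> int:
    """Map token position pos of sequence seq to its slot in the paged KV cache."""
    idx = offsets[seq] + pos // block_size
    if idx >= offsets[seq + 1]:
        raise IndexError("position is beyond the blocks allocated to this sequence")
    return flat[idx] * block_size + pos % block_size
```

For block tables `[[7, 3], [5], [2, 9, 4]]` this gives `flat == [7, 3, 5, 2, 9, 4]` and `offsets == [0, 2, 3, 6]`; with a block size of 4, position 9 of the third sequence falls in physical block 4 at slot 17. A flat list plus offsets avoids padding sequences with different numbers of blocks to a common length.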

Releases: v0.2.6, v0.2.5, v0.2.4, v0.2.3, v0.2.2