What's Changed
- fix: choose cuda architectures based on cuda version by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/463
- kernel: add grouped gemm support for moe by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/458
- kernel: added oob handling for grouped gemm kernel by @guocuimi in https://github.com/vectorch-ai/ScaleLLM/pull/465
- refactor: add _1 into stride for c...