mscclpp
MSCCL++: A GPU-driven communication stack for scalable AI applications
MSCCL++ v0.6.0
Highlights
- Improved NCCL API integration in MSCCL++ for better performance and usability
- Enhanced execution plan-based executor in MSCCL++
- Fixed several bugs to improve stability and reliability
What's Changed
- Add support for different vector sizes in multimem instructions by @roshandathathri in https://github.com/microsoft/mscclpp/pull/332
- NCCL API Executor Integration by @caiomcbr in https://github.com/microsoft/mscclpp/pull/331
- Fix missing import in executor test by @yzygitzh in https://github.com/microsoft/mscclpp/pull/334
- bfloat16 support by @chhwang in https://github.com/microsoft/mscclpp/pull/336
- Dynamically load libibverbs by @caiomcbr in https://github.com/microsoft/mscclpp/pull/337
- Auto-tune vector sizes for NVLS allreduce6 by @roshandathathri in https://github.com/microsoft/mscclpp/pull/338
- Make ibverbs optional at compile time by @chhwang in https://github.com/microsoft/mscclpp/pull/340
- ProxyChannel Support in Executor by @caiomcbr in https://github.com/microsoft/mscclpp/pull/342
- Support executors to send packets over ProxyChannel by @caiomcbr in https://github.com/microsoft/mscclpp/pull/344
- Fix for ROCm 6.0 by @chhwang in https://github.com/microsoft/mscclpp/pull/347
- Fix bug for construct sempaphore by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/341
- Add proxy channel related operations by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/351
- Add CI for rocm by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/346
- Tune threads per block for mscclpp executor by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/345
- Fix NPKit exit event offset by @yzygitzh in https://github.com/microsoft/mscclpp/pull/356
- Use IB transport flags only when an IB device exists by @chhwang in https://github.com/microsoft/mscclpp/pull/355
- Update ROCm CI by @chhwang in https://github.com/microsoft/mscclpp/pull/357
- Fixing RegisterMemory Allocation for ProxyChannels by @caiomcbr in https://github.com/microsoft/mscclpp/pull/353
- Fix NCCL API bugs by @chhwang in https://github.com/microsoft/mscclpp/pull/363
- Perf optimization & support clipping by @chhwang in https://github.com/microsoft/mscclpp/pull/364
- Fix copyright messages by @chhwang in https://github.com/microsoft/mscclpp/pull/367
- [Doc] mscclpp docs by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/348
- Executor AllGather In-Place Support by @caiomcbr in https://github.com/microsoft/mscclpp/pull/365
- Fix algo repo name by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/369
- Update docker image for cuda12.4 by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/370
- Fix in-place all-gather input buffer in executor_test by @yzygitzh in https://github.com/microsoft/mscclpp/pull/372
- [docs] fix quickstart link by @jeffra in https://github.com/microsoft/mscclpp/pull/374
- Add kernel-based verification for executor_test by @yzygitzh in https://github.com/microsoft/mscclpp/pull/378
- Lazily create the context stream by @chhwang in https://github.com/microsoft/mscclpp/pull/381
- Fixing Bug Const Offset in Execution Plan by @caiomcbr in https://github.com/microsoft/mscclpp/pull/380
- Fix light load bug by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/379
- Small Adjust in Test Data AllGather at Executor Test by @caiomcbr in https://github.com/microsoft/mscclpp/pull/384
- Fix missing packet parameter for executor by @yzygitzh in https://github.com/microsoft/mscclpp/pull/385
- NVLS support for msccl++ executor by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/375
- Fix typo by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/389
- Improve CMake options by @chhwang in https://github.com/microsoft/mscclpp/pull/376
- Fixing Message Boundary AllReduce Fallback Code by @caiomcbr in https://github.com/microsoft/mscclpp/pull/391
- Fix mscclpp_benchmark by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/392
- Add cross threadblock barrier by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/383
- AllGather Executor Support in NCCL Interface by @caiomcbr in https://github.com/microsoft/mscclpp/pull/393
- Providing reduce-scatter test support by @caiomcbr in https://github.com/microsoft/mscclpp/pull/390
- Select algo according to json config by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/396
- Add connection events for NPKit by @yzygitzh in https://github.com/microsoft/mscclpp/pull/386
- Revised ProxyChannel interfaces by @chhwang in https://github.com/microsoft/mscclpp/pull/400
- Setup pipeline for mscclpp over nccl by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/401
- Exception Max Number Operation per Tb by @caiomcbr in https://github.com/microsoft/mscclpp/pull/405
- Reduce memory usage for scratch buffer by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/403
- [Cherry-pick] Move pipeline to official org (#406) by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/416
- [Cherry-pick] trigger ci for release branches (#426) by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/427
- [Cherry-pick] Disable CuMemMap check for ROCm (#411) by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/424
- [Cherry-pick] NVLS support for NCCL API (#410) by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/425
- [Cherry-pick] Fix nccl-test failure issue (#421) by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/429
New Contributors
- @jeffra made their first contribution in https://github.com/microsoft/mscclpp/pull/374
Full Changelog: https://github.com/microsoft/mscclpp/compare/v0.5.2...v0.6.0