Move pipeline to official org by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/406
Disable CuMemMap check for ROCm by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/411
NVLS support for NCCL API by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/410
Supporting multi-node executors in NCCL API by @caiomcbr in https://github.com/microsoft/mscclpp/pull/412
Fix synchronization in allreduce8 kernel by @dsidler in https://github.com/microsoft/mscclpp/pull/407
Add ncclBcast / ncclBroadcast support by @SreevatsaAnantharamu in https://github.com/microsoft/mscclpp/pull/419
Update README by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/414
Fix nccl-test failure issue by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/421
Tackle build warnings by @chhwang in https://github.com/microsoft/mscclpp/pull/422
Trigger CI for release branches by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/426
Fix CI trigger issue by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/428
Fix typos in the pipeline by @chhwang in https://github.com/microsoft/mscclpp/pull/420
Update version number by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/433
Enhance the nccl error message handling by @seagater in https://github.com/microsoft/mscclpp/pull/434
[NPKIT] Adding the NPKIT support for kernel allreduce7 in mscclpp-nccl by @PedramAlizadeh in https://github.com/microsoft/mscclpp/pull/399
Fix azure pipeline by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/437
Add GpuBuffer class by @chhwang in https://github.com/microsoft/mscclpp/pull/423
Fix CMake build messages by @chhwang in https://github.com/microsoft/mscclpp/pull/443
Flushing Proxy Channels at CPU side upon reaching the Inflight Request Limit by @caiomcbr in https://github.com/microsoft/mscclpp/pull/415
Fix Python binding of exceptions by @chhwang in https://github.com/microsoft/mscclpp/pull/444
Auto-update version numbers in CMakeLists.txt by @chhwang in https://github.com/microsoft/mscclpp/pull/450
Resolve cuMemMap error by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/451
Manage runtime environments by @chhwang in https://github.com/microsoft/mscclpp/pull/452
Lazily create streams for CudaIpcConnection by @chhwang in https://github.com/microsoft/mscclpp/pull/449
Fix PR #449 by @chhwang in https://github.com/microsoft/mscclpp/pull/453
Merge mscclpp-lang to mscclpp project by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/442
Renaming channels by @chhwang in https://github.com/microsoft/mscclpp/pull/436
Add multi-nodes example & update doc by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/455
Adjusting BFS to seek circular dependencies in the msccl-tools DAG by @caiomcbr in https://github.com/microsoft/mscclpp/pull/459
remove unnecessary sync by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/461
Support ReduceScatter in the NCCL interface by @caiomcbr in https://github.com/microsoft/mscclpp/pull/460
Updating MSCCLLang Examples by @caiomcbr in https://github.com/microsoft/mscclpp/pull/462
Disable channel cache by @seagater in https://github.com/microsoft/mscclpp/pull/463
Adjusting AllGather Collective in MSCCLLang by @caiomcbr in https://github.com/microsoft/mscclpp/pull/466
Adding Read Put Packet operation at Executor by @caiomcbr in https://github.com/microsoft/mscclpp/pull/441
NPKit Support to Read Put Packet Operation by @caiomcbr in https://github.com/microsoft/mscclpp/pull/471
Adjust NPKit IB Event by @caiomcbr in https://github.com/microsoft/mscclpp/pull/472
Fix minor typos and errors in documentation by @RyoYang in https://github.com/microsoft/mscclpp/pull/474
Improving Get Operation at MSCCLLang by @caiomcbr in https://github.com/microsoft/mscclpp/pull/475
Fix memory OOM issue by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/479
Mark mscclpp-test as deprecated in the doc by @chhwang in https://github.com/microsoft/mscclpp/pull/478
Update allgather fallback algo by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/476
Add min operation for allreduce by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/481
NCCL API CI Test for ReduceScatter by @caiomcbr in https://github.com/microsoft/mscclpp/pull/465
Fix correctness issue when mscclppDisableChannelCache set to true by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/483
nccl/rccl integration by @seagater in https://github.com/microsoft/mscclpp/pull/469
Fix reduceMin failure issue by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/486
Reduce Operation Support to the Executor by @caiomcbr in https://github.com/microsoft/mscclpp/pull/484
Add CI test for fallback allgather, allreduce, broadcast and reducescatter to NCCL operations by @seagater in https://github.com/microsoft/mscclpp/pull/485
Remove the requirement for CU_DEVICE_ATTRIBUTE_HANDLE_TYPE_FABRIC_SUPPORTED for NVLS support by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/489
Add CUDA 12.8 images by @chhwang in https://github.com/microsoft/mscclpp/pull/488
Add a devcontainer configuration by @chhwang in https://github.com/microsoft/mscclpp/pull/490
Fix CMake installation in Dockerfile for arm64 by @chhwang in https://github.com/microsoft/mscclpp/pull/491
Export mscclpp GpuBuffer to dlpack format by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/492
Fix the virtual address mapping issue of cuMemMap in fallback code by @seagater in https://github.com/microsoft/mscclpp/pull/501
Improve signal/wait performance and fix barrier issue by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/499
Fix performance issue introduced in PR #499 by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/505
Add flag to disable nvls by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/500
Optimized allreduce fallback for ~10KB sizes by @chhwang in https://github.com/microsoft/mscclpp/pull/506
Automatic creation of Scratch Buffer at MSCCLLang by @caiomcbr in https://github.com/microsoft/mscclpp/pull/510
Use implicit ctors for default device ctors by @chhwang in https://github.com/microsoft/mscclpp/pull/512
apps/nccl: fix a bug in allreduce kernels for graph mode by @nusislam in https://github.com/microsoft/mscclpp/pull/502
Revised MemoryChannel interfaces by @chhwang in https://github.com/microsoft/mscclpp/pull/508
Fix #508 by @chhwang in https://github.com/microsoft/mscclpp/pull/515
Add NVLS based fallback algo by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/507
Enhance Collective Check at MSCCLLang by @caiomcbr in https://github.com/microsoft/mscclpp/pull/511
Support ibv_reg_dmabuf_mr for buffer allocated by cuMemMalloc by @seagater in https://github.com/microsoft/mscclpp/pull/513
Fix the issue of echo message for nccl fallback in CI test by @seagater in https://github.com/microsoft/mscclpp/pull/520
Asynchronous setup by @chhwang in https://github.com/microsoft/mscclpp/pull/514
Adding maxSpinCount to port channel flush by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/518
Fix device assert by @chhwang in https://github.com/microsoft/mscclpp/pull/522
Fix #514 by @chhwang in https://github.com/microsoft/mscclpp/pull/521
Add a CMake option MSCCLPP_GPU_ARCHS by @chhwang in https://github.com/microsoft/mscclpp/pull/525
Update citations by @chhwang in https://github.com/microsoft/mscclpp/pull/524
Set Up a CI Pipeline for H100 by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/526
Properly setting up the device in Ethernet Connection by @caiomcbr in https://github.com/microsoft/mscclpp/pull/527
Add device semaphore API by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/523
Address NVCC warning #20012-D by @chhwang in https://github.com/microsoft/mscclpp/pull/528
Rename ChannelTrigger fields and check field values in debug builds by @chhwang in https://github.com/microsoft/mscclpp/pull/529
DLPack fixes by @chhwang in https://github.com/microsoft/mscclpp/pull/537
Improved documentation & minor interface revision by @chhwang in https://github.com/microsoft/mscclpp/pull/541
Use a stream pool for gpuCalloc*() by @chhwang in https://github.com/microsoft/mscclpp/pull/509
Multi-stream CUDA IPC by @chhwang in https://github.com/microsoft/mscclpp/pull/326
Fix #509 by @chhwang in https://github.com/microsoft/mscclpp/pull/546
Fix build processes by @chhwang in https://github.com/microsoft/mscclpp/pull/545
Do not use tail replica by default by @chhwang in https://github.com/microsoft/mscclpp/pull/544
DeviceSemaphore fix by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/553
Fix some typos in docs by @Edenzzzz in https://github.com/microsoft/mscclpp/pull/555
New FIFO test by @chhwang in https://github.com/microsoft/mscclpp/pull/558
FIFO improvements by @chhwang in https://github.com/microsoft/mscclpp/pull/557
Fix #557 by @chhwang in https://github.com/microsoft/mscclpp/pull/560
Support connection between local endpoints by @chhwang in https://github.com/microsoft/mscclpp/pull/561
Fix multi-nodes CI pipeline by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/564
Support any GPUs per node for NCCL_API by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/566
Fix pytest failure by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/567
Fix a FIFO correctness bug by @chhwang in https://github.com/microsoft/mscclpp/pull/549
New semaphore constructors by @chhwang in https://github.com/microsoft/mscclpp/pull/559
Revise NVLS interface by @chhwang in https://github.com/microsoft/mscclpp/pull/458
update readme & bump version by @Binyang2014 in https://github.com/microsoft/mscclpp/pull/550
New Contributors
@dsidler made their first contribution in https://github.com/microsoft/mscclpp/pull/407
@seagater made their first contribution in https://github.com/microsoft/mscclpp/pull/434
@PedramAlizadeh made their first contribution in https://github.com/microsoft/mscclpp/pull/399
@RyoYang made their first contribution in https://github.com/microsoft/mscclpp/pull/474
@nusislam made their first contribution in https://github.com/microsoft/mscclpp/pull/502
@Edenzzzz made their first contribution in https://github.com/microsoft/mscclpp/pull/555
Full Changelog: https://github.com/microsoft/mscclpp/compare/v0.6.0...v0.7.0