Our first open-source release 💡
🎉🎉 Main Features
- Asynchronous OpenAI-compatible interface adapted from vLLM
- Custom-defined tensor type and unified global memory management
- 🔥 Overlapped encode and all-reduce, which we call "dual streams"
- Host all-reduce based on SIMD instructions
- Optimized fused kernels: QKV, residual & layernorm, etc.
- 🔥 Fused batch attention for decoding base...