NVIDIA's CUTLASS 3.x Enhances GEMM Kernel Design with Modular Abstractions
6 hours ago
NVIDIA's CUTLASS 3.x introduces a modular, hierarchical system for GEMM kernel design, which improves code readability and extends support to newer GPU architectures such as Hopper and Blackwell.
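
To make the idea of a modular, hierarchical GEMM design concrete, here is a minimal CPU-only sketch in plain C++. It illustrates the layering principle only, with each layer as a small template composed into the next. Every name in it (`FmaAtom`, `SimpleMainloop`, `LinearCombinationEpilogue`, `GemmKernel`, `GemmDevice`, `run_example`) is hypothetical and chosen for illustration. None of them are CUTLASS types or signatures, and the real library targets GPUs rather than a host loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// All names below are hypothetical; this mimics the layering idea only.

// Innermost layer: one multiply-accumulate step.
struct FmaAtom {
    static void apply(float a, float b, float& acc) { acc += a * b; }
};

// Mainloop layer: reduces over K for one output element, using the atom.
template <class Atom>
struct SimpleMainloop {
    static float run(const std::vector<float>& A, const std::vector<float>& B,
                     std::size_t row, std::size_t col,
                     std::size_t N, std::size_t K) {
        float acc = 0.0f;
        for (std::size_t k = 0; k < K; ++k) {
            Atom::apply(A[row * K + k], B[k * N + col], acc);
        }
        return acc;
    }
};

// Epilogue layer: post-processes the accumulator, D = alpha * acc + beta * C.
struct LinearCombinationEpilogue {
    float alpha;
    float beta;
    float apply(float acc, float c) const { return alpha * acc + beta * c; }
};

// Kernel layer: composes a mainloop with an epilogue over the whole output.
template <class Mainloop, class Epilogue>
struct GemmKernel {
    Epilogue epilogue;
    void operator()(const std::vector<float>& A, const std::vector<float>& B,
                    const std::vector<float>& C, std::vector<float>& D,
                    std::size_t M, std::size_t N, std::size_t K) const {
        for (std::size_t i = 0; i < M; ++i) {
            for (std::size_t j = 0; j < N; ++j) {
                float acc = Mainloop::run(A, B, i, j, N, K);
                D[i * N + j] = epilogue.apply(acc, C[i * N + j]);
            }
        }
    }
};

// Device layer: host-facing wrapper that checks sizes and invokes the kernel.
template <class Kernel>
struct GemmDevice {
    Kernel kernel;
    std::vector<float> run(const std::vector<float>& A,
                           const std::vector<float>& B,
                           const std::vector<float>& C,
                           std::size_t M, std::size_t N, std::size_t K) const {
        assert(A.size() == M * K && B.size() == K * N && C.size() == M * N);
        std::vector<float> D(M * N, 0.0f);
        kernel(A, B, C, D, M, N, K);
        return D;
    }
};

// Swapping any one layer (atom, mainloop, epilogue) leaves the others untouched.
using ExampleKernel = GemmKernel<SimpleMainloop<FmaAtom>, LinearCombinationEpilogue>;
using ExampleGemm = GemmDevice<ExampleKernel>;

// Row-major 2x2 example: D = 1 * (A * B) + 1 * C.
inline std::vector<float> run_example() {
    std::vector<float> A = {1, 2, 3, 4};
    std::vector<float> B = {5, 6, 7, 8};
    std::vector<float> C = {1, 1, 1, 1};
    LinearCombinationEpilogue epilogue{1.0f, 1.0f};
    ExampleKernel kernel{epilogue};
    ExampleGemm gemm{kernel};
    return gemm.run(A, B, C, 2, 2, 2);
}
```

Calling `run_example()` returns the four entries of the 2x2 result in row-major order. The point of the structure is that a different epilogue or inner step could be substituted by changing a single template argument in `ExampleKernel`, which is the kind of readability and reuse benefit a layered design aims for.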