actual link: https://github.com/triton-lang/triton/pull/7298
The Volkswagen emissions testing model
Seems this is likely due to ongoing work on FP8 support in nvidia/cutlass. From my reading, the alternative code path was likely added recently for testing by external contributors to the cutlass project and other involved parties, rather than attempting to distribute custom packaged internal builds of CUDA.
This ticket is a good starting place to see the chain of issues around the ongoing work: https://github.com/NVIDIA/cutlass/pull/2037
So, what is Cutlass? Can someone explain whether checking for kernel names makes sense here, or whether it's a form of cheating?
I have a little experience with compilers and LLVM, but you'd be shocked how many things rely on names and parsing names.
If you have hundreds of complex passes that rely on various "contracts" like type names or some shit, then really crazy things like this can happen unintentionally and not maliciously.
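As a toy sketch of how a name-based "contract" can leak into optimization decisions, here's what it might look like in an LLVM-style function pass. Everything here is invented for illustration (the pass name and the string attribute are hypothetical, not anything from NVIDIA's actual code):

```cpp
#include "llvm/IR/Function.h"
#include "llvm/IR/PassManager.h"

using namespace llvm;

// Hypothetical pass: kernels following a known library's naming convention
// are assumed to satisfy extra guarantees, so tuning is keyed off the name.
struct TuneByNamePass : PassInfoMixin<TuneByNamePass> {
  PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM) {
    if (F.getName().contains("cutlass")) {
      // Invented attribute: a later pass might read this to pick a more
      // aggressive scheduling heuristic for these kernels.
      F.addFnAttr("aggressive-sched", "true");
      return PreservedAnalyses::none();
    }
    return PreservedAnalyses::all();
  }
};
```

Nothing in a pass like that is "checking for a benchmark"; it's just a fragile convention that fires for anyone who happens to use the magic name.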
GenuineIntel moment.
Is 100 TFLOPS a lot?
Let's hope, for Nvidia's sake, that this is an innocent optimization that is only valid for internal kernels and cannot be applied in general.
I wish people either learned how to use git or just wholesale stopped using it.
And what's the downside of using that kernel name? It can't just be that it's faster and nothing else. Unless they included lots of `sleep(x)` calls.
Intel's quest to move from "trusted by default / the reference" to "check for scam" is getting worse every release. And it's 100% self-inflicted. How weird.
This tweet appears to be taking the original material out of context to misrepresent it:
> Rewrite the attention kernel to be persistent. This gives better performance at low-contexts. However, fp16 at large context has suffered a bit due to a ptxas instruction scheduling issue in the softmax partition. fp8 is ~100 tflops faster when the kernel name has "cutlass" in it.
The charitable reading is that, on certain kernels, using fp8 rather than fp16 values gives better performance. (Although I can't even see how the numbers relate to a "~100 tflops faster" claim in any respect, nor does the tweet list any kernel names or suggest a control kernel!) But this is being presented as if someone has uncovered evidence of cheating on benchmarks.
In `libnvidia-nvvm.so` the string `cutlass` appears right after `Memory Dependence Analysis` and `memdep`. Perhaps it acts as an optimization attribute of some sort, where the compiler is allowed to make assumptions about the kernel's behavior that are not valid in general?
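If that guess is right, one (entirely speculative) shape for it, consistent with the neighboring `memdep` strings, would be treating CUTLASS-named kernels as obeying stricter aliasing rules. A hypothetical sketch in LLVM terms, with invented names:

```cpp
#include "llvm/IR/Argument.h"
#include "llvm/IR/Attributes.h"
#include "llvm/IR/Function.h"

using namespace llvm;

// Purely speculative: if the kernel name suggests it comes from CUTLASS,
// assume its pointer arguments never alias. That assumption is sound for
// kernels that actually follow the library's conventions, but not in general.
void applyNameBasedAliasAssumption(Function &F) {
  if (!F.getName().contains("cutlass"))
    return;
  for (Argument &A : F.args()) {
    if (A.getType()->isPointerTy())
      A.addAttr(Attribute::NoAlias); // lets memory dependence analysis reorder more freely
  }
}
```

An assumption like that would explain both the speedup and why it keys off the name: the compiler can't prove no-aliasing for arbitrary kernels, so it trusts the naming convention instead.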