FlashAttention – optimizing GPU memory for more scalable transformers

by mpaepper on 2/14/2025, 8:33 AM with 0 comments