Log-Linear Attention

by sva_ on 6/7/2025, 4:01 PM, with 3 comments

by btilly on 6/7/2025, 9:11 PM

I think it would be very good if they can make this work. I suspect that we do something not entirely unlike this, and that is why spaced repetition is so good for stuffing things into our long-term memories.

by iknownothow on 6/7/2025, 8:25 PM

> Log-linear attention replaces the fixed-size hidden state with a logarithmically growing set of hidden states

Does this mean the models can be smaller too (on top of the primary benefit of being faster)?
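For intuition, here is a minimal sketch in Python (my own illustration, not the paper's code) of what a logarithmically growing set of hidden states can look like. It assumes the Fenwick-tree-style bucketing the paper describes: each completed power-of-two span of tokens is summarized into one linear-attention state, and equal-size states merge like binary carries, so at most O(log T) states are live after T tokens. The function name `log_linear_readout` and the uniform per-level weighting at readout are my simplifications; the paper, as I read it, learns per-level weights.

```python
import numpy as np

def log_linear_readout(keys, values, query):
    """Sketch of a logarithmically growing hidden state (illustrative only).

    Tokens are folded into buckets whose sizes are powers of two.
    Each bucket keeps one linear-attention summary, sum_i outer(k_i, v_i).
    Merging works like binary carry propagation, so after T tokens only
    popcount(T) <= ceil(log2(T + 1)) summaries are alive.
    """
    d = keys.shape[1]
    levels = []  # levels[j] is the summary for a completed bucket of 2**j tokens, or None

    for k, v in zip(keys, values):
        carry = np.outer(k, v)  # fresh size-1 state for this token
        j = 0
        # carry propagation: two size-2**j states merge into one size-2**(j+1) state
        while j < len(levels) and levels[j] is not None:
            carry = carry + levels[j]
            levels[j] = None
            j += 1
        if j == len(levels):
            levels.append(None)
        levels[j] = carry

    # read out: uniform weights over live states (the real model would
    # weight each level; uniform here keeps the sketch minimal)
    out = np.zeros(d)
    for state in levels:
        if state is not None:
            out = out + query @ state
    return out

# usage: 1000 tokens, yet only popcount(1000) = 6 states are live at the end
T, d = 1000, 8
rng = np.random.default_rng(0)
K, V = rng.normal(size=(T, d)), rng.normal(size=(T, d))
q = rng.normal(size=d)
print(log_linear_readout(K, V, q))
```

The point of the sketch is that the working set of states grows with log T, not T, while each individual state stays the same fixed size as in ordinary linear attention.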