Theoretical limitations of multi-layer Transformer

by fovc on 1/31/2025, 5:48 PM with 22 comments

by thesz on 2/1/2025, 8:40 AM

  > ...our results give: ... (3) a provable advantage of chain-of-thought, exhibiting a task that becomes exponentially easier with chain-of-thought.
It would be good to also prove that there is no task that becomes exponentially harder with chain-of-thought.
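One hedged way to state that complementary direction, using hypothetical notation not taken from the paper (T_direct(f) and T_CoT(f) for the smallest model that solves task f without and with chain-of-thought):

    % Hypothetical notation, not from the paper: T_direct(f), T_CoT(f) denote the
    % minimal model size solving task f without / with chain-of-thought.
    % The paper's result (3) gives a one-sided separation of roughly this form:
    \exists f:\quad T_{\mathrm{direct}}(f) \ge 2^{\Omega(n)}
       \quad\text{while}\quad T_{\mathrm{CoT}}(f) \le \mathrm{poly}(n).
    % The commenter's wished-for converse: chain-of-thought never costs more
    % than a polynomial overhead on any task.
    \forall f:\quad T_{\mathrm{CoT}}(f) \le \mathrm{poly}\bigl(T_{\mathrm{direct}}(f)\bigr).

Intuitively one would expect the converse to hold, since a model that is allowed (but not required) to emit a chain of thought can always reproduce the direct model's behavior; whether that simulation goes through under the paper's exact formalization is not addressed here.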

by cubefox on 1/31/2025, 10:04 PM

Loosely related thought: a year ago there was a lot of talk about the Mamba SSM architecture replacing transformers. Apparently that hasn't happened so far.

by hochstenbach on 2/2/2025, 6:28 AM

Quanta Magazine has an article that explains in plain words what the researchers were trying to do: https://www.quantamagazine.org/chatbot-software-begins-to-fa...

by byyoung3 on 2/1/2025, 9:39 AM

those lemmas are wild

by cs702 on 1/31/2025, 8:14 PM

Huh. I just skimmed this and quickly concluded that it's definitely not light reading.

It sure looks and smells like good work, so I've added it to my reading list.

Nowadays I feel like my reading list is growing faster than I can go through it.