EM-LLM: Human-Inspired Episodic Memory for Infinite Context LLMs

by jbotz on 5/10/2025, 7:49 AM with 11 comments

by killerstorm on 5/14/2025, 7:56 AM

Note that this works within a single sequence of tokens. It might be consistent with the "episodic memory" metaphor if we consider a particular transformer run as its experience.

But this might be very different from what people expect from "memory" - i.e. the ability to learn vast amounts of information and retrieve it as necessary.

This is more like a refinement of transformer attention: instead of running attention over all tokens (which is very expensive, as it's quadratic in sequence length), it selects a subset of token spans and runs fine-grained attention only on those. So it essentially breaks transformer attention into two parts - coarse-grained (k-NN over token spans) and fine-grained (normal attention within the selected spans).
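
A minimal sketch of that coarse-then-fine split, in plain NumPy. Fixed-length spans, mean-of-keys span summaries, and a single query vector are simplifying assumptions here; EM-LLM itself segments the sequence at surprise-based event boundaries, so this shows only the general shape of the idea, not the paper's algorithm:

```python
import numpy as np

def two_stage_attention(q, K, V, span_len=32, top_k=4):
    """Coarse-then-fine attention for one query vector (illustrative only).

    q: (d,) query; K, V: (n, d) keys/values for the full context.
    Stage 1 (coarse): score each fixed-length span by its mean key
    and keep the top_k spans (a k-NN-style retrieval over spans).
    Stage 2 (fine): run ordinary softmax attention over only the
    tokens inside the retrieved spans.
    """
    n, d = K.shape

    # Split the context into contiguous spans and summarise each with its mean key.
    starts = np.arange(0, n, span_len)
    span_keys = np.stack([K[s:s + span_len].mean(axis=0) for s in starts])

    # Coarse stage: pick the spans whose summaries best match the query.
    coarse_scores = span_keys @ q
    chosen = np.argsort(coarse_scores)[-top_k:]

    # Gather the token positions belonging to the chosen spans.
    idx = np.concatenate([np.arange(s, min(s + span_len, n))
                          for s in starts[chosen]])

    # Fine stage: standard scaled dot-product attention on that subset only.
    logits = (K[idx] @ q) / np.sqrt(d)
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ V[idx]

# Example: 4096-token context, but only top_k * span_len = 128 tokens
# ever receive fine-grained attention for this query.
rng = np.random.default_rng(0)
n, d = 4096, 64
out = two_stage_attention(rng.normal(size=d),
                          rng.normal(size=(n, d)),
                          rng.normal(size=(n, d)))
```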

It might be a great thing for long-context situations. But it doesn't make sense when you want millions of different facts to be considered - stuffing them all into a long context is rather inefficient.

by mountainriver on 5/10/2025, 11:18 PM

TTT, cannon layers, and Titans seem like stronger approaches IMO.

Information needs to be compressed into a latent space, or it becomes computationally intractable.

by p_v_doom on 5/14/2025, 6:40 AM

Interesting. Even before attention existed, I was thinking that the episodic memory model offers something that could be very useful for neural nets, so it's cool to see people testing that.

by MacsHeadroom on 5/10/2025, 3:40 PM

So, infinite context length by making it compute-bound instead of memory-bound. Curious how much longer this takes to run, and when it makes sense to use vs. RAG.