Apple just proposed a smarter way to scale AI — not by making models bigger, but by giving them memory

In a new paper titled “Pretraining with Hierarchical Memories: Separating Long-Tail and Common Knowledge” (arxiv.org/abs/2510.02375), Apple researchers show that a small base model paired with external memories can match or outperform models twice its size, while fetching only ~10% extra parameters per query.

The key idea:
Most models waste compute and memory by storing all facts inside their fixed weights, even though each query only needs a tiny fraction of that knowledge.

Apple’s system solves this by separating common knowledge (kept in the base model) from rare facts, which are stored in external “memory blocks” fetched on demand.

Here’s how it works 👇

  • A retriever maps each input to a cluster path, fetching small blocks from multiple levels of a hierarchical memory.
  • These blocks plug directly into the feed-forward layers, the part of a transformer widely believed to store factual associations (a toy version is sketched in code below).
  • During training, only the fetched blocks are updated on related text, so rare facts are reinforced without overwriting general knowledge.
  • During inference, the base model stays in fast memory, while small blocks stream in from slower storage as needed.
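
To make the mechanism concrete, here is a minimal, hypothetical sketch in PyTorch of how fetched blocks could widen a feed-forward layer at run time. Every name here (HierarchicalMemory, MemoryAugmentedFFN, retrieve, the block sizes) is invented for illustration; the paper’s actual retriever, memory layout, and training recipe are more involved.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class HierarchicalMemory:
    """Toy store: each node in a cluster hierarchy owns a small block of
    extra feed-forward parameters (rows for the up projection, columns
    for the down projection)."""

    def __init__(self, d_model: int, block_width: int, paths: list[str]):
        self.blocks = {
            path: (
                torch.randn(block_width, d_model) * 0.02,  # extra up-projection rows
                torch.randn(d_model, block_width) * 0.02,  # extra down-projection cols
            )
            for path in paths
        }

    def retrieve(self, cluster_path: list[str]):
        """Concatenate the blocks along one root-to-leaf path."""
        ups, downs = zip(*(self.blocks[p] for p in cluster_path))
        return torch.cat(ups, dim=0), torch.cat(downs, dim=1)


class MemoryAugmentedFFN(nn.Module):
    """A standard transformer FFN whose hidden width is extended by
    whatever blocks the retriever fetched for the current input."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.up = nn.Linear(d_model, d_hidden, bias=False)    # always in fast memory
        self.down = nn.Linear(d_hidden, d_model, bias=False)  # always in fast memory

    def forward(self, x, mem_up=None, mem_down=None):
        h = self.up(x)                       # base path: common knowledge
        if mem_up is None:
            return self.down(F.gelu(h))
        h_mem = F.linear(x, mem_up)          # extra hidden units from fetched blocks
        h = torch.cat([h, h_mem], dim=-1)
        w_down = torch.cat([self.down.weight, mem_down], dim=1)
        return F.linear(F.gelu(h), w_down)


d_model, d_hidden, block_width = 64, 256, 16
paths = ["root", "root/science", "root/science/astronomy"]
memory = HierarchicalMemory(d_model, block_width, paths)
ffn = MemoryAugmentedFFN(d_model, d_hidden)

x = torch.randn(2, 8, d_model)              # (batch, sequence, d_model)
mem_up, mem_down = memory.retrieve(paths)   # a real retriever would pick the path
y = ffn(x, mem_up, mem_down)
print(y.shape)                              # torch.Size([2, 8, 64])
```

Note the division of labor: the resident up/down projections carry common knowledge, while the concatenated block rows and columns carry whatever long-tail facts the retriever decided this input needs.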

This approach beats low-rank adapters, uses less compute, and allows teams to edit, block, or add memory — effectively controlling what knowledge the model can access.
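
Because the long-tail knowledge lives in that external store, knowledge management becomes data management. Continuing the same toy sketch (again hypothetical, not the paper’s API):

```python
# Block a topic: remove its memory block so its long-tail facts can no longer be fetched.
memory.blocks.pop("root/science/astronomy", None)

# Add a topic: register a freshly trained block without touching the base weights.
memory.blocks["root/science/biology"] = (
    torch.randn(block_width, d_model) * 0.02,
    torch.randn(d_model, block_width) * 0.02,
)
```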

In short:
Apple may have just outlined the next evolution of LLM design — models that learn like humans do: fast, modular, and memory-driven.
