If we crack search and every ChatGPT user, all at once, decides they want to apply 10× more compute to every query, then it might make sense for OpenAI to train a 10× larger model.
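To make that tradeoff concrete, here is a back-of-the-envelope sketch in Python. Every constant in it is invented for illustration (these are not OpenAI's actual figures); the point is only that once the yearly inference bill grows 10×, a 10× larger one-off pretraining run barely moves the total.

```python
# Back-of-the-envelope tradeoff between training compute and serving compute.
# All constants below are invented for illustration, not real figures.

TRAIN_FLOPS_BASE = 2e25        # hypothetical cost of training today's model
INFER_FLOPS_PER_QUERY = 2e14   # hypothetical inference cost of one query today
QUERIES_PER_YEAR = 5e11        # hypothetical yearly query volume

def total_flops(train_multiplier: float, search_multiplier: float) -> float:
    """Yearly compute if we train a `train_multiplier`x larger model and
    spend `search_multiplier`x more inference compute per query."""
    training = TRAIN_FLOPS_BASE * train_multiplier
    serving = INFER_FLOPS_PER_QUERY * search_multiplier * QUERIES_PER_YEAR
    return training + serving

print(f"status quo:              {total_flops(1, 1):.1e}")   # ~1.2e26
print(f"10x search, same model:  {total_flops(1, 10):.1e}")  # ~1.0e27
print(f"10x search, 10x model:   {total_flops(10, 10):.1e}") # ~1.2e27
```

Under these assumptions, the larger training run adds under 20% to the total bill while (plausibly) reducing how much search each query needs, which is the sense in which training the bigger model might make sense.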
There will be a Jevons paradox for intelligence. Today, most users only demand GPT-3.5-level intelligence, and the average person doesn’t know or care that their chatbot isn’t SOTA. But as users become more knowledgeable, they’ll want more intelligence.
(I remember when studying for chess tournaments, I’d always crank engine search to the max. Did I care that, after three seconds, the computer’s reasoning was over any human’s head? No. I wanted the satisfaction of knowing its answer was correct.)
Yet, we’ve only begun thinking about search. Maybe there’s a search paradigm that enables easy answer caching or continuous learning that brings the observed 10× to 15× training-to-search compute ratio closer to 1:1.
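As a toy illustration of the caching half of that idea (the cost constants, the cache, and the stand-in search call below are all hypothetical), repeated queries pay for search only once, so the effective per-query compute drifts back toward the cost of a single forward pass:

```python
# Toy sketch of how answer caching could pull the effective training-to-search
# compute ratio back toward 1:1. All costs and calls are hypothetical.

SEARCH_COST = 10.0   # assume search burns 10x the compute of one forward pass
HIT_COST = 1.0       # assume a cache hit costs about one forward pass

_cache = {}          # query -> cached answer
spent = 0.0          # inference compute actually spent, in forward-pass units
served = 0           # queries answered so far

def answer(query: str) -> str:
    """Serve a query, paying the full search cost only on a cache miss."""
    global spent, served
    served += 1
    if query not in _cache:
        spent += SEARCH_COST
        _cache[query] = f"<searched answer for {query!r}>"  # stand-in for real search
    else:
        spent += HIT_COST
    return _cache[query]

# A query stream with lots of repeats: the popular question pays for search
# once, and every repeat is served at roughly forward-pass cost.
for q in ["capital of France?"] * 9 + ["prove Fermat's little theorem"]:
    answer(q)

print(spent / served)  # 2.8 here, versus 10.0 with no cache at all
```

Exact-string keys are obviously too crude; the hard part would be reusing answers, or partial search trees, across paraphrases.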
My point still stands. Even if scaling is generally more efficient than search, search gets you more intelligence in narrow domains sooner. Training larger foundation models is slow. With search, you don’t have to wait.