Google DeepMind’s New Research on Optimizing Test-Time Compute in AI
OpenAI's new o1 model thinks before it responds, with reasoning that OpenAI says approaches PhD-level performance on some benchmarks. Google DeepMind's new research breaks down how this kind of approach works. Scaling large language models (LLMs) like GPT-4 and Claude 3.5 has become resource-intensive: higher costs, more energy use, and greater latency.
Scaling up model parameters demands enormous compute: huge datasets, months of training, and vast amounts of electricity. Deploying such models in real time or on mobile devices is challenging. So there is a need for more efficient ways to scale models.
Instead of making models bigger, what if we made them think longer during inference? This could change how we deploy AI with limited resources. Test-time compute refers to the computational effort a model spends while generating outputs. It's like a student working harder during the exam rather than studying longer for it. By using more test-time compute, a smaller model can think harder during inference instead of relying on sheer size.
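To make this concrete, here is a minimal sketch of the simplest way to spend extra test-time compute: sample several answers in parallel and keep the most common one (often called self-consistency). The `generate` function is a hypothetical stand-in for any LLM call, not part of DeepMind's code.

```python
from collections import Counter

def generate(prompt: str, temperature: float = 0.8) -> str:
    """Hypothetical stand-in for a sampled LLM call; plug in any model API."""
    raise NotImplementedError("replace with a real model call")

def self_consistency(prompt: str, n_samples: int = 16) -> str:
    """Spend more test-time compute by drawing many samples
    and returning the most frequent final answer (majority vote)."""
    answers = [generate(prompt) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```

Doubling `n_samples` doubles the inference cost but tends to raise accuracy on reasoning tasks, which is the basic trade the paper studies.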
Verifier reward models help in this process. Think of it as having a genius friend check your answers during a test. A verifier model evaluates the candidate answers, or individual reasoning steps, produced by the main language model and helps it pick the best one. This makes the model more accurate without needing to be massive.
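A minimal sketch of verifier-guided selection (often called best-of-N), reusing the hypothetical `generate` call from above and adding a hypothetical `verifier_score` function that returns a scalar correctness score:

```python
def verifier_score(prompt: str, answer: str) -> float:
    """Hypothetical verifier: higher means the answer looks more correct."""
    raise NotImplementedError("replace with a trained verifier / reward model")

def best_of_n(prompt: str, n: int = 8) -> str:
    """Sample n candidate answers and let the verifier pick the best one."""
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda ans: verifier_score(prompt, ans))
```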
Adaptive response updating lets the model refine its answers on the fly, adjusting each response based on what it learned from previous attempts. This improves output quality without extra pre-training. It's like thinking smarter when the problem is tough.
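Adaptive updating can be sketched as a revision loop: the model sees its own previous attempt and produces an improved one. The prompt format here is illustrative, not the paper's exact revision template.

```python
def revise_answer(prompt: str, n_revisions: int = 4) -> str:
    """Sequentially refine an answer; each pass conditions on the last attempt."""
    answer = generate(prompt)
    for _ in range(n_revisions):
        revision_prompt = (
            f"{prompt}\n\nPrevious attempt:\n{answer}\n\n"
            "Find any mistakes in the attempt and write an improved answer."
        )
        answer = generate(revision_prompt)
    return answer
```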
Compute-optimal scaling allocates compute based on each task's difficulty. It's like pacing yourself during a marathon. This method adjusts compute dynamically, making inference more efficient. Researchers tested these ideas on the MATH benchmark, a set of challenging math problems, fine-tuning PaLM 2 models for revision and verification tasks.
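The routing idea can be sketched as follows, reusing the `generate`, `revise_answer`, and `best_of_n` sketches above. The paper bins questions by estimated difficulty and selects the best strategy and budget per bin; this toy version just branches on a hypothetical `estimate_difficulty` score, so the thresholds and strategy choices are placeholders, not the paper's learned policy.

```python
def estimate_difficulty(prompt: str) -> float:
    """Hypothetical difficulty estimate in [0, 1], e.g. derived from
    verifier scores on a few cheap initial samples."""
    raise NotImplementedError

def compute_optimal_answer(prompt: str) -> str:
    """Allocate test-time compute based on how hard the question looks."""
    difficulty = estimate_difficulty(prompt)
    if difficulty < 0.3:                   # easy: a single sample is enough
        return generate(prompt)
    if difficulty < 0.7:                   # medium: sequential revisions
        return revise_answer(prompt, n_revisions=4)
    return best_of_n(prompt, n=16)         # hard: wider parallel search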
The research focused on fine-tuning revision models and process reward models (PRMs). Fine-tuning teaches the model to revise its own answers iteratively, while PRMs verify each step of its reasoning, making the search for the correct answer more efficient.
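Because a PRM scores intermediate steps rather than only final answers, it enables step-level search. The sketch below is a simple beam search driven by two hypothetical functions, `generate_step` and `prm_score`; the paper's actual search procedures (beam search and lookahead variants) are more involved.

```python
def generate_step(prompt: str, partial_solution: str) -> str:
    """Hypothetical call that samples the next reasoning step."""
    raise NotImplementedError

def prm_score(prompt: str, partial_solution: str) -> float:
    """Hypothetical process reward model: scores a partial solution."""
    raise NotImplementedError

def prm_beam_search(prompt: str, beam_width: int = 4,
                    expansions: int = 4, max_steps: int = 10) -> str:
    """Grow solutions step by step, keeping only the beam_width
    partial solutions the PRM rates highest."""
    beams = [""]  # start from an empty partial solution
    for _ in range(max_steps):
        candidates = []
        for partial in beams:
            for _ in range(expansions):
                candidates.append(partial + generate_step(prompt, partial))
        candidates.sort(key=lambda sol: prm_score(prompt, sol), reverse=True)
        beams = candidates[:beam_width]
    return beams[0]  # highest-scoring solution found
```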
The results show that compute-optimal scaling achieves similar or better performance with much less computation; in the paper's experiments, a smaller model using extra test-time compute could outperform a model roughly 14× larger on some problems. The approach echoes OpenAI's o1 model, which also focuses on smarter compute usage at inference time.
Both OpenAI and DeepMind show that optimizing computation at inference time can deliver high performance, letting efficient models perform at or above the level of much bigger ones. The future of AI seems to be moving toward smarter, more efficient models, and this shift away from the "bigger is better" approach is promising.