Microsoft Unveils Self-Evolving AI Model Rivalling Larger Systems in Math
Microsoft's latest research paper reveals an impressive achievement in AI technology. The paper, titled "rStar-Math: Small LLMs Can Master Math Reasoning with Self-Evolved Deep Thinking," introduces a language model that can improve its own reasoning through self-evolution, which sounds straight out of a sci-fi story.
The study shows that small language models (SLMs) can now rival much larger ones in math reasoning. They can even beat OpenAI's powerful models without using a method called distillation, in which a large "teacher" model trains a smaller "student." Here, the smaller model needs no big teacher to perform well.
rStar-Math achieves its results using a technique called Monte Carlo Tree Search (MCTS). This method explores different possibilities the way a decision tree does: the AI searches through branching options and improves itself by generating its own training signal rather than relying on data from a larger model.
The initial benchmarks for rStar-Math are impressive. On math tests, it raised one model's accuracy from 58.8% to 90%, and another's from 41.4% to 86.4%. It even surpassed OpenAI's o1-preview model by significant margins on the USA Math Olympiad.
The paper provides a detailed look at how rStar-Math works. A self-evolution framework lets the AI think more deeply and choose the best reasoning paths: it assigns a value to each step in solving a problem, with incorrect steps receiving low values and correct ones receiving high values. This process filters out the wrong steps, keeping only the best ones.
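The filtering idea can be illustrated with a minimal sketch. The step texts, values, and threshold below are hypothetical; in the actual system, step values come from MCTS rollouts and a trained process reward model, not hand-assigned numbers.

```python
# Minimal sketch of per-step value filtering (illustrative only; the paper's
# real scores come from MCTS rollouts and a trained process reward model).

def filter_steps(candidate_steps, threshold=0.5):
    """Keep only candidate reasoning steps whose value exceeds the threshold."""
    return [step for step, value in candidate_steps if value > threshold]

# Hypothetical candidate steps for one point in a math solution:
candidates = [
    ("x = 7, so 2x = 14", 0.9),    # correct step -> high value
    ("x = 7, so 2x = 15", 0.1),    # arithmetic slip -> low value
    ("divide both sides by 0", 0.0),
]

print(filter_steps(candidates))  # -> ['x = 7, so 2x = 14']
```

Only the high-value step survives, so later training sees mostly correct reasoning.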
One key element of the model's success is Monte Carlo Tree Search. The AI explores multiple paths to find the best solution, much like a person weighing the consequences of different decisions, then assembles the highest-valued steps into a final, accurate solution.
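To make the search concrete, here is a compact, generic MCTS loop on a toy problem (reach a target sum with steps of 1, 2, or 3). This is a standard selection/expansion/simulation/backpropagation skeleton, not the paper's implementation, which searches over reasoning steps scored by a reward model.

```python
# Generic MCTS sketch on a toy "reach the target sum" problem.
# The four phases (select, expand, simulate, backpropagate) mirror
# how rStar-Math explores reasoning paths, but the problem is a stand-in.
import math
import random

ACTIONS = (1, 2, 3)
TARGET, MAX_DEPTH = 7, 5

class Node:
    def __init__(self, state, parent=None):
        self.state = state        # running total in the toy problem
        self.parent = parent
        self.children = {}        # action -> child Node
        self.visits = 0
        self.value = 0.0          # accumulated reward

def ucb(child, parent, c=1.4):
    """Upper Confidence Bound: balance exploiting good paths and exploring new ones."""
    if child.visits == 0:
        return float("inf")
    return child.value / child.visits + c * math.sqrt(math.log(parent.visits) / child.visits)

def rollout(state, depth):
    """Simulate random steps to the end; reward 1.0 only for hitting the target."""
    while depth < MAX_DEPTH and state < TARGET:
        state += random.choice(ACTIONS)
        depth += 1
    return 1.0 if state == TARGET else 0.0

def mcts(iterations=500):
    root = Node(0)
    for _ in range(iterations):
        node, depth = root, 0
        # Selection: descend via UCB while the node is fully expanded.
        while len(node.children) == len(ACTIONS) and depth < MAX_DEPTH:
            node = max(node.children.values(), key=lambda ch, p=node: ucb(ch, p))
            depth += 1
        # Expansion: try one untried action.
        if depth < MAX_DEPTH and node.state < TARGET:
            action = random.choice([a for a in ACTIONS if a not in node.children])
            node.children[action] = Node(node.state + action, node)
            node = node.children[action]
            depth += 1
        # Simulation and backpropagation.
        reward = rollout(node.state, depth)
        while node:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Return the most-visited first action, as the best first "step".
    return max(root.children.items(), key=lambda kv: kv[1].visits)[0]

print(mcts())  # first step the search considers most promising
```

In rStar-Math, each node would be a partial solution and each action a candidate reasoning step, with the reward model replacing the toy rollout reward.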
The self-evolution framework adds a four-round process that improves the model further. Each round enhances the policy, the part of the AI that proposes reasoning steps, and the reward model, which judges whether each step is correct.
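The loop structure of those rounds can be sketched as follows. Every function body here is a stand-in to show the alternation between data generation and the two training targets; none of it reflects the paper's actual training code.

```python
# Illustrative sketch of a multi-round self-evolution loop.
# All function bodies are stand-ins, not the paper's implementation.

def generate_trajectories(policy, problems):
    # Stand-in: the real system runs MCTS per problem to produce
    # step-by-step solutions with per-step quality scores.
    return [(p, [f"step for {p}"], 1.0) for p in problems]

def train_policy(policy, trajectories):
    # Stand-in: fine-tune the policy on the highest-quality trajectories.
    return policy + 1

def train_reward_model(reward_model, trajectories):
    # Stand-in: update the reward model from the per-step scores.
    return reward_model + 1

def self_evolve(policy, reward_model, problems, rounds=4):
    """Each round: generate data with the current models, then improve both."""
    for _ in range(rounds):
        trajectories = generate_trajectories(policy, problems)
        policy = train_policy(policy, trajectories)
        reward_model = train_reward_model(reward_model, trajectories)
    return policy, reward_model

print(self_evolve(0, 0, ["p1", "p2"]))  # -> (4, 4): both models updated 4 times
```

The key design point is that each round's improved policy and reward model generate better data for the next round, which is what lets a small model bootstrap itself.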
The research marks a groundbreaking step in AI development: small language models can now improve themselves without large models to guide them. This advancement opens new possibilities for AI in education and other fields. As AI continues to progress, such self-improving models could become standard tools in many applications.