Reevaluation of GPT-4’s Bar Exam Performance Reveals Overinflation

One of the big stories in AI comes from a recent review of GPT-4's bar exam performance. When GPT-4 was first released, it was widely reported to have passed the bar exam with a score in the 90th percentile of test-takers. This news came from Stanford in 2023, and it raised the question of what such a result meant for AI in the legal field. A score that high suggested that AI could be very useful in law.

But new research suggests this report may not be accurate. A recent paper reevaluated GPT-4's bar exam performance using several methods to check the score, and it found that the 90th percentile claim was likely inflated.

The paper reviewed the Illinois bar exam scores behind the 90th percentile figure. Many of the test-takers in that comparison group had failed the exam before, so the group scored lower than test-takers as a whole. This skewed the results, making GPT-4's score seem higher than it really was. The paper found that GPT-4's true percentile was closer to the 69th than the 90th. On essays, it was lower still, at the 49th percentile.
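To make the skew concrete, here is a minimal Python sketch of how the same score can land at very different percentiles depending on who it is compared against. The scores, the two pools, and the `percentile_rank` helper are all made up for illustration; they are not data or methods from the paper.

```python
from bisect import bisect_left


def percentile_rank(score, reference_scores):
    """Return the percentage of reference scores strictly below `score`."""
    ordered = sorted(reference_scores)
    below = bisect_left(ordered, score)  # count of scores lower than `score`
    return 100.0 * below / len(ordered)


# Hypothetical scores, invented for this example only.
general_pool = [250, 260, 265, 270, 275, 280, 285, 290, 300, 310]
# A pool weighted toward people who failed before, so scores run lower.
repeat_heavy_pool = [230, 240, 245, 250, 255, 260, 265, 270, 275, 300]

model_score = 280  # assumed score, not GPT-4's actual result

print(percentile_rank(model_score, general_pool))       # 50.0
print(percentile_rank(model_score, repeat_heavy_pool))  # 90.0
```

The score never changes. Only the comparison group does, and that alone moves the result from the middle of the pack to near the top.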

Although a 69th percentile score is still good, it is not as groundbreaking as first reported. The new figures should recalibrate how we see AI's role in legal tasks, because overestimating AI's abilities can lead to poor legal outcomes. Some users of GPT-4 have reported hallucinations in legal cases, meaning that the AI made up cases that did not exist.

This news is a reminder that claims about AI must be verified. Companies may overstate their models' abilities, so independent evaluations are essential to get the true picture. They help prevent misuse and ensure that an AI's capabilities are well understood.

Knowing where GPT-4 really stands, closer to the 69th percentile than the 90th, helps us chart the true progress of AI. It lets us set realistic expectations and avoid misjudging what the AI can do.

