
LLM arena leaderboard: Ranking the best LLMs

  • Falk Thomassen
  • Jan 12, 2024
  • 2 min read

Updated: Aug 19

The LLM arena leaderboard is an important LLM evaluation tool.


Using a dynamic Elo scoring system, the leaderboard shows which models lead in multi-task capabilities, reasoning, and real-world applicability.


Let’s dive in.


Best LLM on the LLM arena leaderboard

Comparing the main frontier models on the LLM arena leaderboard.

LLM arena leaderboard of main frontier models



Last updated: July 2025

Company     Model               Arena Score
Google      Gemini 2.5 Pro      1474
xAI         Grok 4              1438
OpenAI      GPT-o3              1431
Anthropic   Claude 3.7 Sonnet   1343
Meta        Llama 4             1292

Google's Gemini 2.5 Pro leads the LLM arena leaderboard with an Arena Score of 1474, 36 points ahead of xAI's Grok 4 at 1438 and 43 points ahead of OpenAI's GPT-o3 at 1431.


Anthropic's Claude 3.7 Sonnet follows at 1343, with Meta’s Llama 4 Maverick further back at 1292.
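
To put these gaps in perspective, an Elo-style score difference can be converted into an expected head-to-head win rate. The sketch below assumes the textbook Elo formula (base 10, 400-point scale); the leaderboard's exact scoring method may differ, so treat the results as rough illustrations. The function name `expected_win_rate` is hypothetical.

```python
def expected_win_rate(score_a: float, score_b: float) -> float:
    """Expected probability that model A's answer is preferred over model B's.

    Assumes the standard Elo logistic formula (base 10, scale 400).
    """
    return 1.0 / (1.0 + 10 ** ((score_b - score_a) / 400.0))


# Arena Scores quoted in this article (July 2025).
scores = {
    "Gemini 2.5 Pro": 1474,
    "Grok 4": 1438,
    "GPT-o3": 1431,
    "Claude 3.7 Sonnet": 1343,
    "Llama 4 Maverick": 1292,
}

leader = "Gemini 2.5 Pro"
for model, score in scores.items():
    if model != leader:
        win_rate = expected_win_rate(scores[leader], score)
        print(f"{leader} vs {model}: {win_rate:.0%} expected win rate")
```

Under this assumption, a 36-point lead corresponds to an expected win rate of roughly 55%, so the top three models are closer than the ranking alone suggests.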


What is the LLM arena leaderboard?

The LLM arena leaderboard is a platform developed by researchers at UC Berkeley under the LMSYS (Large Model Systems Organization) initiative.


It was designed to evaluate large language models (LLMs) through direct, pairwise comparisons in conversational settings, offering a dynamic ranking system that reflects ongoing competition.

View of LMArena interface

The LLM arena operates as follows:

  • Two LLMs respond to the same prompt anonymously

  • Humans choose the better response based on accuracy, coherence, and helpfulness

  • Scores update after each match to reflect performance
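
The score update in the last step can be sketched with a classic Elo update. This is a minimal illustration, not the leaderboard's actual implementation: the K-factor of 32, the 400-point scale, and the function names `expected_score` and `update_elo` are assumptions made for this example.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo formula (assumed)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, outcome_a: float,
               k: float = 32.0) -> tuple[float, float]:
    """Return the new ratings for A and B after one match.

    outcome_a is 1.0 if the human picked A's response, 0.0 if they
    picked B's, and 0.5 for a tie. k (assumed value) controls how far
    a single vote can move a rating.
    """
    delta = k * (outcome_a - expected_score(rating_a, rating_b))
    # Whatever A gains, B loses, so the total rating stays constant.
    return rating_a + delta, rating_b - delta


# Example: a 1438-rated model wins a vote against a 1474-rated model.
new_a, new_b = update_elo(1438.0, 1474.0, outcome_a=1.0)
print(f"Winner: {new_a:.1f}, loser: {new_b:.1f}")
```

Because the lower-rated model was expected to lose slightly more often than not, an upset win moves its rating by more than half of K, while a win by the favourite would move the ratings by less.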


This leaderboard assesses LLMs on a variety of conversational and reasoning tasks, providing a comprehensive view of their capabilities.

Today, the LLM arena leaderboard serves as a key resource for tracking progress in AI development, showcasing how leading models stack up against one another.



Other LLM benchmarks

At BRACAI, we keep track of how the main frontier models perform across multiple benchmarks.


If you have any questions about these benchmarks, or how to get started with AI in your business, feel free to reach out.

 
 
 