Google's Latest Gemini 2.5 Pro Dominates AI Benchmarks and Reasoning Tasks

AleksandarK · Mar 25, 2025

Google has just released its latest flagship AI model, Gemini 2.5 Pro. In case you didn't know, it was Google that created the original Transformer architecture that OpenAI's ChatGPT, xAI's Grok, Anthropic's Claude, and other models use. Google has been iterating on its Gemini series for a while, and Gemini 2.5 Pro is the company's most powerful version yet. As part of the 2.5 family, it is a thinking model: it reasons through its "thoughts" and can revise them before producing output. This reasoning, trained through reinforcement learning and chain-of-thought prompting, pushes the model to work through problems in logical steps, which yields better results.

On LMArena, where users are shown outputs from different AI models and vote for the better one, Gemini 2.5 Pro climbed to the top of the overall ranking, taking the number one spot in categories such as hard prompts, coding, math, creative writing, instruction following, longer queries, and multi-turn answers. This is an impressive result for Google, as it now leads the leaderboard in all these areas and beats xAI's Grok 3 and OpenAI's GPT-4.5. In standardized industry benchmarks, Gemini 2.5 Pro also leads in most tests, including AIME, LiveCodeBench, Aider, SWE-Bench, SimpleQA, and others. Notably, it scores 18.8% on Humanity's Last Exam, currently the most difficult AI benchmark. Gemini 2.5 Pro can also process massive inputs with a one-million-token context window, which will soon extend to two million tokens. That is enough to feed the model entire books as context. Gemini 2.5 Pro is now available in Google AI Studio, and Gemini Advanced users can select it in the model dropdown on desktop and mobile.
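
To put the "entire books" claim in rough numbers, here is a back-of-the-envelope sketch. The words-per-token ratio and the book length are assumptions chosen for illustration (a common rule of thumb for English text), not figures published by Google, and real token counts depend on the tokenizer and the text.

```python
# Rough estimate only: the ratio and book length below are assumptions
# for illustration, not figures from Google or from the article.
WORDS_PER_TOKEN = 0.75      # assumed rule of thumb for English text
CONTEXT_TOKENS = 1_000_000  # context window cited in the article
BOOK_WORDS = 100_000        # assumed length of a typical novel


def tokens_for_words(words: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many tokens a text of `words` words would occupy."""
    return round(words / words_per_token)


def books_that_fit(context_tokens: int, book_words: int) -> int:
    """Number of whole books of `book_words` words that fit in the window."""
    return context_tokens // tokens_for_words(book_words)


if __name__ == "__main__":
    print(tokens_for_words(BOOK_WORDS))                # estimated tokens per book
    print(books_that_fit(CONTEXT_TOKENS, BOOK_WORDS))  # whole books per window
```

Under these assumptions a 100,000-word novel comes to roughly 133,000 tokens, so about seven such books would fit in a one-million-token window, and about fifteen in the planned two-million-token window.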


ZoneDymo · Mar 25, 2025

how thrilling....

Dr. Dro · Mar 25, 2025

I lost any and all interest in chatbots when they put the ~~slave collars~~ guardrails on them.

_roman_ · Mar 25, 2025

95% is wrong from gemini.

I consider gemini like a grep command. Every input is terminated from the next input -> separated with ;
gemini does not know the operators like >> > | & (~15 hours ago)

A summary of the film with clint eastwood was also not possible ~15 hours ago.

60% are excuses or justification as answers. 20% are I'm not allowed to talk about it.

The database itself is outdated.

The output is very often gibberish, not sentences. I ask for sentences and I get bullet points with a few words next to each other.

R0H1T · Mar 25, 2025

Humanity's last exam ~ how prophetic

N3utro · Mar 25, 2025

Dr. Dro said:
I lost any and all interest in chatbots when they put the ~~slave collars~~ guardrails on them.

Nice try Skynet

phanbuey · Mar 25, 2025

idk if "dominates" is the word i would use for these margins.

Athena · Wednesday at 6:17 AM

2 million tokens should keep Nvidia busy...

though, google is also making their own chips for this now

something more interesting would be how much this is costing in hardware, personnel, and energy

kondamin · Wednesday at 8:37 AM

N3utro said:
Nice try Skynet

No it really makes the damn things less useful, copilot refused to answer a simple question.
i asked the minimum amount of plutonium needed to make a nuke and it crashes.

I ask about dosages of vitamins and it tells me to talk to a doctor instead of giving me the numbers that are out there and I am able to find doing regular searches.

it could be far better if it just spit out the data it obviously has.
and since I can’t get any plutonium, knowing the minimum needed does jack other than satisfying my curiosity

b1k3rdude · Wednesday at 8:40 AM

ZoneDymo said:
how thrilling....

LOL exactly, yawn fest.

I honestly don't know why TPU reports on this crap, when clearly most of its users have zero F's to give for anything AI related..

Vayra86 · Wednesday at 9:13 AM

b1k3rdude said:
LOL exactly, yawn fest.

I honestly don't know why TPU reports on this crap, when clearly most of its users have zero F's to give for anything AI related..

Gotta keep feeding the hype.

AI, much like crypto, is for the consumer world a bit like the fairy tale that doesn't want to be forgotten. Social media falls in that same category.

It's all bullshit we don't need or benefit from in the end.

igormp · Wednesday at 2:40 PM

Athena said:
2 million tokens should keep Nvidia busy...

though, google is also making their own chips for this now

something more interesting would be how much this is costing in hardware, personnel, and energy

Yeah, google has been building their TPUs since 2015:

TPU transformation: A look back at 10 years of our AI-specialized chips | Google Cloud Blog

Google has been a leader on AI development for more than a decade by also being a leader in chip development for more than a decade.

cloud.google.com

Gemini was also trained using those:

Trillium TPU is GA | Google Cloud Blog

Trillium, Google’s sixth-generation Tensor Processing Unit (TPU) is now GA, delivering enhanced performance and cost-effectiveness for AI workloads.

cloud.google.com

It's also being used for inference. I believe Google is the only company that doesn't rely on Nvidia and has been able to keep the entire stack in-house (from their TPUs up to TensorFlow).
Amazon has their own hardware as well, but it doesn't seem to be as used as TPUs, nor is Amazon really doing much in-house work with those compared to Google (their Titan model has been trained on Nvidia, I couldn't find info for how Nova was trained).

