Grok-3 Outperforms DeepSeek-R1, Matches OpenAI o1 Pro

xAI, the AI research company founded by Elon Musk, has unveiled its latest model, Grok-3, and early benchmarks suggest it outperforms several competitors.

The model has reportedly surpassed a score of 1400 on the Chatbot Arena leaderboard, placing it among the highest-ranked AI models available today.

Grok-3’s Strengths in Reasoning and Research

One of Grok-3’s standout features is its reasoning mode, called Think, alongside a deep research function called DeepSearch. AI researcher Andrej Karpathy, founder of Eureka Labs and a former OpenAI and Tesla researcher, was given early access to the model.

In a post on X (formerly Twitter), Karpathy shared his experience, reporting that Grok-3 successfully handled complex tasks such as creating the hex grid for a Settlers of Catan game board. In his testing, only OpenAI’s top-tier models, such as o1 Pro (available through the $200/month ChatGPT Pro plan), had managed this task. By contrast, he noted that DeepSeek-R1, Gemini 2.0 Flash Thinking, and Claude failed at it.

Karpathy also tested Grok-3’s ability to reason about the technical details of AI models: after uploading OpenAI’s GPT-2 paper, he asked it to estimate the number of FLOPs (floating-point operations) required to train the model. He noted that Grok-3, with its reasoning feature enabled, solved the task flawlessly, while OpenAI’s o1 Pro and GPT-4o failed.

How Grok-3 Compares to Other Leading AI Models

Karpathy described Grok-3’s performance as being on par with OpenAI’s o1 Pro and superior to DeepSeek-R1. However, he acknowledged that further evaluation is needed to determine its true ranking in the AI race.

He also tested Grok-3’s DeepSearch capabilities, which are designed to carry out in-depth research. While he found them comparable to Perplexity AI’s Deep Research feature, he noted that Grok-3 still sometimes hallucinates URLs and does not reliably provide accurate citations.

In one test, the model listed 12 major AI labs but failed to include xAI itself, highlighting some remaining gaps in its research abilities.

Experts React to Grok-3’s Performance

After two hours of testing, Karpathy concluded that Grok-3 feels close to state-of-the-art AI models, calling it “slightly better than DeepSeek-R1 and Gemini 2.0 Flash Thinking.”

Other AI experts, including Lex Fridman, also praised the model. In his own post on X, Fridman said, “My mind is blown, very impressive model.”

With xAI continuing to improve Grok-3 rapidly, competition with OpenAI, Google, and other AI leaders is intensifying.

Written by
Sazid Kabir
