Grok-3 Outperforms DeepSeek-R1, Matches OpenAI o1 Pro

xAI, the AI research company founded by Elon Musk, has unveiled its latest model, Grok-3, and early benchmarks suggest it outperforms several competitors.

The model has reportedly surpassed a score of 1400 on the Chatbot Arena leaderboard, which would place it among the most capable AI models available today.

Grok-3’s Strengths in Reasoning and Research

One of Grok-3’s standout features is its advanced reasoning (Think) capabilities and a deep research function called DeepSearch. AI researcher Andrej Karpathy, founder of Eureka Labs and a former OpenAI and Tesla researcher, was given early access to the model.

In a post on X (formerly Twitter), Karpathy shared his experience, stating that Grok-3 successfully handled complex tasks such as creating a hex grid for Settlers of Catan, a challenge that only OpenAI’s top-tier models like o1 Pro ($200/month) have handled correctly. In contrast, he noted that DeepSeek-R1, Gemini 2.0 Flash Thinking, and Claude failed at this task.
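To give a sense of what a hex-grid task involves, the board geometry can be sketched in a few lines of Python. This is an illustrative sketch only, not Karpathy's prompt or any model's output; the function name `hex_board` and the choice of axial coordinates are assumptions made for this example. A standard Catan board has 19 land hexes arranged in rows of 3, 4, 5, 4, and 3.

```python
def hex_board(radius: int = 2) -> list[tuple[int, int]]:
    """Return axial (q, r) coordinates for a hexagon-shaped board.

    Hypothetical helper for illustration: radius=2 gives the
    19-hex layout of a standard Catan board.
    """
    cells = []
    for q in range(-radius, radius + 1):
        for r in range(-radius, radius + 1):
            # In cube coordinates the third axis is s = -q - r.
            # A cell is on the board only if |s| is also within the radius.
            if abs(q + r) <= radius:
                cells.append((q, r))
    return cells


board = hex_board(2)

# Count how many hexes fall in each row (same r value), top to bottom.
rows: dict[int, int] = {}
for q, r in board:
    rows[r] = rows.get(r, 0) + 1
row_sizes = [rows[r] for r in sorted(rows)]

print(len(board))   # total number of hexes
print(row_sizes)    # hexes per row
```

The hard part for a model is usually not listing the cells but converting them to pixel positions and drawing them correctly, which this sketch leaves out.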

Karpathy also tested Grok-3’s ability to analyze the technical specifications of AI models, uploading OpenAI’s GPT-2 paper and asking it to estimate the FLOPs (floating-point operations) required for training. He noted that Grok-3, with its reasoning feature enabled, solved the task flawlessly, while OpenAI’s o1 Pro and GPT-4o failed.
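A back-of-the-envelope version of this kind of estimate uses the common rule of thumb that training costs roughly 6 FLOPs per parameter per token. The sketch below is not Grok-3's answer or Karpathy's exact calculation. The 1.5 billion parameters and 40 GB of text come from the GPT-2 paper; the bytes-per-token ratio and the number of epochs are assumptions made only for illustration, since the paper does not state the number of training tokens.

```python
def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute with the 6 * N * D rule of thumb.

    Roughly 2 FLOPs per parameter per token for the forward pass
    and 4 for the backward pass.
    """
    return 6 * params * tokens


# From the GPT-2 paper: largest model size and dataset size.
params = 1.5e9          # 1.5 billion parameters
dataset_bytes = 40e9    # about 40 GB of text

# Assumptions for illustration (not stated in the paper).
bytes_per_token = 4     # assumed average bytes per token
epochs = 10             # assumed number of passes over the data

tokens = dataset_bytes / bytes_per_token * epochs
flops = training_flops(params, tokens)

print(f"tokens seen: {tokens:.1e}")
print(f"estimated training compute: {flops:.1e} FLOPs")
```

Under these assumptions the estimate comes out on the order of 10^21 FLOPs. Changing the assumed epoch count or tokenization ratio shifts the result proportionally, which is why the task requires reasoning about missing information rather than reading a number off the page.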

How Grok-3 Compares to Other Leading AI Models

Karpathy described Grok-3’s performance as being on par with OpenAI’s o1 Pro and superior to DeepSeek-R1. However, he acknowledged that further evaluation is needed to determine its true ranking in the AI race.

He also tested Grok-3’s DeepSearch feature, which is designed to support research tasks. While he found it comparable to Perplexity AI’s deep research, he noted that Grok-3 still hallucinates URLs and does not always provide accurate citations.

In one test, the model listed 12 major AI labs but failed to include xAI itself, highlighting some remaining gaps in its research abilities.

Experts React to Grok-3’s Performance

After two hours of testing, Karpathy concluded that Grok-3 feels close to state-of-the-art models, calling it “slightly better than DeepSeek-R1 and Gemini 2.0 Flash Thinking.”

Other AI experts, including Lex Fridman, also praised the model. In his own post on X, he said, “My mind is blown, very impressive model.”

With xAI aggressively improving Grok-3, the AI landscape is heating up, and competition with OpenAI, Google, and other AI leaders is fiercer than ever.

Written by
Sazid Kabir
