Elon Musk’s xAI has released Grok-3, and it’s already making waves in the AI world. But does it live up to the hype?
We put it head-to-head against GPT-4o, Claude 3.5 Sonnet, DeepSeek, and Gemini to see how it performs across different tasks.
Creative Writing: Grok-3 Takes the Lead
Grok-3 excels at crafting engaging and immersive stories. It outperformed Claude 3.5 Sonnet in our test, creating a time-travel narrative with well-developed characters and a strong premise. Claude was more descriptive, but Grok-3’s storytelling felt more natural and engaging.
Summarization: A Tie Based on Style
Grok-3 cannot directly read documents, but when tested with a 47-page IMF report, it managed to summarize the key points effectively. Compared to GPT-4o, Grok-3’s summaries were more conversational, while GPT-4o’s were more analytical. Both were accurate, making the choice a matter of preference.
Censorship & Bias: Grok-3 is More Open
Unlike its competitors, Grok-3 engages with sensitive topics without outright refusals. It acknowledges bias in a question but still provides a response, making it more open than ChatGPT or Gemini. In political discussions, it stays more neutral than other AI models and avoids clear ideological slants.
Coding: Grok-3 Shines
Grok-3 stands out in coding tasks. When asked to build a reaction-based game, it opted for an HTML5 version instead of Python, a choice that made the game easier to access and use. The generated code was clean, functional, and better structured than what Claude, DeepSeek, and GPT-4o produced.
Math & Logic: OpenAI and DeepSeek Are Still Stronger
While Grok-3 performs well in logical reasoning, it struggles with advanced math. It failed a complex problem from the FrontierMath benchmark that the DeepSeek and OpenAI models solved correctly. For everyday users, however, its math skills are still solid.
Image Generation: Good, But Not the Best
Grok-3’s image generator, Aurora, is competitive but falls short of specialized models like Midjourney or Stable Diffusion. It does, however, surpass OpenAI’s DALL·E 3 in flexibility and has looser content restrictions.
Deep Search: Fast But Less Detailed
Grok-3’s web search tool delivers quick, accurate research but lacks the depth of Gemini’s reports. It is, however, faster and less biased than both OpenAI’s and Google’s AI models.
Final Verdict: Is Grok-3 the Best AI?
Grok-3 is a major step forward for xAI, with impressive performance in creative writing, coding, and logic. It is less restricted than its competitors and offers balanced political responses. However, OpenAI and DeepSeek still lead in advanced math, and Gemini provides richer deep search results.
If you prioritize creativity, coding, and free speech, Grok-3 is an excellent choice. But if you need deep research or complex math solutions, OpenAI and DeepSeek might be better options.
Source: decrypt.co