OpenAI Research Reveals AI Struggles With Coding Tasks

Despite the rapid advancements in AI, a new study from OpenAI reveals that even the most cutting-edge AI models remain unable to solve the majority of coding tasks.

OpenAI researchers tested the models using SWE-Lancer, a new benchmark built on over 1,400 software engineering tasks from Upwork.

The findings show that while these AI models can handle basic coding issues, they fall short when dealing with more complex tasks.

AI Models Tested

The study tested three prominent large language models (LLMs): OpenAI’s o1 reasoning model, GPT-4o, and Anthropic’s Claude 3.5 Sonnet.

The models were assigned both individual coding tasks, such as fixing bugs, and management-level tasks, such as making high-level decisions about software projects.

Notably, the models were not allowed to use the internet to fetch external solutions.
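To make the setup concrete, here is a minimal sketch of how a benchmark harness along these lines might represent and score freelance-style tasks. The field names, payout figures, and functions below are illustrative assumptions for this article, not the actual SWE-Lancer format.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkTask:
    """One freelance-style task. Fields are illustrative, not the real SWE-Lancer schema."""
    task_id: str
    kind: str          # "coding" (e.g. a bug fix) or "management" (a high-level decision)
    payout_usd: float  # value attached to the original freelance listing
    passed: bool = False  # set True if the model's answer clears the task's checks


def score(tasks: list[BenchmarkTask]) -> dict[str, float]:
    """Summarise a run: how many tasks were solved and how much payout was 'earned'."""
    solved = [t for t in tasks if t.passed]
    return {
        "solve_rate": len(solved) / len(tasks) if tasks else 0.0,
        "earned_usd": sum(t.payout_usd for t in solved),
        "possible_usd": sum(t.payout_usd for t in tasks),
    }


# Hypothetical run: the model fixes a small bug but misses the larger, project-level tasks.
tasks = [
    BenchmarkTask("bug-101", "coding", payout_usd=250.0, passed=True),
    BenchmarkTask("feature-202", "coding", payout_usd=2000.0, passed=False),
    BenchmarkTask("decision-303", "management", payout_usd=500.0, passed=False),
]
print(score(tasks))  # e.g. {'solve_rate': 0.33, 'earned_usd': 250.0, 'possible_usd': 2750.0}
```

Scoring by the dollar value of the tasks a model solves, rather than by raw task count, is why the comparison below talks about which model "earned" more.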

Surface-Level Solutions, Major Shortcomings

The results showed that while the AI models could handle simple bug fixes, they failed to address larger coding issues or dig into the root causes of bugs in more complex projects.

Their solutions were often superficial, lacking the depth and reliability required in real-world software engineering.

Despite being able to perform tasks much faster than humans, the AI models struggled with context comprehension and were prone to offering incorrect or incomplete solutions.

This gap in performance highlights a critical challenge for AI in the software engineering field.

Claude 3.5 Sonnet Performs Better, But Still Falls Short

While Claude 3.5 Sonnet outperformed OpenAI’s models, earning more of the benchmark’s task value, the majority of its answers were still wrong.

According to the researchers, none of the current models is reliable enough to be trusted with real-world coding tasks.

AI Still a Long Way From Replacing Human Coders

The research ultimately demonstrates that while AI is making significant strides in the realm of software engineering, it is not yet ready to replace human coders.

CEOs may dream of firing coders in favor of AI, but the study shows that AI models lack the depth, context, and understanding necessary for complex software engineering.

For now, human expertise remains indispensable in ensuring that coding tasks are completed successfully and comprehensively.
