
Claude 4 Cuts Coding Errors by 25%, Gets Faster, Says Vibe-Coding Startup Lovable


Anthropic’s Claude 4 is showing off some real gains—and not just on paper. AI startup Lovable says it’s coding faster and making fewer mistakes since switching to the new model.

The San Francisco-based company, which builds AI-powered coding tools, reported a 25% drop in syntax errors and a 40% improvement in speed after upgrading to Claude 4. It’s one of the first startups to publicly share real-world numbers following the Claude Opus 4 and Sonnet 4 rollout on May 22.

Claude Opus 4 Shows Endurance and Accuracy in Tests

Claude Opus 4 isn’t just the new shiny toy—it’s apparently got some stamina. Anthropic says the model was able to code for a continuous seven hours in internal testing, tackling long-form tasks that require multiple steps without losing track or making silly errors halfway through.

That’s something.

The company also reported that Opus 4 scored 72.5% on SWE-bench, a well-known benchmark for evaluating software engineering skills in AI models. For context, that's a notable jump, given that models have often topped out around the 60s on this benchmark.


Free Claude Sonnet 4 or Paid Opus 4? Depends What You’re Coding

Anthropic’s latest rollout includes two separate Claude 4 variants: Sonnet and Opus. The former is free, the latter locked behind a subscription.

Sonnet is decent. You won’t hate it.

Opus? It’s meant to be the heavy lifter—especially for folks knee-deep in software engineering tasks. According to developers who’ve tried both, Opus doesn’t just write code. It thinks through it. That makes a difference when you’re debugging or building something that can’t afford a single character out of place.

Here’s how they stack up:

Model            Access Level  SWE-bench Score  Context Length  Strengths
Claude Sonnet 4  Free          Not disclosed    200K tokens     Lightweight tasks, fast responses
Claude Opus 4    Paid          72.5%            200K tokens     Deep coding, long tasks
Gemini 2.5 Pro   Paid          Varies           1M tokens       Long-context planning, versatility

Anton Osika, the founder of Lovable, shared a pretty bold claim this week on X: “Claude 4 just erased most of Lovable’s errors.” That’s a strong endorsement, considering how temperamental LLMs can be when building apps.

Lovable’s product—an AI-powered “vibe coding” builder—helps users spin up websites and apps using prompt-based workflows. It relies heavily on a coding model that doesn’t just write HTML or JavaScript but also infers user intent from vague, casual language.

That’s where Claude 4 apparently shines.

Lovable reported:

  • 25% fewer LLM syntax errors, compared with the previous Claude model

  • 40% faster build and edit times

  • Improvements visible across both new and old projects

Basically, less fixing, more shipping.

Gemini 2.5 Isn’t Out of the Race

Let’s not bury Google just yet. Gemini 2.5 Pro brings a 1 million-token context window, which is huge. It can “remember” a lot more than Claude 4, which caps out at 200,000 tokens.
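To make the window sizes concrete, here is a minimal sketch of when a large context actually matters. It uses the common rough heuristic of about four characters per token (an approximation, not a real tokenizer; the `fits_in_context` helper is illustrative, not part of any vendor API):

```python
def fits_in_context(text: str, window_tokens: int, chars_per_token: float = 4.0) -> bool:
    """Rough check: does this text fit in a model's context window?

    Uses the common ~4-characters-per-token heuristic for English
    prose and code, not an actual tokenizer.
    """
    return len(text) / chars_per_token <= window_tokens


# A ~1 MB source dump is roughly 250K tokens under this heuristic.
codebase = "x" * 1_000_000

print(fits_in_context(codebase, 200_000))    # Claude 4's window → False
print(fits_in_context(codebase, 1_000_000))  # Gemini 2.5 Pro's window → True
```

In other words, a whole mid-sized codebase can overflow a 200K-token window while fitting comfortably in a 1M-token one, which is exactly the kind of task where Gemini's extra memory pays off.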

But bigger memory doesn’t always mean smarter code.

While testing Dart and Kotlin apps, one developer found Claude 4 produced cleaner code than Gemini, at least on tasks that didn’t need the larger context window. That echoes a growing sentiment: Claude still has the upper hand for pure coding tasks—even if Gemini plans better.

The nuance here is crucial. It’s not about who wins. It’s about how you use them.

Mixing Models Might Be the Sweet Spot

The idea that one model can do it all? That’s just not how things work yet. And maybe that’s fine.

Most serious builders are already mixing and matching their tools. Use Gemini for planning, Claude for coding. Toss in OpenAI’s o3 when things get weird. Repeat as needed.

People in dev circles are already adapting:

  1. Use Claude 4 for Dart, Kotlin, and other syntax-sensitive languages

  2. Switch to Gemini 2.5 for long documents, prompt trees, and massive memory tasks

  3. Keep Opus for deep work, and Sonnet for fast interactions
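The three heuristics above can be sketched as a simple routing policy. This is a hypothetical illustration—the `route` helper and the model name strings are stand-ins, not a real API:

```python
# Languages where a single misplaced character breaks the build,
# per the "syntax-sensitive" heuristic above.
SYNTAX_SENSITIVE = {"dart", "kotlin", "swift", "rust"}


def route(task_kind: str, language: str = "", prompt_tokens: int = 0) -> str:
    """Pick a model for a task, following the three rules of thumb above."""
    # Rule 2: very long inputs go to the large-context model.
    if prompt_tokens > 200_000:
        return "gemini-2.5-pro"
    # Rule 1: syntax-sensitive languages go to the strongest coder.
    if language.lower() in SYNTAX_SENSITIVE:
        return "claude-opus-4"
    # Rule 3: quick interactive exchanges use the lighter, faster variant.
    if task_kind == "chat":
        return "claude-sonnet-4"
    # Default: deep work goes to Opus.
    return "claude-opus-4"


route("code", language="kotlin")          # → "claude-opus-4"
route("chat")                             # → "claude-sonnet-4"
route("plan", prompt_tokens=500_000)      # → "gemini-2.5-pro"
```

Real setups hang more signals off this—cost budgets, latency targets, per-project overrides—but the shape is the same: a cheap dispatch layer in front of several models.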

It’s kind of like jazz—there’s no one right way, but you know it when it sounds good.

A Battle of Speed, Syntax, and Memory

In a world where seconds matter and bad code breaks everything, shaving off time and errors is gold. Lovable’s data points aren’t just a pat on the back for Claude 4. They’re proof that models are getting better, not just bigger.

And that might be the most important thing.

Coding tools powered by AI are shifting from novelties to core infrastructure. If Claude 4’s real-world results hold up elsewhere, expect more dev shops and startups to give it a serious look—especially those who’ve been burned by earlier models’ “confidence without competence” issues.

Leela Sehgal is an Indian author who works at ketion.com. She writes short and meaningful articles on various topics, such as culture, politics, health, and more. She is also a feminist who explores the issues of identity and empowerment in her works. She is a talented and versatile writer who delivers quality and diverse content to her readers.
