Claude 4 Cuts Coding Errors by 25% and Speeds Up Builds, Says Vibe-Coding Startup Lovable
Anthropic’s Claude 4 is showing off some real gains—and not just on paper. AI startup Lovable says it’s coding faster and making fewer mistakes since switching to the new model.
The San Francisco-based company, which builds AI-powered coding tools, reported a 25% drop in syntax errors and a 40% improvement in speed after upgrading to Claude 4. It’s one of the first startups to publicly share real-world numbers following the Claude Opus 4 and Sonnet 4 rollout on May 22.
Claude Opus 4 Shows Endurance and Accuracy in Tests
Claude Opus 4 isn’t just the new shiny toy—it’s apparently got some stamina. Anthropic says the model was able to code for a continuous seven hours in internal testing, tackling long-form tasks that require multiple steps without losing track or making silly errors halfway through.
That’s something.
The company also reported that Opus 4 scored 72.5% on SWE-bench, a well-known benchmark for evaluating software engineering skills in AI models. For context, that’s a decent jump, especially when models often peak around the 60s in such benchmarks.
Free Claude Sonnet 4 or Paid Opus 4? Depends What You’re Coding
Anthropic’s latest rollout includes two separate Claude 4 variants: Sonnet and Opus. The former is free, the latter locked behind a subscription.
Sonnet is decent. You won’t hate it.
Opus? It’s meant to be the heavy lifter—especially for folks knee-deep in software engineering tasks. According to developers who’ve tried both, Opus doesn’t just write code. It thinks through it. That makes a difference when you’re debugging or building something that can’t afford a single character out of place.
Here’s how they stack up:
| Model | Access Level | SWE-bench Score | Context Length | Strengths |
| --- | --- | --- | --- | --- |
| Claude Sonnet 4 | Free | Not disclosed | 200K tokens | Lightweight tasks, fast responses |
| Claude Opus 4 | Paid | 72.5% | 200K tokens | Deep coding, long tasks |
| Gemini 2.5 Pro | Paid | Varies | 1M tokens | Long-context planning, versatility |
Lovable Says Claude 4 “Erased Most Errors”
Anton Osika, the founder of Lovable, shared a pretty bold claim this week on X: “Claude 4 just erased most of Lovable’s errors.” That’s a strong endorsement, considering how temperamental LLMs can be when building apps.
Lovable’s product, an AI-powered “vibe coding” builder, helps users spin up websites and apps through prompt-based workflows. It relies heavily on a coding model that doesn’t just write HTML or JavaScript but understands user intent expressed in vague, casual language.
That’s where Claude 4 apparently shines.
Lovable reported:
- 25% fewer LLM syntax errors (compared to the previous Claude version)
- 40% faster build and edit times
- Improvements visible across both new and old projects
Basically, less fixing, more shipping.
Gemini 2.5 Isn’t Out of the Race
Let’s not bury Google just yet. Gemini 2.5 Pro brings a 1 million-token context window, which is huge. It can “remember” a lot more than Claude 4, which caps out at 200,000 tokens.
But bigger memory doesn’t always mean smarter code.
While testing Dart and Kotlin apps, one developer found Claude 4 produced cleaner code than Gemini, at least in contexts where the larger context window wasn’t needed. That echoes a growing sentiment: Claude still has the upper hand in pure coding tasks, even if Gemini plans better.
The nuance here is crucial. It’s not about who wins. It’s about how you use them.
Mixing Models Might Be the Sweet Spot
The idea that one model can do it all? That’s just not how things work yet. And maybe that’s fine.
Most serious builders are already mixing and matching their tools. Use Gemini for planning, Claude for coding. Toss in OpenAI’s o3 when things get weird. Repeat as needed.
People in dev circles are already adapting (a rough routing sketch follows this list):
- Use Claude 4 for Dart, Kotlin, and other syntax-sensitive languages
- Switch to Gemini 2.5 for long documents, prompt trees, and massive memory tasks
- Keep Opus for deep work, and Sonnet for fast interactions
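To make that concrete, here’s a minimal sketch of task-based routing in Python. It assumes the official Anthropic and Google Gen AI SDKs (`anthropic`, `google-genai`) with API keys set in the environment; the `run_task` helper, the task labels, and the exact model IDs are illustrative placeholders, not Lovable’s implementation, so swap in whatever models and defaults fit your stack.

```python
# Minimal sketch of task-based model routing (illustrative model IDs and helper names).
from anthropic import Anthropic
from google import genai

anthropic_client = Anthropic()   # reads ANTHROPIC_API_KEY from the environment
gemini_client = genai.Client()   # reads GEMINI_API_KEY from the environment


def run_task(task_type: str, prompt: str) -> str:
    """Route a prompt to whichever model suits the task profile."""
    if task_type == "planning":
        # Long documents and big-picture planning: lean on Gemini's 1M-token window.
        response = gemini_client.models.generate_content(
            model="gemini-2.5-pro", contents=prompt
        )
        return response.text

    # Syntax-sensitive coding: Claude 4 (Opus for deep work, Sonnet for quick edits).
    model = "claude-opus-4-20250514" if task_type == "deep_coding" else "claude-sonnet-4-20250514"
    message = anthropic_client.messages.create(
        model=model,
        max_tokens=2048,
        messages=[{"role": "user", "content": prompt}],
    )
    return message.content[0].text


# Example: plan with Gemini, then implement with Claude Opus.
plan = run_task("planning", "Outline the screens and data model for a habit-tracking app.")
code = run_task("deep_coding", f"Implement the first screen in Kotlin based on this plan:\n{plan}")
```

Keeping the routing this simple is deliberate: a plain if/else is easy to retune as the benchmarks shift, and they clearly still do.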
It’s kind of like jazz—there’s no one right way, but you know it when it sounds good.
A Battle of Speed, Syntax, and Memory
In a world where seconds matter and bad code breaks everything, shaving off time and errors is gold. Lovable’s data points aren’t just a pat on the back for Claude 4. They’re proof that models are getting better, not just bigger.
And that might be the most important thing.
Coding tools powered by AI are shifting from novelties to core infrastructure. If Claude 4’s real-world results hold up elsewhere, expect more dev shops and startups to give it a serious look—especially those who’ve been burned by earlier models’ “confidence without competence” issues.