Programming


Welcome to the main community in programming.dev! Feel free to post anything relating to programming here!

Cross-posting is strongly encouraged in the instance. If you feel your post or another person's post makes sense in another community, cross-post it there.

Hope you enjoy the instance!

Rules

Follow the programming.dev instance rules
Keep content related to programming in some way
If you're posting long videos, try to add some form of TL;DR for those who don't want to watch them

Wormhole

Follow the wormhole through a path of communities !webdev@programming.dev

founded 2 years ago

MODERATORS

snowe@programming.dev

Ategon@programming.dev

MaungaHikoi@lemmy.nz

UlrikHD@programming.dev

AI Models from Google, OpenAI, Anthropic Solve 0% of ‘Hard’ Coding Problems (analyticsindiamag.com)

submitted 3 days ago by cm0002@lemmy.world to c/programming@programming.dev

83 comments

Endmaker@ani.social 54 points 3 days ago (last edited 3 days ago)

In the ‘Medium’ difficulty category, OpenAI’s o4-mini-high model scored the highest at 53.5%.

This fits my observation of such models. o4-mini-high is able to help me with 80-90% of the problems at work. For the remaining problems, it would come up with a nonsensical solution and, no matter how much I prompted it, it would tunnel-vision on that specific approach. It could never second-guess itself, realise that its initial solution was completely off the mark, and try an entirely different approach. That's where I usually step in and do the work myself.

It still saves me time with the trivial stuff though.

I can't say the same for the rest of the LLMs. They are simply no good at coding and just waste my time.

yogsototh@programming.dev 13 points 3 days ago

I didn’t see Claude 4 Sonnet in the tests, and that’s the one I use. From my experience, it’s in about the same category as o4-mini.

It is a nice tool to have in my belt. But these LLM-based agents are still very far from being able to do advanced and hard tasks. To me, it is probably more important to communicate and learn about the limitations of these tools so as not to lose time instead of gaining it.

In fact, I am not even sure they are good enough to generate truly production-ready code. But they are nice for pre-reviewing, building simple scripts that don’t need to be highly reliable, analysing a project, asking specific questions, etc. The game changer for me was using Clojure-MCP. Having a REPL at its disposal really enhances the quality of most answers.

Ugurcan@lemmy.world 4 points 2 days ago

For me, Claude Code is where everything finally clicked. For advanced stuff, sure, they’re shit when they’re left alone. But as long as I approach it as a junior developer (breaking tasks down into easy bites, having a clear plan at all times, steering it away from pitfalls), I find myself enjoying other stuff while it’s doing the monkey work. Just be sure you provide it with tools, MCP, RAG and some patience.

technocrit@lemmy.dbzer0.com 2 points 3 days ago

Search engines are able to help me with 100% of work.

rikudou@lemmings.world 9 points 2 days ago

I remember those times, too (well, some 99.9%; there are still a few issues I never found a solution to).

But these times are long past, search engines suck nowadays.

nieceandtows@programming.dev 3 points 2 days ago

Not anymore. They've all made deals with each other, and search engines SUCK these days