this post was submitted on 12 Oct 2025
1216 points (99.1% liked)

Programmer Humor

[–] ArsonButCute@lemmy.dbzer0.com 8 points 2 weeks ago (5 children)

If you're planning on using LLMs for coding advice, may I recommend selfhosting a model and adding the documentation and repositories as context?

I use a 1.5b qwen model (mega dumb), but with no context limit I can attach the documentation for the language I'm using and the files from the repo I'm working in (always a local repo in my case). I can usually explain to the LLM what I'm doing, what I'm trying to accomplish, and what I've tried, and it will generate snippets that at the very least point me in the right direction, and more often than not solve the problem (after minor tweaks, because a dumb model is not so good at coding).
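For anyone who wants to script the "attach docs and repo files as context" part, here is a minimal sketch in Python. It only builds the prompt text; sending it to the model is left to whatever frontend or server you run. The function name `build_context_prompt`, the section labels, and the `max_chars` cutoff are all made up for illustration, not part of any tool mentioned in this thread.

```python
from pathlib import Path


def build_context_prompt(question, doc_paths, repo_paths, max_chars=None):
    """Concatenate documentation and repo files into one prompt string.

    Hypothetical helper: the labels and layout are assumptions, not a
    format that any particular model or frontend requires.
    """
    sections = []
    for label, paths in (("DOCUMENTATION", doc_paths), ("REPO FILE", repo_paths)):
        for path in paths:
            # errors="replace" keeps going if a file has odd bytes in it
            text = Path(path).read_text(encoding="utf-8", errors="replace")
            sections.append(f"### {label}: {path}\n{text}")

    context = "\n\n".join(sections)
    if max_chars is not None:
        # Crude cutoff for setups that do have a context limit
        context = context[:max_chars]

    # The question goes last so it sits right before the model's answer
    return f"{context}\n\n### QUESTION\n{question}"
```

Typical use would be something like `build_context_prompt("Why does parse() drop the last row?", ["docs/csv.md"], ["src/parse.py"])` (paths invented here), then pasting or piping the result into the model.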

[–] mstrk@lemmy.world 2 points 2 weeks ago (3 children)

I do use the 1.5b of whatever's latest on Ollama, with Open WebUI as the frontend, for my personal use. Although I can upload files and search the web, it's too slow on my machine.

[–] ArsonButCute@lemmy.dbzer0.com 3 points 2 weeks ago (2 children)

If you've got a decent Nvidia GPU and are hopping on Linux, look into the KoboldCpp Vulkan backend. In my experience it works far better than the CUDA backend and is astronomically faster than the CPU-only backend.

[–] mstrk@lemmy.world 3 points 2 weeks ago (1 children)

Will look into that when I have some money to invest. Thank you 💪

[–] ArsonButCute@lemmy.dbzer0.com 3 points 2 weeks ago

When/if you do, an RTX 3070 LHR (about $300 new) is just about the BARE MINIMUM for GPU inferencing. It's what I use and it gets the job done, but I often find context limits too small to be usable with larger models.

If you wanna go team red, Vulkan should still work for inferencing, and you have access to options with significantly more VRAM, allowing you to use larger models more effectively. I'm not sure about speed though; I haven't personally used AMD GPUs since around 2015.
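To put rough numbers on why context gets tight on an 8 GB card like the 3070, here is a back-of-the-envelope sketch in Python. It uses the usual KV-cache estimate (one key and one value tensor per layer, per token). The model shape below (32 layers, 8 KV heads, head size 128), the 4.5 GiB weight size, and the 1 GiB overhead are all hypothetical placeholders, so plug in the real values for whatever model you run. Actual usage also depends on the backend and on any cache quantization.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Estimate KV-cache size in bytes (fp16 by default, 2 bytes per value)."""
    # Leading 2 = one key tensor plus one value tensor per layer
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value


def fits_in_vram(weights_gib, kv_bytes, vram_gib, overhead_gib=1.0):
    """Rough check: weights + KV cache + assumed overhead vs. available VRAM."""
    return weights_gib + kv_bytes / 2**30 + overhead_gib <= vram_gib


# Hypothetical model shape, chosen only to make the arithmetic easy to follow
kv = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, context_tokens=32768)
print(kv / 2**30)                   # 4.0 GiB of cache at a 32k context
print(fits_in_vram(4.5, kv, 8))     # False: does not fit on an 8 GiB card
print(fits_in_vram(4.5, kv, 16))    # True: fits on a 16 GiB card
```

With these made-up numbers, the same model at an 8k context needs only about 1 GiB of cache and would fit in 8 GiB, which matches the experience of having to shrink context on smaller cards.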
