LocalLLaMA

k80, 3060 or p40 for performance (lemmy.world)

submitted 2 years ago* (last edited 2 years ago) by plotting_homelab@lemmy.world to c/localllama@sh.itjust.works


So I am looking to get a GPU for my "beast" (a 24-core, 128GB tower with too much PCI-e). I thought I might buy a used 3090, but then it hit me that most applications can work with multiple GPUs, so I decided to go to eBay with €600, and using TechPowerUp I estimated their performance by looking at memory bandwidth and FP32 performance. This brought me to the following cards for my own LLaMA, Stable Diffusion and Blender: 5 Tesla K80s, 3 Tesla P40s or 2 3060s, but I can't figure out which would be better for performance and future proofing. The main difference I found is in CUDA version, but I can't really figure out why that matters. The other thing I found is that 5 K80s are way more power intensive than 3 P40s, and that if memory size is really important the P40s are the way to go, but I couldn't find real performance numbers, as I can't find benchmarks like this one for Blender.

So if anyone has a nice source for Stable Diffusion and LLaMA benchmarks, I would appreciate it if you could share it. And if you have one or more of these cards and can tell me which option is better, I would appreciate your opinion.
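For anyone weighing the same three options, here is a rough Python sketch of the memory and power comparison described above. The VRAM and board-power figures are nominal spec-sheet values as I remember them, so treat them as assumptions and check them against TechPowerUp before relying on them. The class and function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Option:
    name: str
    count: int       # number of cards in this option
    vram_gb: float   # VRAM per card (assumed nominal value)
    tdp_w: float     # board power per card in watts (assumed nominal value)

    @property
    def total_vram_gb(self) -> float:
        return self.count * self.vram_gb

    @property
    def total_tdp_w(self) -> float:
        return self.count * self.tdp_w


# Assumed spec-sheet numbers. Note the K80 is a dual-GPU board (2 x 12GB),
# so software sees two 12GB devices per card rather than one 24GB device.
OPTIONS = [
    Option("Tesla K80", 5, 24, 300),
    Option("Tesla P40", 3, 24, 250),
    Option("RTX 3060 12GB", 2, 12, 170),
]


def summarize(options):
    """Map each option name to (total VRAM in GB, total board power in W)."""
    return {o.name: (o.total_vram_gb, o.total_tdp_w) for o in options}


if __name__ == "__main__":
    for name, (vram, watts) in summarize(OPTIONS).items():
        print(f"{name}: {vram:.0f} GB total VRAM, {watts:.0f} W total")
```

This only covers capacity and power draw. It says nothing about actual inference speed, which is the part that needs real benchmarks.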


[–] foolsh_one@sh.itjust.works 2 points 2 years ago (1 children)

I have a P40 I'd be glad to run a benchmark on, just tell me how. I have Ooba and llama.cpp installed on Ubuntu 22.04. It's a Dell R620 with 2 x 12 3.5 GHz Xeon cores (2 threads per core for 48 threads) and 256GB RAM @ 1833MHz, and I have a PCI-e gen 1 20-slot backplane. The speed of the PCI-e bus might impact the loading time of large models, but it doesn't seem to affect the speed of inference.

I went for the P40 for cost per GB of VRAM; speed was less important to me than being able to load the larger models at all. Including the fan and fan coupling I'm all in at about $250 per card. I'm planning on adding more in the future; I too suffer from too many PCI-e slots.

I don't think the CUDA version will become an issue anytime too soon, but it is coming, to be sure.
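On the "just tell me how" part: one simple way to get a comparable number is tokens per second over a few repeated runs. Below is a minimal Python sketch of that idea. The `generate` callable is hypothetical, a stand-in for whatever wrapper you put around your own backend; it is assumed to take a prompt and a token limit and return how many tokens it actually produced.

```python
import time
from typing import Callable


def tokens_per_second(
    generate: Callable[[str, int], int],
    prompt: str,
    max_new_tokens: int,
    runs: int = 3,
) -> float:
    """Return the median generation rate in tokens/second over `runs` runs.

    `generate` is a hypothetical wrapper around your backend: it takes
    (prompt, max_new_tokens) and returns the number of tokens produced.
    """
    rates = []
    for _ in range(runs):
        start = time.perf_counter()
        produced = generate(prompt, max_new_tokens)
        elapsed = time.perf_counter() - start
        if elapsed > 0:
            rates.append(produced / elapsed)
    if not rates:
        raise ValueError("no timed runs completed")
    rates.sort()
    return rates[len(rates) // 2]  # median is less sensitive to one slow run


if __name__ == "__main__":
    # Fake backend for demonstration only: pretends to emit 10 tokens in ~10 ms.
    def fake_generate(prompt: str, max_new_tokens: int) -> int:
        time.sleep(0.01)
        return min(10, max_new_tokens)

    print(f"{tokens_per_second(fake_generate, 'hello', 32):.1f} tokens/s")
```

Using the same prompt, token limit and model file on each card keeps the numbers comparable. Model load time is deliberately left out of the measurement, since that is where a slow PCI-e bus would show up.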

[–] plotting_homelab@lemmy.world 1 points 2 years ago* (last edited 2 years ago) (1 children)

p40 I’d be glad to run a benchmark on, just tell me how.

Yeah, I think that's kind of the issue today: there isn't really a benchmark for that kind of stuff. From what I understand the P40 is perfectly capable of running some larger models because of the 24GB. What I don't understand is the fan and fan coupling you mention. What do you mean by that, and is it required? I have a Supermicro SC747 (see link for example); would that require more airflow through the GPUs to cool them?
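As a rough back-of-the-envelope check on the 24GB point, the weights of a model take about (parameter count x bits per weight / 8) bytes. Here is a small Python sketch of that arithmetic. The 2 GiB overhead for context and buffers is my own assumption, not a measured value, and real usage varies with context length and backend.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(
    params_billion: float,
    bits_per_weight: float,
    vram_gib: float,
    overhead_gib: float = 2.0,  # assumed allowance for context and buffers
) -> bool:
    """Rough check: do the weights plus the assumed overhead fit in VRAM?"""
    return weight_gib(params_billion, bits_per_weight) + overhead_gib <= vram_gib


if __name__ == "__main__":
    for bits in (4, 8, 16):
        size = weight_gib(33, bits)
        print(f"33B at {bits}-bit: {size:.1f} GiB weights, fits in 24 GiB: {fits(33, bits, 24)}")
```

By this estimate a 33B model quantized to 4 bits fits on a single 24GB card, while the same model at 16 bits does not.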

[–] foolsh_one@sh.itjust.works 1 points 2 years ago* (last edited 2 years ago) (1 children)

The P40 doesn't have active cooling; it really needs forced airflow, which is why I grabbed one of these:

https://www.ebay.com/itm/285241802202

It's even cheaper now than when I bought mine.

[–] plotting_homelab@lemmy.world 1 points 2 years ago (2 children)

Oh, so a real server chassis provides that airflow, but because of the lack of flow in towers/desktops they overheat. I get it, good to know.

[–] foolsh_one@sh.itjust.works 1 points 2 years ago* (last edited 2 years ago)

Correct, my backplane doesn't have the airflow of a big server box. Another gotcha is that the P40 uses an 8-pin CPU power plug, not an 8-pin GPU one.

Edit: 8-pin, not 6-pin

[–] foolsh_one@sh.itjust.works 1 points 2 years ago

Also, since you're asking about multi-GPU: I have a few other cards stuffed in my backplane. The GeForce GTX 1050 Ti has 4GB of VRAM and is comparable to the P40 in performance. I have split a larger 33B model across the two cards. Splitting a large model is of course slower than running on one card alone, but it is much faster than CPU (even with 48 threads). However, speed when splitting depends on the speed of the PCI-e bus, which for me is limited to gen 1 speeds for now. If you have a faster/newer PCI-e standard then you'll see better results than me.
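To illustrate the kind of split described above, here is a small Python sketch that divides a model's layers between cards in proportion to their VRAM. This is only one simple heuristic, not necessarily what any backend does internally, and the 60-layer count is an assumed example value. llama.cpp has a tensor-split option that takes proportions along these lines; check your build's help output for the exact usage.

```python
def split_layers(n_layers: int, vram_gib: list[float]) -> list[int]:
    """Divide n_layers between GPUs in proportion to their VRAM.

    Uses largest-remainder rounding so the counts always sum to n_layers.
    """
    total = sum(vram_gib)
    if total <= 0:
        raise ValueError("total VRAM must be positive")
    raw = [n_layers * v / total for v in vram_gib]
    counts = [int(r) for r in raw]  # round down first
    leftover = n_layers - sum(counts)
    # Give the remaining layers to the GPUs with the largest fractional parts.
    by_fraction = sorted(
        range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True
    )
    for i in by_fraction[:leftover]:
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # A 24GB P40 plus a 4GB GTX 1050 Ti, with an assumed 60-layer model.
    print(split_layers(60, [24, 4]))
```

In practice you would probably leave some headroom on the smaller card, since it also has to hold its share of the context.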