overview for robber

How effective are ads? in c/asklemmy@lemmy.ml

[–] robber@lemmy.ml 9 points 6 days ago (2 children)

Given that Google generated more than 250 billion U.S. dollars in ad revenue in 2024, I'd say they must be pretty effective.

Source

Forgejo v13.0 is available in c/selfhosted@lemmy.world

[–] robber@lemmy.ml 1 points 1 week ago (1 children)

Depends on the version you're running.

https://forgejo.org/docs/latest/admin/upgrade/from-gitea/

Relevance of GPU driver version for inference performance in c/localllama@sh.itjust.works

[–] robber@lemmy.ml 1 points 2 weeks ago (1 children)

I see. When I run the inference engine containerized, will the container be able to run its own version of CUDA or use the host's version?

Relevance of GPU driver version for inference performance in c/localllama@sh.itjust.works

[–] robber@lemmy.ml 2 points 2 weeks ago

Thank you for taking the time to respond.

I've used vLLM for hosting a smaller model which could fit in two of GPUs, it was very performant especially for multiple requests at the same time. The major drawback for my setup was that it only supports tensor parallelism for 2, 4, 8, etc. GPUs and data paralellism slowed inference down considerably, at least for my cards. exllamav3 is the only engine I'm aware of which support 3-way TP.

But I'm fully with you in that vLLM seems to be the most recommended and battle-tested solution.

I might take a look at how I can safely upgrade the driver until I can afford a fourth card and switch back to vLLM.

Relevance of GPU driver version for inference performance in c/localllama@sh.itjust.works

[–] robber@lemmy.ml 2 points 2 weeks ago (2 children)

I use the the proprietary ones from Nvidia, they're at 535 on oldstable IIRC but there are a lot newer ones.

I use 3xRTX2000e Ada. It's a rather new, quite power efficient GPU manufactured by PNY.

As inference engine I use exllamav3 with tabbyAPI. I like it very much because it supports 3-way tensor paralellism, making it a lot faster for me than llamacpp.

Relevance of GPU driver version for inference performance in c/localllama@sh.itjust.works

[–] robber@lemmy.ml 1 points 2 weeks ago

I use the the proprietary ones from Nvidia, they're at 535 on oldstable IIRC but there are a lot newer ones.

10

Relevance of GPU driver version for inference performance (lemmy.ml)

submitted 2 weeks ago by robber@lemmy.ml to c/localllama@sh.itjust.works

11 comments fedilink

Hey everyone! I was just skimming through some inference benchmarks of other people and noticed the driver version is usually mentioned. It made me wonder how relevant this is. My prod server runs Debian 12 so the packaged nvidia drivers are rather old, but I'd prefer not to mess with the drivers if it won't bring a benefit. Does any of you have any experience or did do some testing?

Looking for a movie about a guy who gets their brian transfered to this like construction worker thing. I can't remember the name. in c/asklemmy@lemmy.ml

[–] robber@lemmy.ml 6 points 2 weeks ago

That brian typo really gave me a chuckle. Hope you found the movie you were looking for.

Anyone using a Linux Smarphone? in c/linux@lemmy.ml

[–] robber@lemmy.ml 2 points 3 weeks ago (2 children)

Wikipedia states the UI layer is propriertary, is that true?

Big Brother is watching Switzerland! in c/privacy@lemmy.ml

[–] robber@lemmy.ml 4 points 1 month ago

The country's official app for COVID immunity certificates or whatever they were called was available on F-Droid at the time.

Magistral-Small-2509 by Mistral has been released in c/localllama@sh.itjust.works

[–] robber@lemmy.ml 2 points 1 month ago

Too bad they've only been dropping dense models recently. Also kind of interesting since with Mixtral back in the days they were way ahead of time.

FLX1s is Launched in c/linux@lemmy.ml

[–] robber@lemmy.ml 14 points 1 month ago* (last edited 1 month ago) (1 children)

A review from earlier this year didn't sound too bad.

Edit: as pointed out, the review seems to be about the previous version of the phone.

29

Magistral-Small-2509 by Mistral has been released (huggingface.co)

submitted 1 month ago by robber@lemmy.ml to c/localllama@sh.itjust.works

2 comments fedilink

Qwen3-Next with 80b-a3b parameters is out in c/localllama@sh.itjust.works

[–] robber@lemmy.ml 5 points 1 month ago (1 children)

I'd add that memory bandwidth is still a relevant factor, so the faster the RAM the faster the inference will be. I think this model would be a perfect fit for the Strix Halo or a >= 64GB Apple Silicon machine, when aiming for CPU-only inference. But mind that llamacpp does not yet support the qwen3-next architecture.

36

Qwen3-Next with 80b-a3b parameters is out (huggingface.co)

submitted 1 month ago* (last edited 1 month ago) by robber@lemmy.ml to c/localllama@sh.itjust.works

5 comments fedilink

14

ExLlamaV3 adds tensor parallelism support (github.com)

submitted 2 months ago by robber@lemmy.ml to c/localllama@sh.itjust.works

0 comments fedilink

Title says it - it's been 10 days already but I didn't catch the release. This might be huge for those of us running on multiple GPUs. At least for Gemma3, I was able to double inference speed by using vLLM with tensor parallelism vs. ollama's homegrown parallelism. Support in ExLlamaV3 could additionally allow to pair TP with lower-bit quants. Haven't tested this yet, but I'm looking very much forward to.

15

New, promising MoE model "Hunyuan" by Tencent (huggingface.co)

submitted 3 months ago by robber@lemmy.ml to c/localllama@sh.itjust.works

1 comments fedilink

Tencent recently released a new MoE model with ~80b parameters, 13b of which are active at inference. Seems very promising for people with access to 64 gigs of VRAM.

23

Do you quantize models yourself? (lemmy.ml)

submitted 4 months ago by robber@lemmy.ml to c/localllama@sh.itjust.works

8 comments fedilink

Hey fellow llama enthusiasts! Great to see that not all of lemmy is AI sceptical.

I'm in the process of upgrading my server with a bunch of GPUs. I'm really excited about the new Mistral / Magistral Small 3.2 models and would love to serve them for me and a couple of friends. My research led me to vLLM with which I was able to double inference speed compared to ollama at least for qwen3-32b-awq.

Now sadly, the most common quantization methods (GGUF, EXL, BNB) are either not fully (GGUF) or not at all (EXL) supported in vLLM, or multi-gpu inference thouth tensor parallelism is not supported (BNB). And especially for new models it's hard to find pre-quantized models in different, more broadly supported formats (AWQ, GPTQ).

Does any of you guys face a similar problem? Do you quantize models yourself? Are there any up-to-date guides you would recommend? Or did I completely overlook another, obvious solution?

It feels like when I've researched something yesterday, it's already outdated again today, since the landscape is so rapidly evolving.

Anyways, thank you for reading and sharing your thoughts or experience if you feel like it.

4

Well, that's offending (lemmy.ml)

submitted 5 months ago by robber@lemmy.ml to c/linux@lemmy.ml

2 comments fedilink

Text: Allows you to determine whether to limit CPUID maximum value. Set this to enabled for legacy operating systems such as Linux or Unix.

Found this in the BIOS of a Gigabyte Z97X-UD3H mobo.

62

Any experience with Pangolin? (lemmy.ml)

submitted 5 months ago by robber@lemmy.ml to c/selfhosted@lemmy.world

21 comments fedilink

Hi fellow homelabbers! I hope your day / night is going great.

Just stubled across this self-hosted cloudflare tunnel alternernative called Pangolin.

Does anyone use it for exposing their homelab? It looks awesome, but I've never heard of it before.
Should I be reluctant since it's developed by a US-based company? I mean security-wise. (I'll remove this question if it's too political.)
Does anyone know of alternatives pieces or stacks or software that achieve the same without relying on cloudflare?

Your insights are highly appreciated!

272

More than 140 Kenya Facebook moderators diagnosed with severe PTSD (www.theguardian.com)

submitted 10 months ago by robber@lemmy.ml to c/technology@lemmy.world

21 comments fedilink

127

Don't forget to ... (lemmy.ml)

submitted 10 months ago by robber@lemmy.ml to c/piracy@lemmy.dbzer0.com

14 comments fedilink

13

[Solved] Chaining routers and GUA IPv6 addresses (lemmy.ml)

submitted 1 year ago* (last edited 10 months ago) by robber@lemmy.ml to c/selfhosted@lemmy.world

8 comments fedilink

Hey fellow self-hosting lemmoids

Disclaimer: not at all a network specialist

I'm currently setting up a new home server in a network where I'm given GUA IPv6 addresses in a 64 bit subnet (which means, if I understand correctly, that I can set up many devices in my network that are accessible via a fixed IP to the oustide world). Everything works so far, my services are reachable.

Now my problem is, that I need to use the router provided by my ISP, and it's - big surprise here - crap. The biggest concern for me is that I don't have fine-grained control over firewall rules. I can only open ports in groups (e.g. "Web", "All other ports") and I can only do this network-wide and not for specific IPs.

I'm thinking about getting a second router with a better IPv6 firewall and only use the ISP router as a "modem". Now I'm not sure how things would play out regarding my GUA addresses. Could a potential second router also assign addresses to devices in that globally routable space directly? Or would I need some sort of NAT? I've seen some modern routers with the capability of "pass-through" IPv6 address allocation, but I'm unsure if the firewall of the router would still work in such a configuration.

In IPv4 I used to have a similar setup, where router 1 would just forward all packets for some ports to router 2, which then would decide which device should receive them.

Has any of you experience with a similar setup? And if so, could you even recommend a router?

Many thanks!

Edit: I was able to achieve what I wanted by using OpenWrt and their IPv6 relay mode. Now my ISP router handles all IPv6 addresses directly, but I'm still able to filter the packets using the OpenWrt firewall. For IPv4 I didn't figure out how to, at the same time, use the ISP's DHCP server, so I just went with double NAT. Everything works like a charm. Thank you guys for pointing me in the right direction.

140

USA to be renamed to XXX (lemmy.ml)

submitted 1 year ago* (last edited 1 year ago) by robber@lemmy.ml to c/lemmyshitpost@lemmy.world

22 comments fedilink

Most relevant section translated to english:

If he (Trump) wins the election on November 5, his billionaire supporter Musk will chair the new board. This is to implement a full financial and performance audit of the entire government and make recommendations for drastic reforms.

Source: Swiss state media article