In the last weeks Lemmy has seen a lot of growth, with thousands of new users. To welcome them we are holding this AMA to answer questions from the community. You can ask about the beginnings of Lemmy, how we see the future of Lemmy, our long-term goals, what makes Lemmy different from Reddit, about internet and social media in general, as well as personal questions.
We'd also like to hear your overall feedback on Lemmy: What are its greatest strengths and weaknesses? How would you improve it? What's something you wish it had? What can our community do to ensure that we keep pulling users away from US tech companies, and into the fediverse?
Lemmy and Reddit may look similar at first glance, but there is a major difference. While Reddit is a corporation with thousands of employees and billionaire investors, Lemmy is nothing but an open source project run by volunteers. It was started in 2019 by @dessalines and @nutomic, turning into a fulltime job since 2020. For our income we are dependent on your donations, so please contribute if you can. We'd like to be able to add more full-time contributors to our co-op.
We will start answering questions from tomorrow (Wednesday). Besides @dessalines and @nutomic, other Lemmy contributors may also chime in to answer questions:
Here are our previous AMAs for those interested.
            
           
          
What are your thoughts on blocking AI scraper access? Any attempts to improve that on the side of Lemmy? Basic things like allowing to customize the robots.txt easily would already help.
I also recently tried this new AI block tool called Anubis with Lemmy, but for some reason it fails with Lemmy-ui. Might be interesting to investigate further.
Anyone that wants to scrape Lemmy would have an easier time setting up their own server, federating with everyone, and reading straight from their DB. No web scraping required. Though, web scraping defenses would be useful against general web scrapers/crawlers.
That would require the authors of these AI scrapers to actually give a f*ck. The problem is that they don't, and just scrape what ever they can find repeatatly almost like a ddos attack on the open web.
Yup, same as they could clone git repos in one shot, but they instead crawl every single page.
You can load a different robots.txt in your nginx config, something like this:
Additionally 1.0 will change the "private instance" to work with federation enabled (see https://github.com/LemmyNet/lemmy/pull/5530). Then only logged-in users will see content, while AI scrapers wont see anything except the login page.
I've previously worked in anti-scraping. There is a negative 0% chance the Lemmy devs have the resources to effectively do this without tanking the server for everyone else.
I just set up Anubis today. Specifically I'm only testing it for Lemmy-ui, and it seems to work fine.
It looks like the distributed waves that keep bringing the service down hit exclusively our lemmy-ui subdomain, so maybe non-SSR photon is also a good defense, heh.
Hmm, that is odd. I guess I need to double check my Nginx config for lemmy-ui then. You have your setup documented somewhere?
Edit: ah, you run Photon as the main UI and lemmy-ui somewhere else? I think specifically the split between frontend and backend on the root domain somehow makes Anubis fail to set the correct cookie.
I don't think it should be a problem, but I'm not that sure either. Lemmy.fedi.zutto.fi also runs it and that's just a normal lemmy-ui installation. I think Zutto simply forwarded all traffic to Anubis and then fixed federation. There was some discussion and config shared in sopuli's finnish matrix room.