this post was submitted on 30 Jul 2023
        
      
      221 points (100.0% liked)
Generally speaking, the way training works is this:
You put together a folder of pictures, all the same size. It would've been 1024x1024 in this case. Other models have used 768x768 or 512x512. For every picture, you also have a text file with a description.
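If it helps to see it, here's a rough sketch of what that folder looks like from the software's point of view. It assumes each caption lives in a `.txt` file with the same name as its picture (`cat.png` next to `cat.txt`); that naming scheme and the function name `pair_images_with_captions` are my own assumptions for illustration, not something taken from a specific trainer.

```python
from pathlib import Path


def pair_images_with_captions(folder, image_exts=(".png", ".jpg", ".jpeg")):
    """Return (image_path, caption) pairs for a training folder.

    Assumption: the caption for "cat.png" is stored in "cat.txt".
    Images without a caption file are skipped.
    """
    pairs = []
    for image_path in sorted(Path(folder).iterdir()):
        if image_path.suffix.lower() not in image_exts:
            continue  # not a picture (for example, the caption files themselves)
        caption_path = image_path.with_suffix(".txt")
        if caption_path.exists():
            caption = caption_path.read_text(encoding="utf-8").strip()
            pairs.append((image_path, caption))
    return pairs
```

Calling `pair_images_with_captions("my_dataset")` (a made-up folder name) would give you the list of picture/description pairs the training run iterates over.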
The training software takes a picture, slices it into squares, generates a square of random noise the same size, then trains on how to change that noise into that square. It associates that training with tokens from the description that went with that picture. And it keeps doing this, over and over, for every picture in the folder.
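To make the noise part a bit more concrete, here's a toy sketch in plain Python. It assumes the formulation that's common in diffusion trainers, where the noise gets blended into the picture and the model is asked to guess which noise was added; I don't know that this particular trainer does it exactly this way. The function names and the flat list of pixel values are made up for illustration, and there's no actual neural network here, just the arithmetic around it.

```python
import math
import random


def make_training_example(pixels, noise_level, rng):
    """Blend pixel values with Gaussian noise.

    noise_level is between 0 and 1: 0 leaves the picture intact,
    values near 1 are almost pure noise.
    Returns (noisy_pixels, noise). In a real trainer the model would see
    noisy_pixels plus the caption tokens and be trained to predict noise.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in pixels]
    keep = math.sqrt(1.0 - noise_level)
    mix = math.sqrt(noise_level)
    noisy = [keep * p + mix * n for p, n in zip(pixels, noise)]
    return noisy, noise


def remove_predicted_noise(noisy, predicted_noise, noise_level):
    """Undo the blend, given a guess at the noise (requires noise_level < 1)."""
    keep = math.sqrt(1.0 - noise_level)
    mix = math.sqrt(noise_level)
    return [(x - mix * n) / keep for x, n in zip(noisy, predicted_noise)]


rng = random.Random(0)
pixels = [0.1, 0.5, -0.3, 0.9]
noisy, noise = make_training_example(pixels, 0.5, rng)
# With a perfect noise guess, the original pixels come back.
recovered = remove_predicted_noise(noisy, noise, 0.5)
```

The point is that what gets learned is "how to guess the noise given these tokens", which is a set of weights, not a copy of the pixels.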
Then later, when someone types a prompt into the software, the software tokenizes the prompt, generates fresh random noise, and uses the denoising methods associated with the tokens that were typed in. The pictures in the folder aren't actually kept by it anywhere.
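Here's a very stripped-down sketch of that generation loop, again in plain Python. Everything in it is a stand-in: the whitespace tokenizer is a toy (real tokenizers use subword vocabularies), and `denoise_step` is a hypothetical callable that represents the trained model. Notice that the only inputs are the prompt, a seed, and the trained denoiser; the training folder doesn't appear anywhere.

```python
import random


def tokenize(prompt):
    # Toy tokenizer: lowercase and split on whitespace.
    return prompt.lower().split()


def starting_noise(size, seed):
    # A size x size grid of Gaussian noise; the same seed gives the same grid.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(size)]


def generate(prompt, seed, denoise_step, steps=20, size=4):
    """Start from noise and repeatedly apply the (hypothetical) denoiser."""
    tokens = tokenize(prompt)
    image = starting_noise(size, seed)
    for step in range(steps):
        image = denoise_step(image, tokens, step)
    return image


def shrink(image, tokens, step):
    # Placeholder "denoiser" that just halves every value; it ignores the tokens.
    return [[value * 0.5 for value in row] for row in image]


result = generate("a cat on a sofa", seed=42, denoise_step=shrink)
```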
From the side of the person doing the training, though, it's just a matter of putting together the pictures and descriptions, setting some settings, and letting the training software do its work.
(No money involved in this one. One person trained it and plopped it on a website where people can download LoRAs for free...)