I've been compiling large datasets of characters from anime screencaps to train LoRAs for Stable Diffusion. I typically work with around 10,000 images (up to 80,000 in some cases) that I need to manually crop to focus on the intended character. I already use a simple cosine-similarity program to remove near-duplicate images, along with WD1.4 tagging to sort images into character-specific datasets by appearance, but that can still leave upwards of 1,000 images to crop by hand. It's not impossible, but it's hardly a valuable use of time when there's likely a way to significantly reduce the menial work.
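For context, the cosine-similarity dedup step can be sketched as a greedy filter over embedding vectors: keep an image only if it isn't too similar to anything already kept. This is a minimal illustration assuming you already have per-image embeddings as NumPy rows (e.g. from CLIP or a tagger's feature layer); the threshold value and the toy vectors here are made up for demonstration.

```python
import numpy as np

def deduplicate(embeddings, threshold=0.98):
    """Greedy near-duplicate filter: keep image i only if its cosine
    similarity to every already-kept embedding is below threshold."""
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    kept = []  # indices of retained images
    for i, vec in enumerate(normed):
        if all(np.dot(vec, normed[j]) < threshold for j in kept):
            kept.append(i)
    return kept

# Toy demo: the second vector is a near-copy of the first, the third is distinct.
embs = np.array([
    [1.0, 0.0, 0.0],
    [0.999, 0.01, 0.0],  # near-duplicate of the first -> dropped
    [0.0, 1.0, 0.0],     # distinct -> kept
])
print(deduplicate(embs))  # → [0, 2]
```

Note this is O(n·k) in the number of kept images; for 80,000 images you'd want an approximate-nearest-neighbor index instead of the inner loop.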
I've seen some solutions built with FiftyOne, but I've got no idea how to use it myself – are there any publicly available solutions anyone can recommend?
submitted by /u/jnslater