Anon 10/23/25, 19:01 No.8753253
If only you knew how reverse searching works.
If you just want an image to be reverse searchable, it's simple: use a PHash (perceptual hash), which is only a tiny string that takes almost no space.
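To show how little there is to it, here's a rough sketch of one common DCT-based PHash variant in plain Python. It assumes the image has already been grayscaled and resized to a small square grid (32x32 is typical), which in practice you'd do with an image library. The names `phash` and `hamming` are just mine for illustration, not from any particular library.

```python
import math

def phash(pixels, hash_size=8):
    """DCT-based perceptual hash of a square grayscale image.

    `pixels` is an N x N list of rows of brightness values (N >= hash_size),
    assumed to be grayscaled and resized already.
    Returns an int holding hash_size * hash_size bits.
    """
    n = len(pixels)
    # DCT-II basis, computed only for the lowest `hash_size` frequencies.
    basis = [[math.cos(math.pi * (2 * i + 1) * f / (2 * n)) for i in range(n)]
             for f in range(hash_size)]
    coeffs = []
    for u in range(hash_size):          # vertical frequency
        for v in range(hash_size):      # horizontal frequency
            total = 0.0
            for y in range(n):
                row, by = pixels[y], basis[u][y]
                for x in range(n):
                    total += row[x] * by * basis[v][x]
            coeffs.append(total)
    # Threshold against the median of the AC terms (the first coefficient
    # is the DC term, i.e. overall brightness, so it is left out of the median).
    ac = sorted(coeffs[1:])
    median = ac[len(ac) // 2]
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | (1 if c > median else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")
```

Two images count as a match when the Hamming distance between their hashes is small. Where you put the cutoff is a tuning choice, not something fixed by the algorithm.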
But if you want to match against cropped images? Now you gotta store detailed features using something like ORB or, even worse, a CNN, which takes a fuck ton of compute and runs at like a few images per second on consumer GPUs.
But even with something like that, if you do extreme crops, like below 20% of the original image, it's not going to work unless you store so much data that it takes almost as much space as the original images.
With simple PHash, literally anyone can make a system themselves as each image would just be a tiny hash that takes a few bytes, depending on accuracy.
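Rough estimate at 36, 64, 128 and 576 bit PHash: if you had 1TB of storage, you could store 125 billion, 83.33 billion, 41.66 billion and 10.41 billion hashes respectively. Those numbers work out to 8, 12, 24 and 96 bytes per hash, which is what the hashes take as padded base64 text; packed as raw binary (5, 8, 16 and 72 bytes) you'd fit even more.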
Though that is only for the hash itself. If you also want to store what file each hash is tied to, expect the count to be reduced by 50-75% at 36 bit, 50-ish% at 64, maybe 25% at 128 and 10% at 576 (rough estimates).
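If you want to redo the napkin math yourself, something like this works. The assumptions are mine: 1TB means 10^12 bytes, hashes are stored either as padded base64 text or as raw bytes, and the optional per-file reference is a fixed-size ID (`id_bytes`, a made-up parameter).

```python
import math

TB = 10**12  # 1 TB in bytes (decimal)

def hash_size_bytes(bits, encoding="base64"):
    """Bytes needed to store one hash of `bits` bits."""
    raw = math.ceil(bits / 8)           # packed binary, rounded up to whole bytes
    if encoding == "raw":
        return raw
    return 4 * math.ceil(raw / 3)       # padded base64: 4 chars per 3 raw bytes

def capacity(bits, id_bytes=0, encoding="base64", storage=TB):
    """How many (hash + optional file ID) records fit in `storage` bytes."""
    return storage // (hash_size_bytes(bits, encoding) + id_bytes)

for bits in (36, 64, 128, 576):
    bare = capacity(bits)
    with_id = capacity(bits, id_bytes=8)   # assumed 8-byte file ID
    print(f"{bits:>3} bit: {bare / 1e9:6.2f} billion bare, "
          f"{with_id / 1e9:6.2f} billion with ID "
          f"({100 * (1 - with_id / bare):.0f}% fewer)")
```

With an 8-byte ID the count drops by about 50%, 40%, 25% and 8%, which is in the same ballpark as the rough percentages above. Longer references like full URLs or file paths would cost more.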
But for handling crops with something like ORB, the storage requirements explode and can take up around 10% of the size of the original image, more or less depending on accuracy.
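To put a number on that: an ORB descriptor is 256 bits (32 bytes) per keypoint, and OpenCV's ORB defaults to 500 keypoints per image. The sketch below only does the arithmetic. The 160 KB image size is a made-up example, and `coord_bytes` is a hypothetical extra for storing keypoint positions if you need them.

```python
ORB_DESCRIPTOR_BYTES = 32  # one ORB descriptor is a 256-bit binary string

def orb_index_bytes(keypoints, coord_bytes=0):
    """Bytes needed to store ORB descriptors for one image."""
    return keypoints * (ORB_DESCRIPTOR_BYTES + coord_bytes)

image_bytes = 160_000                      # assumed size of a typical image file
per_image = orb_index_bytes(500)           # 500 keypoints per image
print(per_image)                           # 16000 bytes of descriptors
print(per_image / image_bytes)             # fraction of the original image size
print(per_image // 8)                      # how many 8-byte PHashes that equals
```

Cutting the keypoint count shrinks this linearly, at the cost of matching fewer and less extreme crops.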
Considering how much data there is on the internet, I can understand why reverse search sites are severely lacking or unreliable with crops.
And to make it even worse, the amount of data is growing at an insane rate, especially now that AI is a thing.
Nothing is stopping you from making your own search for Pixiv though. Even with ORB, if you're fine with less accurate crop matching, like only handling crops that keep 50% of the original image, then you can get the storage pretty low. Even more so if you run 7z on the database afterwards.
You don't even have to store the images permanently. You can just download, add to the database, delete.
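That loop could look roughly like this with SQLite. It's only a sketch under my own assumptions: the file is already downloaded, `hash_fn` is whatever PHash function you use (taking a file path and returning an int), and the table layout and function names are made up for the example.

```python
import os
import sqlite3

def open_index(path=":memory:"):
    """Open (or create) the hash database."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS hashes "
               "(phash BLOB NOT NULL, source TEXT NOT NULL)")
    return db

def ingest(db, file_path, source, hash_fn, hash_bytes=8):
    """Hash a downloaded file, record it, then delete the file."""
    h = hash_fn(file_path)
    db.execute("INSERT INTO hashes VALUES (?, ?)",
               (h.to_bytes(hash_bytes, "big"), source))
    db.commit()
    os.remove(file_path)  # only the hash and the source reference are kept

def search(db, query_hash, max_distance=10):
    """Linear scan: return (distance, source) pairs within max_distance bits."""
    hits = []
    for blob, source in db.execute("SELECT phash, source FROM hashes"):
        distance = bin(int.from_bytes(blob, "big") ^ query_hash).count("1")
        if distance <= max_distance:
            hits.append((distance, source))
    return sorted(hits)
```

The linear scan is fine for a personal collection. At billions of hashes you'd want a proper index for Hamming distance lookups instead.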
I should probably just do something myself, since I already have 10TB of pixiv, all of E621 and 3TB of Sankaku lmao.