As an imageboard user and data hoarder, I save all the content I find interesting, amusing, or arousing. You know this very well since you are probably the same way.
I have several thousand images sitting on my hard drive, most of which I saved by hand. I spent time searching for this content, so it is pretty special to me. I have several thousand more images, but those were scraped, so I have no trouble organizing them. I have postponed sorting the hand-saved ones for a while, and doing it by hand is painful and time-consuming. At the moment, I have about 5000 images left to organize into around 200 thematic folders. I thought about deploying simple machine-learning solutions using Python and TensorFlow, but they are unlikely to match the granularity I go for, since vastly different images can belong in the same folder (think of memes, for example). Also, labeling a training set by hand would be pretty much the very task I am trying to accomplish right now.
Do you have a similar collection? Do you have a setup to manage it?
You can also add downloaded tag database dump files (like from a *booru) in almost the same place, but the public tag repository (PTR) is the big one.
Carson Davis
hydrus network is so freaking gay
I keep a complex file structure of anime images sorted by character, pairing, show, etc., then Elo-ranked using tournaments, manual comparisons and some fancy algorithms, and then I rip tags from boorus using a program I wrote and embed the tags as metadata
coming soon: ranking individual tags using Microsoft TrueSkill, and a GUI to put it all together and btfo hydrus once and for all
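The poster doesn't share their "fancy algorithms", but the core of Elo ranking from pairwise "which image is better" comparisons is small. Below is a minimal sketch of the standard Elo update, assuming a K-factor of 32 and hypothetical file names; TrueSkill is a different, Bayesian model and is not shown here.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0) -> tuple[float, float]:
    """Return the new (rating_a, rating_b) after one pairwise comparison."""
    e_a = expected_score(rating_a, rating_b)
    s_a = 1.0 if a_won else 0.0
    delta = k * (s_a - e_a)
    # Zero-sum: whatever A gains, B loses.
    return rating_a + delta, rating_b - delta


# Hypothetical images, both starting at the conventional 1500.
ratings = {"image_a.png": 1500.0, "image_b.png": 1500.0}
ratings["image_a.png"], ratings["image_b.png"] = elo_update(
    ratings["image_a.png"], ratings["image_b.png"], a_won=True
)
print(ratings)  # {'image_a.png': 1516.0, 'image_b.png': 1484.0}
```

A tournament is then just many calls to `elo_update`; beating a higher-rated image moves your rating more than beating a lower-rated one.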
btw hydrus copies all of your images into its own folder, freaking LAME
Jacob Parker
You have autism
Blake Ward
last time I used hydrus, it lagged horribly with large image collections. was this ever fixed?
Bentley Carter
>btw hydrus copies all of your images into its own folder, freaking LAME
Wait, it does? That's not gonna work out when I have around 40 GB of images saved.
Kevin Morales
anyone who considers using hydrus network is reddit and also I'm not autistic
Alexander Rogers
Hydrus is shit, yes, but you have autism.
Yes, the creator somehow thought it a bad idea to just symlink and create a db of hashes based on that. Find a download for Google Picasa instead.
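The alternative this poster describes (leave files where they are and keep a database of hashes) can be sketched with Python's built-in `sqlite3`. This is a minimal sketch, not how Hydrus works; the table layout and function names are hypothetical, and symlinks are left out entirely.

```python
import hashlib
import sqlite3
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files don't have to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def index_folder(db: sqlite3.Connection, root: Path) -> None:
    """Record (hash, path) for every file under root without moving or copying anything."""
    db.execute("CREATE TABLE IF NOT EXISTS files (sha256 TEXT NOT NULL, path TEXT PRIMARY KEY)")
    db.execute("CREATE INDEX IF NOT EXISTS files_sha256 ON files (sha256)")
    for p in root.rglob("*"):
        if p.is_file():
            db.execute(
                "INSERT OR REPLACE INTO files (sha256, path) VALUES (?, ?)",
                (sha256_of(p), str(p)),
            )
    db.commit()


def paths_for_hash(db: sqlite3.Connection, digest: str) -> list[str]:
    """The extra hash -> file path lookup that a hash-named file store doesn't need."""
    rows = db.execute("SELECT path FROM files WHERE sha256 = ? ORDER BY path", (digest,))
    return [row[0] for row in rows]
```

The trade-off raised further down the thread shows up directly here: every lookup goes through `paths_for_hash`, and the index goes stale whenever files are moved or renamed behind its back.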
Kevin Hall
It doesn't really lag for me, but of course querying the DB and loading thumbnails gets slower if you have several million files and search for a bunch of tags.
Aiden Lopez
It can move them into its own folder instead of copying them. That doesn't require any extra storage space.
But yes, it stores and names files by their hash. It doesn't keep an additional mapping between the hash and some other file location.
Blake Jackson
Side question ... is there a self-hosted booru that clones the tags from the public ones?
Christopher Campbell
> Yes, the creator somehow thought it a bad idea to just symlink and create a db of hashes based on that.
It is a problem: symlinks and NTFS junctions, like other such features, don't work on every filesystem and under every security policy still in use. Symlinks are generally used sparingly, and mostly by people who do actual sysadmin work.
if the dev wasn't a freaking retard he would have put the metadata inside the images and kept a db of hashes
Charles Nguyen
It's freaking retarded to alter all the images; it will change their hashes.
And not every supported format even has a metadata field to put it in.
And yes, this IS using a db of hashes, it's just not also doing a mapping from hashes to file paths that needs an extra lookup per file.
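Why altering files breaks hash-based lookups can be shown in a few lines: a cryptographic hash such as SHA-256 covers every byte, so appending even a short "metadata" string gives an unrelated digest. The byte strings below are made up and are not a real image.

```python
import hashlib

original = b"\xff\xd8\xff\xe0 pretend image bytes \xff\xd9"
# Appending a few bytes of "metadata" produces a completely different digest.
tagged = original + b"tags: cat, meme"

print(hashlib.sha256(original).hexdigest() == hashlib.sha256(tagged).hexdigest())  # False
```

Anything keyed on the old digest (a PTR entry, a booru's MD5 search, duplicate detection on another site) no longer recognizes the tagged file.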
Aaron Wright
Neat. Gonna have a booru with only porn that I like.
Can it auto-download artist tags too?
Nolan Flores
save the new hash as metadata in the image, genius, bwahaha
Cameron Ramirez
>Use tool once
>Changes hash of all your images
>Now locked into that vendor
Liam Hill
>new hash
>PTR becomes pointless since the hashes are different
Jose Peterson
If you let Hydrus download files from a *booru, it'll basically grab all tags if you tell it to. [There is sometimes slightly more fine-grained control, but generally you'll grab all tags.]
The PTR and database dumps from other boorus can then add their tags on top.
Elijah Ramirez
a large metadata section that just iterates looking for hash collisions with the original image hash
Isaiah Miller
Yeah, and everyone else will hate your users when they upload files straight from their filesystem, since easy duplicate elimination no longer works.
Basically, even if you want to hack the feature into your fork of hydrus, please add another DB to have that slower hash->file paths indirection. Don't fuck with the images if you don't have to.
David Myers
ohhh noo the ceo of whatever non-encrypted / autistic indie image host will be mad because I used .0000001% more resources
Adam James
Not entirely sure of the file structure for all image types, but isn't it possible to generate a hash using only data from the non-metadata section of the file? That way you could rename the file, change metadata, tags, etc., and the hash would stay the same as long as the image itself stays the same.
Matthew Reyes
the hash will be calculated over the entire file by whatever other generic service sees the file; I think it's technically possible to generate a deterministic hash without the metadata, though, because normal JPEG metadata is all at the end of the file by itself
Evan White
Right, so really... that's how it should be done. I'm sure such a system would have to detect the way the drive is formatted, and the file type, to properly hash using only "non-metadata".
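One correction to keep in mind: in JPEG, EXIF and XMP metadata are not at the end of the file. They live in APPn segments (markers 0xFFE0 to 0xFFEF) near the start, and comments live in COM (0xFFFE) segments, so a "content hash" has to walk the segment structure. Below is a minimal sketch, assuming that every APPn and COM segment counts as "metadata" (a simplification: APP2 ICC profiles and APP14 can affect how the image renders) and falling back to a whole-file hash for anything that doesn't start with the JPEG magic bytes. `content_hash` is a hypothetical name, and other services will still hash the whole file.

```python
import hashlib
import struct

JPEG_MAGIC = b"\xff\xd8\xff"


def content_hash(data: bytes) -> str:
    """SHA-256 of a JPEG with its APPn and COM segments left out; plain SHA-256 otherwise."""
    if not data.startswith(JPEG_MAGIC):
        # Not a JPEG: fall back to hashing the whole file.
        return hashlib.sha256(data).hexdigest()

    h = hashlib.sha256()
    h.update(data[:2])  # SOI marker
    pos = 2
    while pos < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        # Markers may be preceded by any number of 0xFF fill bytes.
        while pos + 1 < len(data) and data[pos + 1] == 0xFF:
            pos += 1
        if pos + 1 >= len(data):
            raise ValueError("truncated JPEG")
        marker = data[pos + 1]
        if marker == 0xD9:  # EOI before any scan
            h.update(data[pos:pos + 2])
            break
        if marker == 0xDA:  # SOS: hash the scan and everything after it verbatim
            h.update(data[pos:])
            break
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])  # counts itself, not the marker
        segment_end = pos + 2 + length
        is_metadata = 0xE0 <= marker <= 0xEF or marker == 0xFE  # APP0-APP15, COM
        if not is_metadata:
            h.update(data[pos:segment_end])
        pos = segment_end
    return h.hexdigest()
```

Editing EXIF tags or adding a comment leaves `content_hash` unchanged, while touching the quantization tables or the compressed scan data changes it.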
Hunter Phillips
If they can't strip the metadata easily, they'll just disable the file uploads, and either way, you'd be a faggot like the sites who watermark everything.
Even then, you can't add your metadata to all file types that Hydrus supports. But feel free to fork if you really need to try your approach, it's an open sauce project after all.
Jordan Brown
Enjoy doing it for all 16+ container formats apart from JPEG.
And you're generally just asking to wreck performance anyhow.
Julian Martin
where do these fantasies about 'disabling uploading of unique files' come from lol
I'm not forking garbage Python GUI garbage shit; I already have my own perfect system built
lol, calculate the hash before modifying the file and save that hash as metadata, nerd. All these problems are already solved: there are already C++ libraries that can write metadata to just about any image format that supports it; otherwise, just convert the image
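The "hash first, then store the hash as metadata" idea can be sketched for PNG with only the standard library, using a tEXt chunk (a real PNG chunk type for Latin-1 key/value text). The keyword `OriginalSHA256` and the function names are hypothetical. Note that the file's own hash still changes, which is exactly the objection raised in the replies.

```python
import hashlib
import struct
import zlib
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
KEYWORD = b"OriginalSHA256"  # hypothetical keyword, not a registered one


def make_chunk(chunk_type: bytes, data: bytes) -> bytes:
    """Serialize one PNG chunk: length, type, data, then CRC-32 over type + data."""
    crc = zlib.crc32(chunk_type + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + chunk_type + data + struct.pack(">I", crc)


def embed_original_hash(png: bytes) -> bytes:
    """Insert a tEXt chunk right after IHDR holding the SHA-256 of the *unmodified* file."""
    if not png.startswith(PNG_SIGNATURE) or png[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    digest = hashlib.sha256(png).hexdigest().encode("ascii")
    (ihdr_len,) = struct.unpack(">I", png[8:12])
    ihdr_end = 8 + 4 + 4 + ihdr_len + 4  # signature + length + type + data + CRC
    text = make_chunk(b"tEXt", KEYWORD + b"\x00" + digest)
    return png[:ihdr_end] + text + png[ihdr_end:]


def read_original_hash(png: bytes) -> Optional[str]:
    """Walk the chunk list and return the stored hash, or None if there isn't one."""
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png):
        length, chunk_type = struct.unpack(">I4s", png[pos:pos + 8])
        if chunk_type == b"tEXt":
            key, _, value = png[pos + 8:pos + 8 + length].partition(b"\x00")
            if key == KEYWORD:
                return value.decode("latin-1")
        pos += 12 + length  # length + type + data + CRC
    return None
```

This sketch doesn't guard against running twice; a second call would store the hash of the already-tagged file, so a real tool would check `read_original_hash` first.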
every booru API supports searching by hash directly anyway, you don't even have to do anything funny
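Boorus typically key posts on the MD5 of the original file (the 32-hex-digit name in the Gelbooru URL quoted later in the thread is one). A minimal sketch of building such a lookup URL follows, assuming Danbooru's `posts.json` endpoint and its `md5:` search metatag; check the site's API docs before relying on it. No request is sent here, and any modification to the file means the MD5 no longer matches.

```python
import hashlib
from urllib.parse import urlencode

# Assumed endpoint: Danbooru's JSON post search, queried with its md5: metatag.
DANBOORU_POSTS = "https://danbooru.donmai.us/posts.json"


def md5_search_url(file_bytes: bytes) -> str:
    """Build a search URL for the post whose original file has this MD5."""
    digest = hashlib.md5(file_bytes).hexdigest()
    return DANBOORU_POSTS + "?" + urlencode({"tags": f"md5:{digest}"})


print(md5_search_url(b"hello"))
# https://danbooru.donmai.us/posts.json?tags=md5%3A5d41402abc4b2a76b9719d911017c592
```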
Andrew Price
the best way to sort this kind of content is chronologically, like how your memories are organised. Separate them into folders ordered by month. This way there should be a manageable number of folders, each with a manageable number of images, and if you have a decent memory you should be able to recall roughly what's in each folder
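A minimal sketch of the month-folder scheme, assuming a file's modification time is a good enough stand-in for when you saved it (some copy tools reset it). The `YYYY-MM` folder naming and the function name are hypothetical choices.

```python
import shutil
from datetime import datetime
from pathlib import Path


def sort_by_month(inbox: Path, library: Path) -> None:
    """Move every file in `inbox` into `library/YYYY-MM/` based on its modification time."""
    for p in sorted(inbox.iterdir()):
        if not p.is_file():
            continue
        saved = datetime.fromtimestamp(p.stat().st_mtime)
        target_dir = library / saved.strftime("%Y-%m")
        target_dir.mkdir(parents=True, exist_ok=True)
        target = target_dir / p.name
        n = 1
        while target.exists():
            # Never overwrite a different file that happens to share the name.
            target = target_dir / f"{p.stem}_{n}{p.suffix}"
            n += 1
        shutil.move(str(p), str(target))
```

Zero-padded `YYYY-MM` names also sort chronologically in any file manager.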
Robert Sullivan
worst post itt
Evan Foster
t. someone with a shit memory.
Kayden Green
> i already have my own perfect system built
Uh ... good job I guess?
I obviously don't see why I should believe that it's anywhere near perfect. Your approach seems to make everything slower by requiring a lot of filesystem accesses, and you seem to be interested in JPEG only.
> every booru api supports searching for hash directly anyway, you dont even have to do anything funny
No shit, because *boorus and various CDNs and big-data systems tend to store and retrieve files by hash, e.g. "/68/c4/sample_68c416bf307b595173121aad55d829fd.jpg" on Gelbooru. Exactly because it's the superior solution.
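The quoted Gelbooru path can be derived purely from the MD5: the first two hex digits and the next two become directory levels (256 × 256 = 65,536 directories), so each directory stays small and finding a file is a computation rather than a database query. A minimal sketch; the function name is hypothetical and the `sample_` prefix is simply copied from the quoted URL.

```python
def booru_cdn_path(md5_hex: str, ext: str, prefix: str = "") -> str:
    """Two directory levels taken from the first four hex digits of the MD5."""
    md5_hex = md5_hex.lower()
    if len(md5_hex) != 32 or any(c not in "0123456789abcdef" for c in md5_hex):
        raise ValueError("expected a 32-character hex MD5 digest")
    return f"/{md5_hex[:2]}/{md5_hex[2:4]}/{prefix}{md5_hex}.{ext}"


print(booru_cdn_path("68c416bf307b595173121aad55d829fd", "jpg", prefix="sample_"))
# /68/c4/sample_68c416bf307b595173121aad55d829fd.jpg
```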
> otherwise just convert the image
Converting one lossy image format into another is usually such a great idea.
Never mind all the fun you can have converting .swf, .pdf and more.
Isaac Barnes
You don't even need the folders, your file manager can sort files chronologically for you.
But really, this doesn't work very well unless you're not looking at that many files, or have a really fucking good memory.
Andrew Thompson
lol why are my anime images going to .pdfs lmao
Thomas Ortiz
So does Hydrus in effect know how to parse Sup Forums images based on the file number alone, with no additional tags? I ask because I have over a hundred and twenty thousand images saved on a single 4 TB SATA drive, completely disorganized apart from being listed in the sequential order they were saved from
Ryder Reyes
> So does hydrous in effect know how to parse through Sup Forums images simply based off the file number and no additional tags?
It will generate checksums from the files when you import the files.
Then it will match them with tag databases you enabled (from *boorus, the PTR, whatever) and you should have quite a lot of images that now have tags.
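The matching step described above boils down to "hash the bytes, look the hash up in every enabled tag source, union the results". A minimal sketch, assuming each tag source has already been loaded into a plain dict from SHA-256 hex digest to a set of tags (real PTR and booru dumps have their own formats, and the names here are hypothetical). The file name, such as a Sup Forums timestamp, plays no part; only the bytes do.

```python
import hashlib
from pathlib import Path

# Hypothetical in-memory tag source: SHA-256 hex digest -> set of tags.
TagDump = dict[str, set[str]]


def tags_for_files(paths: list[Path], *dumps: TagDump) -> dict[Path, set[str]]:
    """Hash each file once, then union whatever tags every enabled dump has for that hash."""
    result: dict[Path, set[str]] = {}
    for p in paths:
        digest = hashlib.sha256(p.read_bytes()).hexdigest()
        tags: set[str] = set()
        for dump in dumps:
            tags |= dump.get(digest, set())
        result[p] = tags
    return result
```

A file that was re-encoded or edited after it was tagged elsewhere gets an empty set, because its hash no longer matches.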
It'll also generate a second set of "checksums" (perceptual hashes) to enable finding duplicate files for various supported file types, but they'll not be used in the file names.
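To illustrate what a perceptual hash does differently from a checksum, here is the simple "average hash": shrink the image to 8×8 grayscale, then record one bit per pixel for "brighter than the mean". This is not necessarily the algorithm Hydrus uses. The sketch assumes the 8×8 grid is already computed (for example with Pillow's `convert("L")` and `resize((8, 8))`, not shown), and the function names are hypothetical.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """64-bit average hash of an 8x8 grayscale grid (values 0-255)."""
    flat = [v for row in pixels for v in row]
    if len(flat) != 64:
        raise ValueError("expected an 8x8 grid")
    mean = sum(flat) / 64
    bits = 0
    for v in flat:
        # One bit per pixel: brighter than the average or not.
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; small distances suggest visually similar images."""
    return bin(a ^ b).count("1")


gradient = [[x * 32 for x in range(8)] for _ in range(8)]
brighter = [[min(255, v + 10) for v in row] for row in gradient]  # same picture, shifted levels
print(hamming(average_hash(gradient), average_hash(brighter)))  # 0
```

The two grids have different bytes (so different SHA-256 checksums) but identical average hashes, which is why duplicate finding uses a distance threshold on perceptual hashes rather than exact checksum matches.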
> over a hundred twenty thousand images
That seems like considerably fewer than I'd typically expect on a 4 TB drive. Are they all high resolution or something?
Connor Torres
That's really just my memes and interesting-images folder from Sup Forums; the SATA drive just happens to be 4 TB in size.
So the images have to be p***? Is that really the catch for making it work? What about all the images I have that are strictly not p*** and obviously wouldn't show up on, you know, most archives with extensive tags? Also, I'm using a transcriber; I have no f****** idea why the phone keeps censoring my language when I curse, but I don't think it will censor specific terms like gook.
Jacob Cook
lol
Anthony Jones
> That's just really my memes and interesting images folder from Sup Forums the SATA drive just so happens to be 4 terabytes in size.
Odd. Most of these should be around 200 KB to 1 MB each, so I'd expect something more like 120 GB than 4 TB.
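The estimate above, worked out with the per-image sizes from the post (decimal units):

```latex
120\,000 \times 200\,\mathrm{kB} = 24\,\mathrm{GB}, \qquad 120\,000 \times 1\,\mathrm{MB} = 120\,\mathrm{GB}
```

So the collection should occupy somewhere between 24 GB and 120 GB, a small fraction of the 4 TB drive.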
I'm not actually sure what that p*** censored word is. Pasta? Porn? Penises? But anyhow, it's pretty funny what exactly your transcriber censors.
I also can't really tell you if there is particularly *good* tag coverage for your images. But of course you can add your own tags to files. [Whether it's on a booru or on the PTR, most tags were ultimately added manually by someone.]
Ian Gonzalez
It is limited by your hard drive, of course, but you can try running DB maintenance or (preferably) moving hydrus to your SSD (you do have an SSD, right, user?)
Ian Perry
I'll tell you what I do, OP. Every few months I perform what I call a "clean slate", where I move all pictures from my phone, tablet, and laptop onto dedicated flash drives. The drives then go into a box along with the graveyard of drives from previous clean slates.
You would think you'd need to dig through these often, but I find they are forgotten quite quickly, as your fresh devices begin the cycle again and fill up with new media just as rapidly
Grayson Hernandez
I use Save Image/Link in Folder and save images into set categories. The images go into an unsorted folder within their category so that I can place them into their specific folder later.