Let's say I want to start an image hosting service

Should I go for SSD or HDD option?

It doesn't matter
None of this matters

Use the cloud

explain

hdd if you need the space; popular images will be cached in ram, so they'll still be served quickly
if the ssd is big enough, though, it'd be better

The problem is that I don't know how many people will use my service, and I'm not sure how to predict it, so I have no idea how much space I really need.

I'm considering buying a 20 GB SSD vs. a 50 GB HDD, but I'm not sure the first option is enough.

I can only estimate that the average user will send ~60 KB files (it's a very specific image hosting service) and might do so around 5-10 times.
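Plugging those numbers into a quick back-of-the-envelope calculation gives a rough idea of how many users each option could hold. This is a hypothetical Python sketch that assumes decimal units and ignores space taken by the OS, filesystem overhead, thumbnails, and logs, so treat the results as upper bounds.

```python
# Back-of-the-envelope capacity estimate using the figures from the post.
# Assumptions: decimal units (1 GB = 10**9 bytes, 1 KB = 10**3 bytes), and
# no space reserved for the OS, filesystem overhead, thumbnails, or logs.
GB = 10**9
KB = 10**3

AVG_FILE = 60 * KB          # ~60 KB per image
UPLOADS_PER_USER = (5, 10)  # each user uploads roughly 5-10 images


def users_that_fit(disk_bytes: int, uploads: int, avg_file: int = AVG_FILE) -> int:
    """How many users fit on a disk if each one uploads `uploads` files."""
    return disk_bytes // (uploads * avg_file)


if __name__ == "__main__":
    for name, size in (("20 GB SSD", 20 * GB), ("50 GB HDD", 50 * GB)):
        worst = users_that_fit(size, max(UPLOADS_PER_USER))
        best = users_that_fit(size, min(UPLOADS_PER_USER))
        print(f"{name}: roughly {worst:,} to {best:,} users")
```

Even at 10 uploads per user, the 20 GB option holds on the order of 33,000 users and the 50 GB option about 83,000, before any real-world overhead is subtracted.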

I've always wondered whether image hosting sites hash their images to avoid hosting duplicate files and save space.

It might add a little overhead, but the disk space saved would probably be worth it.

i'd probably go for the hdd and consider an ssd later if it turns out to be too slow

commonly accessed images will be cached in memory by your OS, so they'll be served without fetching from disk

unless you have a lot of visitors at any one time, it's better to have more space than to worry about IO latency

you can always change it later on if need be

Is OP going to host porn?

Use the SSD as a cheap "ramdisk" (storing the most popular images) and the HDD as the main backbone.

This solves the issue of space and speed.
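At the application level, that idea could look roughly like the Python sketch below. It's hypothetical: the class name, the flat directory layout, and least-recently-used eviction (a cheap stand-in for "most popular") are all assumptions.

```python
from collections import OrderedDict
from pathlib import Path


class TieredImageStore:
    """Serve images from a small SSD cache in front of a large HDD.

    Every image lives permanently on the HDD. The SSD holds copies of the
    most recently requested images, up to `ssd_capacity` bytes, and the
    least recently used copy is evicted first when space runs out.
    """

    def __init__(self, hdd_dir: Path, ssd_dir: Path, ssd_capacity: int):
        self.hdd_dir = Path(hdd_dir)
        self.ssd_dir = Path(ssd_dir)
        self.ssd_capacity = ssd_capacity
        self._lru = OrderedDict()  # name -> size in bytes, oldest first
        self._used = 0

    def get(self, name: str) -> bytes:
        if name in self._lru:
            # Cache hit: mark as most recently used and read from the SSD.
            self._lru.move_to_end(name)
            return (self.ssd_dir / name).read_bytes()
        # Cache miss: read from the HDD, then promote a copy to the SSD.
        data = (self.hdd_dir / name).read_bytes()
        if len(data) <= self.ssd_capacity:
            self._make_room(len(data))
            (self.ssd_dir / name).write_bytes(data)
            self._lru[name] = len(data)
            self._used += len(data)
        return data

    def is_cached(self, name: str) -> bool:
        return name in self._lru

    def _make_room(self, needed: int) -> None:
        # Evict least recently used copies until `needed` more bytes fit.
        while self._used + needed > self.ssd_capacity:
            old_name, old_size = self._lru.popitem(last=False)
            (self.ssd_dir / old_name).unlink()
            self._used -= old_size
```

The index lives in memory, so it starts empty after a restart; a real version would need to persist it or rebuild it from the SSD directory.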

even Sup Forums does this

there are many ways to do it, like:
- purely frontend, you have a list of images and hashes in a database, and only store unique files, pointing identical images to the same file
- filesystem-level: some filesystems, like btrfs and zfs, support deduplicating entire volumes, separate from userspace (if you have two identical files on disk, they only use the space of one copy)
- something as simple as a shell script that periodically hashes files and replaces copies with symlinks
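The third option might look something like this. It's written in Python rather than shell to keep the examples in one language, and it's a hypothetical sketch: SHA-256 as the hash and relative symlinks are assumptions, and it doesn't handle files that change while it runs.

```python
import hashlib
import os
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large files don't have to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def dedupe_with_symlinks(root: Path) -> int:
    """Replace duplicate files under `root` with symlinks to the first copy.

    Returns how many files were replaced. Existing symlinks are skipped, so
    running it periodically (e.g. from cron) is safe.
    """
    first_seen = {}  # content hash -> first path seen with that content
    replaced = 0
    for path in sorted(Path(root).rglob("*")):  # sorted() snapshots the walk
        if path.is_symlink() or not path.is_file():
            continue
        original = first_seen.setdefault(sha256_of(path), path)
        if original == path:
            continue
        # Create the link under a temporary name, then atomically rename it
        # over the duplicate so the file never briefly disappears.
        tmp = path.with_name(path.name + ".dedupe-tmp")
        tmp.symlink_to(os.path.relpath(original, path.parent))
        os.replace(tmp, path)
        replaced += 1
    return replaced
```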

i imagine op is getting this choice from a VPS vendor, so he might not have the option to pick both

though yes, if you're physically making a server, using an ssd as a cache to a hdd/raid backend is a smart way to make the most of both technologies

OP here, you are correct. It's VPS SSD vs VPS HDD

The way I would do it is to have a temp upload directory and a program that hashes the files in that directory and deletes duplicates from it. If the hash is already known, I would have the CGI program seamlessly redirect the user's browser to the existing image URL.

That would be much cleaner than fooling with symlinks.
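A rough Python sketch of that temp-directory approach follows. The function names, the `/i/<hash><ext>` URL scheme, and the plain dict standing in for a database are all hypothetical; the returned status code is what a CGI script would send back (303 See Other is the conventional redirect after a POST).

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 64 KiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def handle_upload(tmp_path: Path, store_dir: Path, index: dict[str, str]) -> tuple[int, str]:
    """Process one file sitting in the temp upload directory.

    Returns (http_status, image_url). `index` maps content hash -> public
    URL; a real service would keep it in a database, not a dict.
    """
    digest = file_digest(tmp_path)
    if digest in index:
        # Duplicate: discard the new copy and redirect to the existing image.
        tmp_path.unlink()
        return 303, index[digest]
    # New image: move it into permanent storage, named by its hash.
    final = Path(store_dir) / (digest + tmp_path.suffix.lower())
    shutil.move(tmp_path, final)
    url = f"/i/{final.name}"
    index[digest] = url
    return 201, url
```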

Or, even better, have JS hash the file on the client side; then the server only has to check the hash. If it already exists, no upload is needed.
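The server side of that hash-first flow could be as small as this hypothetical Python sketch. In the browser, the Web Crypto API's `crypto.subtle.digest("SHA-256", ...)` can compute the hash; the class and method names and the in-memory storage are made up for illustration. When an upload does happen, the server recomputes the hash itself instead of trusting the client's value.

```python
import hashlib
from typing import Optional


class HashFirstUploads:
    """Hypothetical server for a "send the hash before the file" flow."""

    def __init__(self) -> None:
        self._blobs = {}  # content hash -> bytes; stand-in for real storage

    def check(self, digest: str) -> Optional[str]:
        """Step 1: the client sends only the hash. Returns the existing URL,
        or None if the client still needs to upload the file."""
        return f"/i/{digest}" if digest in self._blobs else None

    def upload(self, claimed_digest: str, data: bytes) -> str:
        """Step 2: the client uploads the bytes; the server re-hashes them."""
        actual = hashlib.sha256(data).hexdigest()
        if actual != claimed_digest:
            raise ValueError("uploaded bytes do not match the claimed hash")
        self._blobs.setdefault(actual, data)
        return f"/i/{actual}"
```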

>Or even better have js hash the file on the client side then the server only has to verify the hash. If it exists no upload is needed.
some services do this as well; it's an advantage for both parties bandwidth-wise, but it uses more CPU/memory in the client's browser
this would also imply you're using a database mapping 'uploads' to stored files, unless you outright refuse identical files (which might not be a good idea: there are several cases where one might want to upload the same file with different metadata, such as upload date, filename, ID, or membership in a particular collection, if that's to be a feature)
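One way to model that is two tables: one row per unique stored file, keyed by content hash, and one row per upload carrying the per-upload metadata. The sketch below uses Python's built-in `sqlite3`; the table and column names are assumptions for illustration, not the schema from the paste linked in this thread.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS files (
    hash TEXT PRIMARY KEY,  -- content hash: one row per unique file on disk
    size INTEGER NOT NULL
);
CREATE TABLE IF NOT EXISTS uploads (
    id          INTEGER PRIMARY KEY,  -- per-upload ID shown to users
    file_hash   TEXT NOT NULL REFERENCES files(hash),
    filename    TEXT NOT NULL,        -- per-upload metadata
    uploaded_at TEXT NOT NULL
);
"""


def add_upload(conn: sqlite3.Connection, data: bytes, filename: str) -> int:
    """Record an upload; the `files` row is only created once per content."""
    digest = hashlib.sha256(data).hexdigest()
    with conn:  # run both inserts in one transaction
        # Ignored if this content is already known; a real service would
        # write the bytes to disk only when this actually inserts a row.
        conn.execute(
            "INSERT OR IGNORE INTO files (hash, size) VALUES (?, ?)",
            (digest, len(data)),
        )
        cur = conn.execute(
            "INSERT INTO uploads (file_hash, filename, uploaded_at) VALUES (?, ?, ?)",
            (digest, filename, datetime.now(timezone.utc).isoformat()),
        )
    return cur.lastrowid
```

Two uploads of the same bytes get two upload IDs but share one `files` row, so the image is stored once while each upload keeps its own filename and date.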

to give you an example of what i'm thinking with a database:
hastebin.com/awakekubup.avrasm
(Sup Forums spam filter)

(oh, you should probably use something better than md5 if your server has a good cpu)
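Swapping the algorithm is usually a one-line change. MD5 is vulnerable to deliberately crafted collisions, so in a store keyed by hash, someone could make two different images share an MD5 and have one served in place of the other. This hypothetical helper hashes an open file in chunks with any algorithm `hashlib` supports, defaulting to SHA-256.

```python
import hashlib
from typing import BinaryIO


def content_hash(f: BinaryIO, algorithm: str = "sha256", chunk_size: int = 1 << 16) -> str:
    """Hash an open binary file in chunks with any algorithm hashlib knows."""
    h = hashlib.new(algorithm)  # e.g. "sha256", "blake2b", or legacy "md5"
    for chunk in iter(lambda: f.read(chunk_size), b""):
        h.update(chunk)
    return h.hexdigest()
```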

Wouldn't altering the metadata change the hash?

This, use HDD. It's better to have a slightly slower service that works than have your blazing fast service spit out "no space" errors.

depends on which metadata you're talking about
metadata relating to the filesystem or your web service won't affect the file contents (and therefore, the file hash)
only metadata stored inside the file format itself will, such as GIF comments, JPEG EXIF, MKV titles, etc.
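A small Python experiment illustrates the distinction. It's a hypothetical sketch: the "image" is a toy byte string laid out like a JPEG with a comment (COM) segment, not a decodable picture, but it's enough to show that changing a timestamp or file name leaves the hash alone while editing an embedded comment does not.

```python
import hashlib
import os
from pathlib import Path


def toy_jpeg(comment: bytes) -> bytes:
    # SOI marker, a COM (comment) segment, then EOI. The COM length field
    # is big-endian and counts its own two bytes.
    length = (len(comment) + 2).to_bytes(2, "big")
    return b"\xff\xd8" + b"\xff\xfe" + length + comment + b"\xff\xd9"


def sha256_file(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def hash_changes(tmp_dir: Path) -> dict:
    """Report whether each kind of metadata change alters the file hash."""
    path = tmp_dir / "photo.jpg"
    path.write_bytes(toy_jpeg(b"original comment"))
    before = sha256_file(path)

    # Filesystem metadata: timestamps and the file name live outside the bytes.
    os.utime(path, (0, 0))  # set access/modification time to the Unix epoch
    path = path.rename(tmp_dir / "renamed.jpg")
    fs_changed = sha256_file(path) != before

    # Format metadata: the comment is part of the file's bytes.
    path.write_bytes(toy_jpeg(b"edited comment"))
    content_changed = sha256_file(path) != before

    return {"filesystem metadata": fs_changed, "embedded comment": content_changed}
```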