Why is no one man enough to archive Sup Forums?


some researchers at MIT did, but storing the images exposed them to basically all of the legal dangers a researcher that studies online communities could be exposed to.

i forgot how they handled it, but someone later (or maybe them, i'm not sure) just stored the text. even that might be dicey though.

>Comments are owned by the Poster.
You could sue them

Nothing of value was lost

There must be at least one non-public archive of b, and it belongs to the FBI, I think.

Don't forget the sheeky forum

Downloading comments from Sup Forums for the purposes of research falls under "fair use" by *at least* two separate lines of reasoning, to say nothing of the fact that proving a copyright claim to a comment on Sup Forums would be *nearly* impossible.

Why would you want to archive Sup Forums?

Out of all the boards Sup Forums makes the least sense to archive.

The most recent one that archived it, fgts.jp, just got shut down by its provider for failing to delete CP in time. Of course the owner also archived other boards, but Sup Forums is the most prominent when it comes to CP posting.

You'd need a team of moderators big enough and active 24/7 whose job would be to check every single post at a rate of like 20 posts every second.
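To put the 20-posts-per-second figure in perspective (it's the poster's rough estimate, not a measured rate), the daily review load for that team would be:

```latex
20\ \frac{\text{posts}}{\text{s}} \times 86\,400\ \frac{\text{s}}{\text{day}} = 1\,728\,000\ \frac{\text{posts}}{\text{day}}
```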

There's nothing to stop anyone from archiving anything. Just don't make it publicly accessible, or privately accessible to anyone else for that matter.

Why not just autodelete everything deleted by mods and let them do the heavy work?

There have been a couple of archives doing exactly that, but they dropped it after just a few weeks.
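A minimal sketch of the "delete whatever the mods delete" idea, assuming the archiver periodically re-fetches each live thread and gets back the set of post IDs still present. `sync_deletions` and its arguments are hypothetical, not any existing archiver's API.

```python
def sync_deletions(archived: dict[int, str], live_ids: set[int], thread_alive: bool) -> list[int]:
    """Remove archived posts that have vanished from a thread that is still live.

    archived:     post_id -> post text, our local copy (mutated in place)
    live_ids:     post IDs present in the latest fetch of the live thread
    thread_alive: False once the thread has 404'd
    Returns the sorted IDs that were removed.
    """
    if not thread_alive:
        # After a 404 we can no longer tell deletions apart from normal pruning,
        # so nothing is removed on the basis of a dead thread.
        return []
    # Collect first, then delete, so we never mutate the dict while iterating it.
    removed = [pid for pid in archived if pid not in live_ids]
    for pid in removed:
        del archived[pid]
    return sorted(removed)
```

One catch with this design: a post missing from a live thread could have been removed by a mod or by the poster, so this mirrors both kinds of deletion.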

> archive Sup Forums
Is this a joke? The board has been nothing but porn dumps for over 5 years.
It is dead and buried and will never yield anything interesting again, courtesy of normies. And even for them I don't get the appeal of the board; is it to feel like pseudo-outcasts or something? I would even delete it if it weren't such a symbolic board. All the people there would probably quit Sup Forums forever, never to come back; it's not even a containment board.

not the guy you asked, but i know the researchers who were at MIT. the gist is that the mods aren't perfect and sometimes stuff just disappears for myriad reasons (the most benign being that the poster deleted their post, or the thread fell off the edge of page 15).

I assume it's just 12 year old boys who aren't smart enough to find /gif/.

There isn't anything *worth* archiving on Sup Forums.
If there ever is, it can be archived on a thread-by-thread basis.

and how do you determine what's worth archiving as the threads are passing by?

pretty much every sane approach to archiving threads depends on keeping track of all of them and then doing *something* to determine what to discard. waiting for cues to tell you to download a thread (being conservative rather than greedy) is the worst way to go for mining.
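A minimal sketch of the greedy "track everything, discard later" approach. The keep rule here (a minimum reply count) is only a stand-in assumption for the unspecified *something*; `Tracker`, `observe`, and `expire` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Tracker:
    """Greedy archiver: record every thread, decide what to keep only after it dies."""
    min_replies: int = 50  # hypothetical keep threshold, stand-in for a real scoring rule
    live: dict[int, list[str]] = field(default_factory=dict)
    kept: dict[int, list[str]] = field(default_factory=dict)

    def observe(self, thread_id: int, posts: list[str]) -> None:
        # Store the latest snapshot of every thread; nothing is filtered here.
        self.live[thread_id] = list(posts)

    def expire(self, thread_id: int) -> bool:
        # Called when the thread 404s; only now is the keep/discard decision made.
        posts = self.live.pop(thread_id, [])
        if len(posts) >= self.min_replies:
            self.kept[thread_id] = posts
            return True
        return False
```

The point of the structure is that the decision happens once the whole thread is known, instead of guessing early which threads will turn out to be worth downloading.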

Mining? Pardon me but TOP KEK!

Give us some insight into what your colleagues extracted from that Library of Alexandria that is Sup Forums. That is, if you know.

I'd agree with you for every board apart from Sup Forums.

Scraping Sup Forums in its entirety for the occasional good thread is like collecting all of the sewer waste to filter for money people accidentally flush down the toilet.
It's generally not worth it.

I liked the chanarchive approach to thread archival, particularly with regards to Sup Forums.
Users could suggest threads for archival, and each thread was then voted on its merits for a week or so after it 404'd.
At any point in time, the thread could be collected as a zip file for personal archives (say if the thread had value to you but not to the archive in general).
You ended up with an archive made mostly of user-suggested threads, and because you could be backtraced if you suggested CP for archival, CP was a lot less likely to end up archived.
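A minimal sketch of a chanarchive-style suggest-then-vote flow, for anyone thinking of rebuilding it. The 7-day window, the one-vote-per-user rule, and the "keep if net score is positive" rule are assumptions, not chanarchive's actual logic; `Suggestion` is a hypothetical name.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Optional

VOTING_WINDOW = timedelta(days=7)  # "a week or so"; the exact length is an assumption


@dataclass
class Suggestion:
    thread_id: int
    suggested_by: str   # recorded so suggesters stay accountable
    died_at: datetime   # when the thread 404'd
    votes: dict[str, int] = field(default_factory=dict)  # voter -> +1 or -1

    def vote(self, voter: str, value: int, now: datetime) -> None:
        if value not in (1, -1):
            raise ValueError("vote must be +1 or -1")
        if now > self.died_at + VOTING_WINDOW:
            raise ValueError("voting window closed")
        self.votes[voter] = value  # one vote per user; voting again overwrites

    def decide(self, now: datetime) -> Optional[bool]:
        """None while voting is open; afterwards, keep iff the net score is positive."""
        if now <= self.died_at + VOTING_WINDOW:
            return None
        return sum(self.votes.values()) > 0
```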

If someone (semi-competent; the original had database issues) wanted to make a new version of chanarchive, I'd be all for it.

You could build up a dictionary of shitposting terms, and then analyse each comment for a probability of being a shitpost. If a thread reaches a certain ratio of shitposts/normposts, it gets canned.
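A minimal sketch of the dictionary-plus-ratio idea. The term list and both thresholds are made up for illustration, and the per-comment score is just the fraction of words found in the dictionary, a crude stand-in for a real probability.

```python
import re

# Hypothetical starter dictionary; a real one would need curating.
SHITPOST_TERMS = {"kek", "bump", "dubs", "check", "fpbp", "sage"}


def shitpost_score(comment: str, terms: set[str] = SHITPOST_TERMS) -> float:
    """Fraction of words in the comment that appear in the term dictionary (0.0 to 1.0)."""
    words = re.findall(r"[a-z']+", comment.lower())
    if not words:
        return 1.0  # empty or image-only posts counted as noise (an assumption)
    return sum(w in terms for w in words) / len(words)


def should_can(comments: list[str], post_threshold: float = 0.3, max_ratio: float = 1.0) -> bool:
    """Can the thread once the shitposts/normposts ratio exceeds max_ratio."""
    flags = [shitpost_score(c) >= post_threshold for c in comments]
    shit = sum(flags)
    norm = len(flags) - shit
    if norm == 0:
        return shit > 0  # all shitposts (or an empty thread, which is kept)
    return shit / norm > max_ratio
```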

>chanarchive
Those were the days.

You have multiple problems there.
First you need to define what shitposting really is with quantifiable data.
You would have a lot of false positives/negatives with a ratio and dictionary method.
Even with pattern recognition, shitposting styles evolve so fast it wouldn't work.

Well, as shitposting evolves, you'd simply need to pick up common keywords and phrases in shitposts that aren't currently flagged, and tag them as potential emerging shitposting terms.
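One way to sketch "pick up terms that aren't flagged yet", assuming you already have a pile of posts the current dictionary flagged and a pile it didn't: count which unknown words show up disproportionately in the flagged pile. The `min_count` and `min_lift` thresholds are arbitrary assumptions, and candidates are returned for review rather than added automatically.

```python
import re
from collections import Counter


def _tokens(text: str) -> set[str]:
    # Unique lowercase words per post, so one spammy post can't dominate the counts.
    return set(re.findall(r"[a-z']+", text.lower()))


def emerging_terms(flagged: list[str], normal: list[str], known: set[str],
                   min_count: int = 3, min_lift: float = 2.0) -> list[str]:
    """Suggest words common in flagged shitposts but rare in normal posts."""
    bad = Counter(w for post in flagged for w in _tokens(post))
    good = Counter(w for post in normal for w in _tokens(post))
    n_bad, n_good = max(len(flagged), 1), max(len(normal), 1)
    candidates = []
    for word, count in bad.items():
        if word in known or count < min_count:
            continue
        # Add-one smoothing so words absent from normal posts don't divide by zero.
        lift = (count / n_bad) / ((good[word] + 1) / (n_good + 1))
        if lift >= min_lift:
            candidates.append(word)
    return sorted(candidates)
```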

As for what constitutes shitposting, on Sup Forums I don't think there really are false positives, just false negatives.

There is, it's called sheecky forums or something like that. Guy runs a clickbot that pulls posts from various boards, grabs twitter handles and lets it run. He fakes the traffic and makes money. Pretty good idea really.

> As for what constitutes shitposting, on Sup Forums I don't think there really are false positives, just false negatives.
Haha yes, thank you for the laugh.

Literally give me a week.
Working shit out with a cute loli voice actor, then it's time to activate it.