Big-League Archiving

I have an idea that could win us a small but meaningful gain in the war on the MSM.

Lately it's popular to archive Web pages before sharing the link, using sites like archive.is so the original site doesn't get the views. Even if you use adblock, your visit still puts eyeballs on their site and counts toward their traffic numbers. So denying them that by reading these sites through archived links is worthwhile.

What if this could be done automatically?

Let's make a browser add-on that automatically redirects to archived versions of pages.
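A minimal sketch of the add-on's core, as a WebExtension background script (assumes a Manifest V2 manifest with the webRequest, webRequestBlocking, and <all_urls> permissions; the domain list and the archive.is /newest/ endpoint are assumptions to double-check, not settled choices):

// background.js -- hedged sketch, not a finished add-on
const TARGETS = ["cnn.com", "buzzfeed.com"]; // illustrative starter list

chrome.webRequest.onBeforeRequest.addListener(
  function (details) {
    const host = new URL(details.url).hostname;
    const hit = TARGETS.some(function (d) {
      return host === d || host.endsWith("." + d);
    });
    if (hit) {
      // archive.is serves its newest snapshot of a page at /newest/<url>
      return { redirectUrl: "https://archive.is/newest/" + details.url };
    }
  },
  { urls: ["<all_urls>"], types: ["main_frame"] },
  ["blocking"]
);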

If we can popularize this add-on, it may only make a small dent in their uniques, but imagine if each org had to cut a writer or two because of that loss. That would be HUGE.

My proposal is to develop this idea further and submit a formal proposal to Sup Forums. Or, as this is a fairly simple project, we may be able to do it ourselves.

Also soliciting good names for the add-on.

Resources:

archive.is/

github.com/harvard-lil/perma

mementoweb.org/depot/native/archiveis/ (possibly useful API)
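If the Memento depot pans out, the add-on could also check for an existing snapshot before forcing a fresh one. A hedged sketch against the public Time Travel lookup (the endpoint shape and the mementos.closest.uri response field are assumptions pulled from the Memento docs; verify before building on them):

// Ask the Memento Time Travel API for the snapshot nearest a given year.
async function findSnapshot(url) {
  const api = "https://timetravel.mementoweb.org/api/json/2017/" + url;
  const res = await fetch(api);
  if (!res.ok) return null; // a 404 means no known memento for this URL
  const data = await res.json();
  return (data.mementos && data.mementos.closest)
    ? data.mementos.closest.uri[0]
    : null;
}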

bump

We can use JavaScript to make a userscript; let's make a list of cancerous news sites first.
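As a userscript, the whole thing could be as small as this sketch (Tampermonkey/Greasemonkey header; the @match lines stand in for whatever site list we settle on, and the /newest/ endpoint is an assumption):

// ==UserScript==
// @name         MSM Archive Redirect (sketch)
// @match        *://*.cnn.com/*
// @match        *://*.salon.com/*
// @run-at       document-start
// ==/UserScript==
(function () {
  "use strict";
  // Bounce to the newest archive.is snapshot before the page finishes loading.
  location.replace("https://archive.is/newest/" + location.href);
})();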

Already been trying to get you faggots to work on this with me, I have a working codebase but every time I post it I get called a shill / dataminer / kike

Click-Off or some shit, seems like a pretty neat idea desu user

again, this is already under development and it is called Scrapist

scrapi.st

kike

thank you user. I am sorry you were called a shill by people who are too dumb to understand

I like this, you b-baka

Sure thing.

Really? Never seen. Can you post codebase? Also a basic rundown of how it works?

Perfect idea for spying on and keeping track of users. How is this not a popular idea?

Scrapist: because it scrapes websites, strips advertising, collapses clickbait-style multi-page slideshow shit into one single page, and allows anonymous commenting within the program. You can set up subscriptions to websites and get new articles loaded in from other people (p2p) so it doesn't even give them a single fucking hit. It can also feed them random metadata when loading the information fresh, so they think what hits they do get are 90-year-old disabled lesbians or whatever random, advertising-metric-destroying info you choose.

It's basically adblock + napster + 1995 usenet client aesthetics

and if you don't remember usenet, kill yourself for being too new to the internet

Post your disposable emails if you want invites to the beta

it shouldn't require more than 10 lines of code, post a pastebin link.

>Really? Never seen. Can you post codebase? Also a basic rundown of how it works?

I was posting the codebase back when I was trying to get people to help. Now that I've had to do the whole thing myself because everyone thought I was a "shill" or a "dataminer", I'm not gonna give out my proprietary network shit (hint: look up the WASTE protocol by Nullsoft; it's old but ahead of its time, and you can build off it) for someone to pull a Notch or Zucc and get e-famous by stealing my work

Don't worry, the thing you guys want is very close to release anyway

>it shouldn't require more than 10 lines of code, post a pastebin link.

HAHAHAHAHA

italians can't into dev

I don't like the name because we need it to go mainstream with the facebook/Wal-mart crowd, not just pollacks.

Genius

> file_get_contents the page
> add the url to a db along with where your copy is
> if the url they're trying to visit is in your db, redirect to the archived version

Simple. Simple. Simple.
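That greentext translates almost line for line. A hedged Node sketch of the same three steps (fetch stands in for file_get_contents, and the JSON-file "db" is purely illustrative):

const fs = require("fs");
const crypto = require("crypto");

const DB = "archive-index.json";
const db = fs.existsSync(DB) ? JSON.parse(fs.readFileSync(DB, "utf8")) : {};

// step 1: grab the page contents; step 2: record url -> local copy
async function saveCopy(url) {
  const html = await (await fetch(url)).text(); // global fetch, Node 18+
  const file = crypto.createHash("sha1").update(url).digest("hex") + ".html";
  fs.writeFileSync(file, html);
  db[url] = file;
  fs.writeFileSync(DB, JSON.stringify(db));
}

// step 3: if we already hold a copy, hand back its location for the redirect
function archivedVersion(url) {
  return db[url] || null;
}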

look, proxycunt, speaking as someone who regularly goes outside and has a real life, none of the "facebook / walmart crowd" gives a flying fuck about names like that, but they also don't care to use anything more than a basic adblock.

the only people who will use this are people who hate clickbait, marketing websites, ad bullshit, and are willing to use a special browser just to help starve the cancer out.

post something you programmed, to justify that unearned sense of technological superiority you've got going on against other people

Cool stuff but
Needs better name before release

I use a modified version of the GG one.
It's pretty.

Are you serious? All that needs to be done is a script that looks for a domain name and adds archive.fo/ before the URL, e.g.: archive.fo/http://money.cnn.com/2016/10/30/media/facebook-fake-news-plague/index.html
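Rather than redirecting your own navigation, that same trick works as a content script that rewrites outbound links on whatever page you're reading, so the archive.fo prefix is applied before anyone clicks (the BLOCKLIST is illustrative):

// Rewrite links to blocklisted domains so they point at archive.fo instead.
const BLOCKLIST = ["cnn.com", "salon.com"];
for (const a of document.querySelectorAll("a[href^='http']")) {
  const host = new URL(a.href).hostname;
  if (BLOCKLIST.some((d) => host === d || host.endsWith("." + d))) {
    a.href = "https://archive.fo/" + a.href;
  }
}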

How about "The Mainstream's Meteor"?

Wrong, I've seen liberals telling each other to stop sharing direct links to Breitbart and such

Looking forward to forking your work user :^)

"TMM" It's actually perfect

You guys obsess over finding the perfect name for shit way too much. Kinda like how you spend days roleplaying over what flag your Kek nation will have and the minutiae of how the board of directors will function, because that's all you guys know HOW to do, bullshit around.

For something that's just an archive.is browser? Well that's kinda basic and not what I'm talking about at all.

Also, what are you going to do when they just blank their page and force an update through archive.is? Or block the archive.is web robot?


I get that you kids are young but the basic concept of what you want has been done forever.

Did you have internet before 2013?

ghacks.net/2012/02/05/google-cache-browser-lets-you-browse-a-website-in-googles-cache/

Oh wait, 2007

Were any of you even on the internet then? I didn't think so. Fucking newfags.

googlesystem.blogspot.com/2007/01/browsing-web-using-google-cache.html

Of course, google in their shady bullshit have blocked all of this from working now.

Dream on, proxylet

>the only people who will use this are people who hate clickbait, marketing websites, ad bullshit, and are willing to use a special browser just to help starve the cancer out.

>Wrong, I've seen liberals telling each other to stop sharing direct links to Breitbart and such

>Breitbart

thanks for confirming my point

Stop being such a massive kike and let the nip OP have his moment.

Also, blocking archive.is is as simple as blocking four IP addresses.

What do you do when the websites you want to starve can't be loaded into archive.is?

That's why relying on someone else's work is stupid. My method will use distributed p2p loading to acquire content, they can never block my method and they'll also never get either a) a hit with relevant metadata for their advertising pimps or b) a complete full pageload for a full hit

it's a cunt with a proxy

now if he had anything intelligent to offer, that would be allowable, but what we have here is a cunt with a proxy and nothing intelligent to offer

go fluff an emu before it roots your girlfriend

I'm going back to work on actually producing something that will be valuable, achieve goals, and produce massive amounts of leftcuck butthurt.

Anyone who wants beta invites, post your disposable or secure emails

Try phantomJs

JavaScript-emulated browser. I use it to scrape and crawl sites that require a """real""" browser and not just fake headers. Plus it executes javascript, which plain scraping won't.
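For anyone who hasn't touched it, a minimal PhantomJS scrape looks something like this sketch (run as phantomjs dump.js <url>; it prints the rendered DOM, not the raw response):

// dump.js -- load a JS-heavy page in a headless browser and print its HTML
var page = require("webpage").create();
var system = require("system");

page.open(system.args[1], function (status) {
  if (status === "success") {
    console.log(page.content); // the post-JavaScript DOM
  }
  phantom.exit();
});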

This is a great idea, especially since faggots like Milo, Coulter, and Shapiro probably drive 1/3 of the traffic to shit sites like Salon, Buzzfeed, CNN to "point out how absurd they are". I always ask myself... isn't this helping them?

You wasted months coding bloat and seem really obsessed with legitimising it

The OP's idea needs 10 lines of code.

Stop being such a massive faggit and make your own thread.

I actually used that for two iterations then found a more stable solution to the scraping arm of the system

The OP's idea will stop working as soon as anyone blocks the archive.is IPs and then it'll gradually work less and less until he has a frustrated userbase asking him why he can't make archive.is buy more IPs

this again

Scrapist uber alles

bye useless fucks with delusions of technological superiority over "normies"

fucking lulz

It can be stable; the problem with it is the resources and time. What did you move to now?

I have spent years on scraping, archiving, and setting up these types of things, but for financial-industry clients monitoring information about themselves across the web.

AWS. Don't assign a static IP. Stop and start the instance every now and then (a plain reboot keeps the same public address) and you get a new IP. And most regions have tons of different blocks. Works like a charm.

>The OP's idea will stop working as soon as anyone blocks the archive.is

Hurr Durr. Why post nonsense for normies, in a thread populated by coders?

Thanks for confirming you're a shill, simply trying to delegitimise a perfectly thought-out plan by OP.

What ad-click company do you work for user?

Bump.

Do you realize that Archive.is doesn't work off your own IP you stupid fucking shit

how does it feel to be technologically inept but suffer from delusions of competency?

ARCHIVE.IS DOESN'T WORK OFF YOUR FUCKING HOME INTERNET CONNECTION

ARCHIVE.IS SCRAPES WEBSITES FROM ONE OF FOUR IPS OWNED BY THE GUY WHO RUNS ARCHIVE.IS

HOW FUCKING STUPID ARE YOU

you're trolling, that's the only explanation

Calm down sweetie.

*Sips tea*

An archive system doesn't need a static IP; you run the site from a different server. You really have no clue how to run a real archiving network, do you? I have 30 servers dedicated to archiving. They only do archiving. And it's been like that for years.

Then the other public functions run on a separate public network.

Fucking FBI shill hijacking chinkanons thread fuck off you stupid faggot half there replies in this thread are you. Turn off your computer and go outside

Yes he's shilling.

Would it be possible to decentralize this concept? I think it could be possible, but maybe too slow, by using an implementation of Web2Web github.com/elendirx/web2web or similar

now now dearie, no need to get booty bothered schnookums

also archive.is passes your IP to the server in the "X-Forwarded-For" header

That's fine if you're talking about setting up your own archiving system, but you were referring to using his concept of piggybacking off of archive.is. Maybe you did so by accident, go back and re-read what you said and what you were replying to.

Have you even built your own computer? Kill yourself non-STEMer, grownups are talking

post a disposable email and I'll send an invite.

I backed away from being able to parse javascript because it could introduce vulns and I'm focusing on stripping away information to create a basic, pre-2000 era readable webpage, not on parsing garbage that no one needs.

i already made an extension that does this. im too lazy/dumb to update it to be better though.

chrome.google.com/webstore/detail/merchant-starver/khkahmomgggciefjdllekdfeielfejjm

>someone doesn't pamper my bottom about my ideas
>they are a shill

Holy fuck. I thought that the Dunning-Kruger effect was a meme spouted by redditards, but here you are, living proof that the dumber someone is, the smarter they think they are.

>also archive.is passes your IP to the server in the "X-Forwarded-For" header

What do you think that means? Do you think that means that the website that doesn't want to be archived by archive.is can't just block the four IPs used by archive.is?

Yeah, I wouldn't piggyback because it would be hell to organize and make it work better for you.

[email protected]

Heading out. Email me, anyone that wants help or testing with this type of thing.

>Would it be possible to decentralize this concept?

Already doing it and it'll be better than you can do.

obviously not too hard. you could have a list of domains that automatically get archived.

This guy and me were literally the only people with any skills or ability in this thread, and now I'm going back to work too

have fun fags

is it past babies bedtime, honeybunch?

my remark wasn't meant to disagree with you, but to add additional info about how it works

there there, want mummy to put some talcum on?

>>the only people who will use this are people who hate clickbait, marketing websites, ad bullshit, and are willing to use a special browser just to help starve the cancer out.
So, everyone.

>my remark wasn't meant to disagree with you, but to add additional info about how it works
>there there, want mummy to put some talcum on?


Low effort trolling, 0/10

You clearly were trying to obfuscate the issue by implying that your IP is not merely shown to the website being scraped, but used for the connection

I don't know why I can't stop feeding this troll tho, it's like those ducks in Vietnam where you keep tossing them bread and then eventually one of the alligators gets them

archive.is is fishy though.

Thank you for the encouraging words. Post your disposable email to get a Scrapist invite

And relying on other people's infrastructure is a dumpster fire of a plan

>tfw made low-quality chrome extension of OP's post and get zero (You)'s

fuck this place

not at all, homo

I was trying to suggest that any archiving system should take note that if it runs through archive.is, your actual IP will be sent to the target website when doing an archive, so a better system would not do it that way

You should have told people about it as a method to DIG DIG DIG on pizzagate and moloch websites and it would have hit the bump limit over and over

(DIG DIG DIG is a new way of saying look at the tiny shred of information that you can find by using search engines like CIA owned google who index a little bit of the internet and going through the publicly available results to find "CLUES")

I missed that

add a pic in future

Listen, you double nigger: You jumped in on this train of dumbfuck random neural collisions masquerading as 'thought':

The Australian literally does not understand how Archive.is works. He thinks it goes through his home dreamtime petrol pipe or something.

You entered that line of discussion and defended his moronic position. Period.

You're probably posting from a fucking tablet your parents got you for your 15th birthday last month.

Don't forget it sounds like rapist. That is the most important part.

Good work, unlike the OP you actually can make things

>double nigger
>double!!?

zounds good sir, I are naught but 1 and 3/5ths nigger

you are a cove! and I would warrant a blackguard also, pish and tosh to you

>You will be able to use Scrapist to read advertising free, publicly user commentable, comfy reading template adjusted 'articles' by feminist bloggers who insist that they are literally being Raped by Scrapist

2017 confirmed comfiest year

youtube.com/watch?v=O1rT6Vi5Ln4

youtube.com/watch?v=vifYelSTlMo

>chrome
is there a version for FF?

Decentralize a database and update it via p2p, checking against a ledger? Of course it's possible. That's what cryptocoin is.

shiftnrg.org

You have no taste for the classics.


youtube.com/watch?v=9ss_2mdMsiU

guys are thinking about this too hard.
we do need an archive site with a public API.

regex identifies any FQDN
send a request to the archive API
replace text on page with an archive link.
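A sketch of those three steps in plain JavaScript (the FQDN regex is deliberately simplified, and building a /newest/ link stands in for whatever real archive API we end up with):

const URL_RE = /https?:\/\/[a-z0-9.-]+\.[a-z]{2,}(\/\S*)?/gi;

function archiveLink(url) {
  // assumption: archive.is resolves /newest/<url> to its latest snapshot
  return "https://archive.is/newest/" + url;
}

function rewrite(text) {
  return text.replace(URL_RE, function (match) {
    return archiveLink(match);
  });
}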

someone make a git repo.

I would use this. It stops sites from auto-refreshing and loads quicker.

youtube.com/watch?v=nsNJjGGR4P8

[email protected]

send your virus, CIA nigger

Stop datamining for your kike brothers you shill.

witnessed dobs

you just do some version of a distributed hash table.

duplicates will occur relatively infrequently, and can be reconciled relatively easily.

the difference between this and cryptocoin is that if we intentionally do p2p, there is no equivalent of a "blockchain server"; there has to be an expectation of clients sometimes acting asynchronously. "nearline communication"

it is not 100% necessary to eliminate duplicates; keeping them would only create a minor increase in load on the archive site compared to deduplicating.
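A toy illustration of that hash-table idea, with Kademlia-style XOR closeness (the BigInt peer IDs are purely illustrative):

const crypto = require("crypto");

// Each URL hashes to a 160-bit key in the same space as the peer IDs.
function keyFor(url) {
  return BigInt("0x" + crypto.createHash("sha1").update(url).digest("hex"));
}

// The peer whose ID is XOR-closest to the key stores that archive.
function responsiblePeer(url, peerIds) {
  const k = keyFor(url);
  return peerIds.reduce(function (best, id) {
    return (id ^ k) < (best ^ k) ? id : best;
  });
}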

youtube.com/watch?v=MOhx0AI46Q0

gpl it
put it on git

Fucking shill. Gtfo.

waste.sourceforge.net/

en.wikipedia.org/wiki/WASTE

Fun fact: I've been experimenting with this protocol since 2003, there's a reason they tried to quash it so fast. Thank god they failed.

WASTE is the grandfather of all p2p crypto today. Bitcoin protocol is a modification to WASTE. And it's been out there, in the wild, free to play with, all along. They did a good job keeping it from being widely known but they were never able to get rid of it.

If you don't own a TPW P4-15NB then enjoy having all of your crypto hardware backdoored by the deep state.

>1 post by this ID

>kikeblocker
>kikeclicker
>fakenewsblocker
>shekelblocker
>shekeldodger
>FUmsm
>clickde-baiter
>baitclicker

This is a good thread

Such is the cost of trying to organize something on a site that is in perpetual chaos.

Sorry you went through frustrations regarding it; you people who can program have my respect. Achieving our cause will be greater than any hindrance we face along the way.

>putting it on git so everyone can look at my bad code

Okay. Not gonna put this on my github account though; last thing i want to be known for is creating an (((anti-semitic))) extension

github.com/merchantstarver/merchant-starver

happy viewing

>fake news blocker
thats actually what i was going to name mine kek
friend thought of "merchant starver" and i thought it was funnier

Finding a name palatable to a large chunk of the population - in particular the moderates and swing voters - is incredibly important. Using a name that literally includes the word "rapist", or something that can easily be smeared as an add-on for white supremacist nazis (hitl.er or something), will defeat the entire purpose of starving these sites of unique views, because the app becomes so toxic that no one wants to touch it or recommend it to others.

There are numerous examples of companies failing because they chose a poor name; don't let this be one of them.

>muh pr

You guys were worthless during gamergate and you're worthless now

Looks nice, though it doesn't handle sub-domains, or does it? For links like: money.cnn.com/2016/10/30/media/facebook-fake-news-plague/index.html

>WASTE
this is probably the right idea. however, it might be overkill. you don't need to physically share much data, really just the data structure and some hashes.

can we establish some baseline requirements?
- some form of PKI crypto
- some NoSQL data store (something that can manage and search a hash table) with clustering
- archive site(s) with a public API (or simple CRUD/REST interface)
- something like uBlock to regex-match any FQDN and replace it in the page with the archive link
- peer orchestration framework
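For the first bullet, Node's built-in crypto already covers a minimal version. A sketch (Ed25519 picked arbitrarily; key distribution and storage are hand-waved):

const crypto = require("crypto");

const { publicKey, privateKey } = crypto.generateKeyPairSync("ed25519");

// Sign an archive record so peers can verify who published it.
function signRecord(record) {
  return crypto.sign(null, Buffer.from(JSON.stringify(record)), privateKey);
}

function verifyRecord(record, signature, pub) {
  return crypto.verify(null, Buffer.from(JSON.stringify(record)), pub, signature);
}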

>github.com/merchantstarver/merchant-starver

is this really all a chrome addon is?
I've looked into firefox addons and my eyes crossed.
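Pretty much, yes. A Chrome add-on is a manifest plus a script or two, and Firefox accepts nearly the same WebExtension format these days, so one codebase should cover both. A minimal hedged manifest to pair with the background-script sketch near the top of the thread:

{
  "manifest_version": 2,
  "name": "archive-redirector (sketch)",
  "version": "0.1",
  "permissions": ["webRequest", "webRequestBlocking", "<all_urls>"],
  "background": { "scripts": ["background.js"] }
}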