Big-League Archiving

I have an idea that could win us a small but meaningful gain in the war on the MSM.

Lately it's popular to archive Web pages before sharing the link, using sites like archive.is so the original site doesn't get the views. Even if you use adblock, your visit still puts eyeballs on their site and counts toward their traffic numbers. So denying them that by reading these sites through archived links is worthwhile.

What if this could be done automatically?

Let's make a browser add-on that automatically redirects to archived versions of pages.
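A minimal sketch of the add-on's core, as a WebExtension background script (assumes a Manifest V2 manifest with the webRequest, webRequestBlocking, and <all_urls> permissions; the domain list and the archive.is /newest/ endpoint are assumptions to double-check, not settled choices):

// background.js -- hedged sketch, not a finished add-on
const TARGETS = ["cnn.com", "buzzfeed.com"]; // illustrative starter list

chrome.webRequest.onBeforeRequest.addListener(
  function (details) {
    const host = new URL(details.url).hostname;
    const hit = TARGETS.some(function (d) {
      return host === d || host.endsWith("." + d);
    });
    if (hit) {
      // archive.is serves its newest snapshot of a page at /newest/<url>
      return { redirectUrl: "https://archive.is/newest/" + details.url };
    }
  },
  { urls: ["<all_urls>"], types: ["main_frame"] },
  ["blocking"]
);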

If we can popularize this add-on, it may only make a small dent in their uniques, but imagine if each org had to cut a writer or two because of that loss. That would be HUGE.

My proposal is to develop this idea further and submit a formal proposal to Sup Forums. Or, as this is a fairly simple project, we may be able to do it ourselves.

Also soliciting good names for the add-on.

Resources:

archive.is/

github.com/harvard-lil/perma

mementoweb.org/depot/native/archiveis/ (possibly useful API)
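If the Memento depot pans out, the add-on could also check for an existing snapshot before forcing a fresh one. A hedged sketch against the public Time Travel lookup (the endpoint shape and the mementos.closest.uri response field are assumptions pulled from the Memento docs; verify before building on them):

// Ask the Memento Time Travel API for the snapshot nearest a given year.
async function findSnapshot(url) {
  const api = "https://timetravel.mementoweb.org/api/json/2017/" + url;
  const res = await fetch(api);
  if (!res.ok) return null; // a 404 means no known memento for this URL
  const data = await res.json();
  return (data.mementos && data.mementos.closest)
    ? data.mementos.closest.uri[0]
    : null;
}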

bump

We can use JavaScript to make a userscript; let's make a list of cancerous news sites first.
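As a userscript, the whole thing could be as small as this sketch (Tampermonkey/Greasemonkey header; the @match lines stand in for whatever site list we settle on, and the /newest/ endpoint is an assumption):

// ==UserScript==
// @name         MSM Archive Redirect (sketch)
// @match        *://*.cnn.com/*
// @match        *://*.salon.com/*
// @run-at       document-start
// ==/UserScript==
(function () {
  "use strict";
  // Bounce to the newest archive.is snapshot before the page finishes loading.
  location.replace("https://archive.is/newest/" + location.href);
})();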

Already been trying to get you faggots to work on this with me, I have a working codebase but every time I post it I get called a shill / dataminer / kike

Click-Off or some shit, seems like a pretty neat idea desu user

again, this is already under development and it is called Scrapist

scrapi.st

kike

thank you user. I am sorry you were called a shill by people who are too dumb to understand

I like this, you b-baka

Sure thing.

Really? Never seen. Can you post codebase? Also a basic rundown of how it works?

Perfect idea for spying on and keeping track of users. How is this not a popular idea?

Scrapist: because it scrapes websites, strips advertising, collapses clickbait-style multi-page slideshow shit into one single page, and allows anonymous commenting within the program. You can set up subscriptions to websites and get new articles loaded in from other people (p2p) so it doesn't even give them a single fucking hit. It can also feed them random metadata when loading the information fresh, so they think what hits they do get are 90-year-old disabled lesbians or whatever random, advertising-metric-destroying info you choose.

It's basically adblock + napster + 1995 usenet client aesthetics

and if you don't remember usenet, kill yourself for being too new to the internet

Post your disposable emails if you want invites to the beta

it shouldn't require more than 10 lines of code, post a pastebin link.

>Really? Never seen. Can you post codebase? Also a basic rundown of how it works?

I was posting the codebase back when I was trying to get people to help. Now that I've had to do the whole thing myself because everyone thought I was a "shill" or a "dataminer", I'm not gonna give out my proprietary network shit (hint: look up the WASTE protocol by Nullsoft; it's old but ahead of its time, and you can build off it) for someone to pull a Notch or Zucc and get e-famous by stealing my work

Don't worry, the thing you guys want is very close to release anyway

>it shouldn't require more than 10 lines of code, post a pastebin link.

HAHAHAHAHA

italians can't into dev

I don't like the name because we need it to go mainstream with the facebook/Wal-mart crowd, not just pollacks.

Genius

> file_get_contents the page
> add the url to a db along with where your copy is
> if the url they're trying to visit is in your db, redirect to the archived version

Simple. Simple. Simple.
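That greentext translates almost line for line. A hedged Node sketch of the same three steps (fetch stands in for file_get_contents, and the JSON-file "db" is purely illustrative):

const fs = require("fs");
const crypto = require("crypto");

const DB = "archive-index.json";
const db = fs.existsSync(DB) ? JSON.parse(fs.readFileSync(DB, "utf8")) : {};

// step 1: grab the page contents; step 2: record url -> local copy
async function saveCopy(url) {
  const html = await (await fetch(url)).text(); // global fetch, Node 18+
  const file = crypto.createHash("sha1").update(url).digest("hex") + ".html";
  fs.writeFileSync(file, html);
  db[url] = file;
  fs.writeFileSync(DB, JSON.stringify(db));
}

// step 3: if we already hold a copy, hand back its location for the redirect
function archivedVersion(url) {
  return db[url] || null;
}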

look, proxycunt, speaking as someone who regularly goes outside and has a real life, none of the "facebook / walmart crowd" gives a flying fuck about names like that, but they also don't care to use anything more than a basic adblock.

the only people who will use this are people who hate clickbait, marketing websites, ad bullshit, and are willing to use a special browser just to help starve the cancer out.

post something you programmed, to justify that unearned sense of technological superiority you've got going on against other people

Cool stuff but
Needs better name before release

I use a modified version of the GG one.
It's pretty.

Are you serious? All that needs to be done is a script that looks for a domain name and adds archive.fo/ before the URL, e.g.: archive.fo/http://money.cnn.com/2016/10/30/media/facebook-fake-news-plague/index.html
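Rather than redirecting your own navigation, that same trick works as a content script that rewrites outbound links on whatever page you're reading, so the archive.fo prefix is applied before anyone clicks (the BLOCKLIST is illustrative):

// Rewrite links to blocklisted domains so they point at archive.fo instead.
const BLOCKLIST = ["cnn.com", "salon.com"];
for (const a of document.querySelectorAll("a[href^='http']")) {
  const host = new URL(a.href).hostname;
  if (BLOCKLIST.some((d) => host === d || host.endsWith("." + d))) {
    a.href = "https://archive.fo/" + a.href;
  }
}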

How about "The Mainstream's Meteor"?

Wrong, I've seen liberals telling each other to stop sharing direct links to Breitbart and such

Looking forward to forking your work user :^)

"TMM" It's actually perfect

You guys obsess over finding the perfect name for shit way too much. Kinda like how you spend days roleplaying over what flag your Kek nation will have and the minutiae of how the board of directors will function, because that's all you guys know HOW to do, bullshit around.

For something that's just an archive.is browser? Well that's kinda basic and not what I'm talking about at all.

Also, what are you going to do when they just blank their page and force an update through archive.is? Or block the archive.is web robot?


I get that you kids are young but the basic concept of what you want has been done forever.

Did you have internet before 2013?

ghacks.net/2012/02/05/google-cache-browser-lets-you-browse-a-website-in-googles-cache/

Oh wait, 2007

Were any of you even on the internet then? I didn't think so. Fucking newfags.

googlesystem.blogspot.com/2007/01/browsing-web-using-google-cache.html

Of course, google in their shady bullshit have blocked all of this from working now.

Dream on, proxylet

>the only people who will use this are people who hate clickbait, marketing websites, ad bullshit, and are willing to use a special browser just to help starve the cancer out.

>Wrong, I've seen liberals telling each other to stop sharing direct links to Breitbart and such

>Breitbart

thanks for confirming my point

Stop being such a massive kike and let the nip OP have his moment.

Also, blocking archive.is is as simple as blocking four IP addresses.

What do you do when the websites you want to starve can't be loaded into archive.is?

That's why relying on someone else's work is stupid. My method will use distributed p2p loading to acquire content, they can never block my method and they'll also never get either a) a hit with relevant metadata for their advertising pimps or b) a complete full pageload for a full hit

it's a cunt with a proxy

now if he had anything intelligent to offer, that would be allowable, but what we have here is a cunt with a proxy and nothing intelligent to offer

go fluff an emu before it roots your girlfriend

I'm going back to work on actually producing something that will be valuable, achieve goals, and produce massive amounts of leftcuck butthurt.

Anyone who wants beta invites, post your disposable or secure emails

Try phantomJs

JavaScript-emulated browser. I use it to scrape and crawl sites that require a """real""" browser and not just fake headers. Plus it executes javascript, which plain scraping won't.
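For anyone who hasn't touched it, a minimal PhantomJS scrape looks something like this sketch (run as phantomjs dump.js <url>; it prints the rendered DOM, not the raw response):

// dump.js -- load a JS-heavy page in a headless browser and print its HTML
var page = require("webpage").create();
var system = require("system");

page.open(system.args[1], function (status) {
  if (status === "success") {
    console.log(page.content); // the post-JavaScript DOM
  }
  phantom.exit();
});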

This is a great idea, especially since faggots like Milo, Coulter, and Shapiro probably drive 1/3 of the traffic to shit sites like Salon, Buzzfeed, CNN to "point out how absurd they are". I always ask myself... isn't this helping them?

You wasted months coding bloat and seem really obsessed with legitimising it

The OP's idea needs 10 lines of code.

Stop being such a massive faggit and make your own thread.

I actually used that for two iterations then found a more stable solution to the scraping arm of the system

The OP's idea will stop working as soon as anyone blocks the archive.is IPs and then it'll gradually work less and less until he has a frustrated userbase asking him why he can't make archive.is buy more IPs

this again

Scrapist uber alles

bye useless fucks with delusions of technological superiority over "normies"

fucking lulz

It can be stable; the problem with it is the resources and time. What did you move to now?

I have spent years on scraping, archiving, and setting up these types of things, but for financial-industry clients monitoring information about themselves across the web.

AWS. Don't assign a static IP. Stop and start the instance every now and then (a plain reboot keeps the same public address) and you get a new IP. And most regions have tons of different blocks. Works like a charm.

>The OP's idea will stop working as soon as anyone blocks the archive.is

Hurr Durr. Why post nonsense for normies, in a thread populated by coders?

Thanks for confirming you're a shill, simply trying to delegitimise a perfectly thought-out plan by OP.

What ad-click company do you work for user?

Bump.

Do you realize that Archive.is doesn't work off your own IP you stupid fucking shit

how does it feel to be technologically inept but suffer from delusions of competency?

ARCHIVE.IS DOESN'T WORK OFF YOUR FUCKING HOME INTERNET CONNECTION

ARCHIVE.IS SCRAPES WEBSITES FROM ONE OF FOUR IPS OWNED BY THE GUY WHO RUNS ARCHIVE.IS

HOW FUCKING STUPID ARE YOU

you're trolling, that's the only explanation

Calm down sweetie.

*Sips tea*

An archive system doesn't need a static IP; you run the site from a different server. You really have no clue how to run a real archiving network, do you? I have 30 servers dedicated to archiving. They only do archiving. And it's been like that for years.

Then the other public functions run on a separate public network.

Fucking FBI shill hijacking chinkanons thread fuck off you stupid faggot half there replies in this thread are you. Turn off your computer and go outside

Yes he's shilling.

Would it be possible to decentralize this concept? I think it could be possible, but maybe too slow, by using an implementation of Web2Web github.com/elendirx/web2web or similar

now now dearie, no need to get booty bothered schnookums

also archive.is passes your IP to the server in the "X-Forwarded-For" header

That's fine if you're talking about setting up your own archiving system, but you were referring to using his concept of piggybacking off of archive.is. Maybe you did so by accident, go back and re-read what you said and what you were replying to.

Have you even built your own computer? Kill yourself non-STEMer, grownups are talking

post a disposable email and I'll send an invite.

I backed away from being able to parse javascript because it could introduce vulns and I'm focusing on stripping away information to create a basic, pre-2000 era readable webpage, not on parsing garbage that no one needs.

i already made an extension that does this. im too lazy/dumb to update it to be better though.

chrome.google.com/webstore/detail/merchant-starver/khkahmomgggciefjdllekdfeielfejjm

>someone doesn't pamper my bottom about my ideas
>they are a shill

Holy fuck. I thought that the Dunning-Kruger effect was a meme spouted by redditards, but here you are, living proof that the dumber someone is, the smarter they think they are.

>also archive.is passes your IP to the server in the "X-Forwarded-For" header

What do you think that means? Do you think that means that the website that doesn't want to be archived by archive.is can't just block the four IPs used by archive.is?

Yeah, I wouldn't piggyback because it would be hell to organize and make it work better for you.

[email protected]

Heading out. Email me, anyone that wants help or testing with this type of thing.

>Would it be possible to decentralize this concept?

Already doing it and it'll be better than you can do.

obviously not too hard. you could have a list of domains that automatically get archived.

This guy and me were literally the only people with any skills or ability in this thread, and now I'm going back to work too

have fun fags

is it past babies bedtime, honeybunch?

my remark wasn't meant to disagree with you, but to add additional info about how it works

there there, want mummy to put some talcum on?

>>the only people who will use this are people who hate clickbait, marketing websites, ad bullshit, and are willing to use a special browser just to help starve the cancer out.
So, everyone.

>my remark wasn't meant to disagree with you, but to add additional info about how it works
>there there, want mummy to put some talcum on?


Low effort trolling, 0/10

You clearly were trying to obfuscate the issue by implying that your IP is not merely shown to the website being scraped, but used for the connection

I don't know why I can't stop feeding this troll tho, it's like those ducks in Vietnam where you keep tossing them bread and then eventually one of the alligators gets them

archive.is is fishy though.

Thank you for the encouraging words. Post your disposable email to get a Scrapist invite

And relying on other people's infrastructure is a dumpster fire of a plan

>tfw made low-quality chrome extension of OP's post and get zero (You)'s

fuck this place

not at all, homo

I was trying to suggest that any archiving system should take note that if it runs through archive.is, your actual IP will be sent to the target website when doing an archive, so a better system would not do it that way

You should have told people about it as a method to DIG DIG DIG on pizzagate and moloch websites and it would have hit the bump limit over and over

(DIG DIG DIG is a new way of saying look at the tiny shred of information that you can find by using search engines like CIA owned google who index a little bit of the internet and going through the publicly available results to find "CLUES")

I missed that

add a pic in future

Listen, you double nigger: You jumped in on this train of dumbfuck random neural collisions masquerading as 'thought':

The Australian literally does not understand how Archive.is works. He thinks it goes through his home dreamtime petrol pipe or something.

You entered that line of discussion and defended his moronic position. Period.

You're probably posting from a fucking tablet your parents got you for your 15th birthday last month.

Don't forget it sounds like rapist. That is the most important part.

Good work, unlike the OP you actually can make things

>double nigger
>double!!?

zounds good sir, I are naught but 1 and 3/5ths nigger

you are a cove! and I would warrant a blackguard also, pish and tosh to you

>You will be able to use Scrapist to read advertising free, publicly user commentable, comfy reading template adjusted 'articles' by feminist bloggers who insist that they are literally being Raped by Scrapist

2017 confirmed comfiest year

youtube.com/watch?v=O1rT6Vi5Ln4

youtube.com/watch?v=vifYelSTlMo

>chrome
is there a version for FF?

Decentralize a database and update it via p2p, checking against a ledger? Of course it's possible. That's what cryptocoin is.

shiftnrg.org

You have no taste for the classics.


youtube.com/watch?v=9ss_2mdMsiU

guys are thinking about this too hard.
we do need an archive site with a public API.

regex identifies any FQDN
send a request to the archive API
replace text on page with an archive link.
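A sketch of those three steps in plain JavaScript (the FQDN regex is deliberately simplified, and building a /newest/ link stands in for whatever real archive API we end up with):

const URL_RE = /https?:\/\/[a-z0-9.-]+\.[a-z]{2,}(\/\S*)?/gi;

function archiveLink(url) {
  // assumption: archive.is resolves /newest/<url> to its latest snapshot
  return "https://archive.is/newest/" + url;
}

function rewrite(text) {
  return text.replace(URL_RE, function (match) {
    return archiveLink(match);
  });
}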

someone make a git repo.

I would use this. It stops sites from auto-refreshing and loads quicker.

youtube.com/watch?v=nsNJjGGR4P8

[email protected]

send your virus, CIA nigger

Stop datamining for your kike brothers you shill.

witnessed dobs

you just do some version of a distributed hash table.

duplicates will occur relatively infrequently, and can be reconciled relatively easily.

the difference between this and cryptocoin is that if we intentionally do p2p, there is no equivalent of a "blockchain server"; there has to be an expectation of clients sometimes acting asynchronously. "nearline communication"

it is not 100% necessary to eliminate duplicates; keeping them would only create a minor increase in load on the archive site compared to deduplicating.
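A toy illustration of that hash-table idea, with Kademlia-style XOR closeness (the BigInt peer IDs are purely illustrative):

const crypto = require("crypto");

// Each URL hashes to a 160-bit key in the same space as the peer IDs.
function keyFor(url) {
  return BigInt("0x" + crypto.createHash("sha1").update(url).digest("hex"));
}

// The peer whose ID is XOR-closest to the key stores that archive.
function responsiblePeer(url, peerIds) {
  const k = keyFor(url);
  return peerIds.reduce(function (best, id) {
    return (id ^ k) < (best ^ k) ? id : best;
  });
}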

youtube.com/watch?v=MOhx0AI46Q0

gpl it
put it on git

Fucking shill. Gtfo.

waste.sourceforge.net/

en.wikipedia.org/wiki/WASTE

Fun fact: I've been experimenting with this protocol since 2003, there's a reason they tried to quash it so fast. Thank god they failed.

WASTE is the grandfather of all p2p crypto today. Bitcoin protocol is a modification to WASTE. And it's been out there, in the wild, free to play with, all along. They did a good job keeping it from being widely known but they were never able to get rid of it.

If you don't own a TPW P4-15NB then enjoy having all of your crypto hardware backdoored by the deep state.

>1 post by this ID

>kikeblocker
>kikeclicker
>fakenewsblocker
>shekelblocker
>shekeldodger
>FUmsm
>clickde-baiter
>baitclicker

This is a good thread

Such is the cost of trying to organize something on a site that is in perpetual chaos.

Sorry you went through frustrations regarding it; you people who can program have my respect. Achieving our cause will be greater than any hindrance we face along the way.

>putting it on git so everyone can look at my bad code

Okay. Not gonna put this on my github account though; last thing i want to be known for is creating an (((anti-semitic))) extension

github.com/merchantstarver/merchant-starver

happy viewing

>fake news blocker
thats actually what i was going to name mine kek
friend thought of "merchant starver" and i thought it was funnier

Finding a name palatable to a large chunk of the population - in particular the moderates and swing voters - is incredibly important. Using a name that literally includes the word "rapist", or something that can easily be smeared as an add-on for white supremacist nazis (hitl.er or something), will defeat the entire purpose of starving these sites of unique views, because the app becomes so toxic that no one wants to touch it or recommend it to others.

There are numerous examples of companies failing because they chose a poor name; don't let this be one of them.

>muh pr

You guys were worthless during gamergate and you're worthless now

Looks nice, though it doesn't handle sub-domains, or does it? For links like: money.cnn.com/2016/10/30/media/facebook-fake-news-plague/index.html

>WASTE
this is probably the right idea. however, it might be overkill. you don't need to physically share much data, really just the data structure and some hashes.

can we establish some baseline requirements?
- some form of PKI crypto
- some NoSQL data store (something that can manage and search a hash table) with clustering
- archive site(s) with a public API (or simple CRUD/REST interface)
- something like uBlock to regex-match any FQDN and replace it in the page with the archive link
- peer orchestration framework
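For the first bullet, Node's built-in crypto already covers a minimal version. A sketch (Ed25519 picked arbitrarily; key distribution and storage are hand-waved):

const crypto = require("crypto");

const { publicKey, privateKey } = crypto.generateKeyPairSync("ed25519");

// Sign an archive record so peers can verify who published it.
function signRecord(record) {
  return crypto.sign(null, Buffer.from(JSON.stringify(record)), privateKey);
}

function verifyRecord(record, signature, pub) {
  return crypto.verify(null, Buffer.from(JSON.stringify(record)), pub, signature);
}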

>github.com/merchantstarver/merchant-starver

is this really all a chrome addon is?
I've looked into firefox addons and my eyes crossed.
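Pretty much, yes. A Chrome add-on is a manifest plus a script or two, and Firefox accepts nearly the same WebExtension format these days, so one codebase should cover both. A minimal hedged manifest to pair with the background-script sketch near the top of the thread:

{
  "manifest_version": 2,
  "name": "archive-redirector (sketch)",
  "version": "0.1",
  "permissions": ["webRequest", "webRequestBlocking", "<all_urls>"],
  "background": { "scripts": ["background.js"] }
}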