Identifying posters by their writing style

idk which board is the most appropriate to ask this on, but maybe some of you know something

do you think it should be possible for a program to tell different posters apart, given that they write a long enough message, based on the characteristics of their text?

for example word choice,
using less common synonyms,
the way, among all the grammatically valid ones, they construct a sentence,
capitalizing words or not
basically to create a set of poster "models" and to give the estimated probability that post x belongs to poster y

for example, if i write another long post in the thread, the analyzing program should say that there is an 89% chance that the post belongs to me
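
a rough sketch of what such poster "models" could look like, assuming a naive Bayes classifier with add-one smoothing and a uniform prior over posters (none of this comes from an existing tool; `style_features` and `PosterModels` are made-up names, and the feature set is only an illustration of the word choice, capitalization and punctuation ideas above):

```python
import math
import re
from collections import Counter

def style_features(text):
    """Turn a post into a bag of crude style tokens (illustrative feature set)."""
    feats = Counter()
    # Word choice: every lowercased word is its own feature.
    for w in re.findall(r"[A-Za-z']+", text):
        feats["word:" + w.lower()] += 1
    # Capitalization habit: does each sentence start with an uppercase letter?
    for s in re.split(r"[.!?\n]+", text):
        s = s.strip()
        if s:
            feats["cap:" + ("upper" if s[0].isupper() else "lower")] += 1
    # Punctuation habits.
    for ch in text:
        if ch in ",;:!?\"'()":
            feats["punct:" + ch] += 1
    return feats

class PosterModels:
    """One feature-count table per poster (hypothetical helper class)."""

    def __init__(self):
        self.counts = {}    # poster name -> Counter of feature counts
        self.vocab = set()  # every feature seen during training

    def train(self, poster, text):
        feats = style_features(text)
        self.counts.setdefault(poster, Counter()).update(feats)
        self.vocab.update(feats)

    def probabilities(self, text):
        """Estimated P(poster | post), assuming a uniform prior over posters."""
        feats = style_features(text)
        v = len(self.vocab) + 1  # the extra slot stands in for unseen features
        log_scores = {}
        for poster, counts in self.counts.items():
            total = sum(counts.values())
            # Add-one (Laplace) smoothing keeps unseen features from zeroing the score.
            log_scores[poster] = sum(
                n * math.log((counts[f] + 1) / (total + v)) for f, n in feats.items()
            )
        # Normalize in log space so long posts do not underflow.
        top = max(log_scores.values())
        exp = {p: math.exp(s - top) for p, s in log_scores.items()}
        z = sum(exp.values())
        return {p: e / z for p, e in exp.items()}
```

you would call `train(name, text)` on posts whose author is already known and then `probabilities(new_post)` to get one number per poster. the numbers only rank the posters that were trained on, so a post by somebody else still gets split among them.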

this could be especially useful in generals, especially generals with shitposters and annoying users

sorry if i gave a messy explanation but i hope you understood what i mean

sure it's possible
but not very useful

It is certainly possible to a degree; you can somewhat sort people into categories, so long as the person isn't aware of how they type up their posts.

Like you, for example, would go into my fucking retard category.

on this board at least, there are about 10 users that i recognize in almost every thread, just based on their style of writing. it really differs between people, and that becomes very apparent when you spend 16 hours a day reading the shit they post. so im pretty sure that itd be possible for a program too.

DELET

It is indeed possible to identify a poster by the way they write their posts, capitalization, oxford commas, et cetera. but you're kinda fucked if i suddenly talk like a faggot

Yes which is why I change my mannerisms between posts and always use random timestamps for my filenames

>Like you, for example, would go into my fucking retard category.
may i ask you why?

>but not very useful
i would be the next level of filtering shitposters

it*

Do you recognize me?

...

>do you think that it should be possible from a program to tell apart different posters
On slow boards with few posters, such as Sup Forums, you don't even need a program to do this.

OK. For example, you look like a redditor.

First, your image.
Second, your sentence structure is worse than teen ESLs I know.
Third, your lack of capitalization and your inability to punctuate in a reasonable fashion.

reddit kys

i could make a program for identifying based pussy posters

There are already algorithms for that. That's also why I will never be able to publish my erotic fanfiction ;_;

First, i took a random picture since it's not really important and there is nothing really representative of it
Second, I wrote hastily, and I wonder how many foreign languages you speak. I wouldn't be surprised if you happen to be another arrogant anglo who is too retard to speak anything else
Third, this is fucking Sup Forums, learn2speechregister before tipping your fedora, mr. supreme gentleman

pretty much impossible since the maximum possible degree of variance in sentence structure is quite limited

>suddenly
>if

what about other languages that offer a higher degree of variation? like romance languages

I picked*

i dont know since i dont speak any. i cant imagine it would be THAT much different though.

It is possible. It is really only useful with xbox huge datasets. Google/advertising companies and alphabet agencies use it to link profiles across different services.

Sup Forums shitposts are not complex enough for things more complicated than a wordfilter to be useful. You need a minimum amount of complexity to get a high confidence result.

Sup Forums has hive mind mentality. all users write very similarly using memes and """culture""". so this is probably not possible.

They can already do this, OP.

mobile.nytimes.com/blogs/bits/2012/01/03/software-helps-identify-anonymous-writers-or-helps-them-stay-that-way/?referer=

Where did the 'reddit spacing' meme come from?
>inb4 reddit

but i guess that annoying users in certain niche generals could be identified, since they don't just meme but they usually write long and toxic messages
you could also make the program analyze the archives

Definitely possible. The secondary question is whether we can obtain training data from existing archives. Some boards used to have tagging systems. Does anyone have links to dumps from those times?

do you know if there is any similar FOSS publicly available?

You could, but I have only ever seen the process applied to huge datasets then outputting possible linked profiles with confidence intervals. Not sure how it would work as a filter.

You'll need a trained model or an expert system. But God dammit it's possible. I'm willing to collaborate to do this shit

;^)

Of course. The NSA and FBI have used algorithms in the past to detect the typing patterns of criminals; for example, the FBI found an infamous pedophile online from how he greets with "hiya" in chat rooms. Once they have a lead, they can document it and use an algorithm to detect patterns between known writing and the potential suspect's writing, and it's shockingly as good as a fingerprint, granted you get a good sample. IBM, Intel and a few other tech giants already have the technology implemented.


P.S
I can already tell geographic region of everyone in this thread

When you have several paragraphs and put spaces between them, that's the leddit style.

>I can already tell geographic region of everyone in this thread
You are either a rusky, a pole, a pajeet or a bot.

Stylometry

Especially when it's actually sentences spaced out.

Like this and they do it thinking it looks better and easier to read.

When it's really just very retarded and inefficient use of page space.

And they will do it on every forum.

I think it'd lose effectiveness because most Sup Forumsentlemen are smart enough to mix up their patterns when they're same-fagging, at least I am.

for instance, this guy is me too.

It is definitely possible.
The question is if we want that, and I'd say the answer is "No."

I couldn't care less about someone identifying the posts I have made on Sup Forums, but I don't want to have the knowledge regarding other posters forced on me.
I want to judge every post on its own merits, not based on the history of its creator.

I doubt it's possible in a random thread, but I see it frequently on Sup Forums because it also has flags so you can make an association between flag and writing style and you come up with some unique characters, for example:
Malaysian mike, Greek tranny poster, the argie that spams about the septuagint in Christian threads, etc

psal.cs.drexel.edu/index.php/Main_Page

that's just the open sores javashit academic version
It's possible and it's adopted by internet cops worldwide. Be aware of it next time you post in that pedo forum, Mr. J. Gustavson.

Depends.

As long as they write enough words in each post, or they use enough unique strings / sequences of strings, and you have enough posts, ya I'd say you could absolutely do that.
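
A minimal sketch of the "sequences of strings" idea, assuming character trigram counts compared with cosine similarity. The function names and the `min_chars` cutoff are made up for illustration; a real tool would need far more text per poster than this.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Count overlapping character n-grams, keeping case and punctuation."""
    text = " ".join(text.split())  # collapse runs of whitespace
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def closest_poster(profiles, post, min_chars=40):
    """Return the best-matching profile name, or None if the post is too short."""
    if len(post) < min_chars:  # arbitrary cutoff: short posts carry too little signal
        return None
    target = char_ngrams(post)
    return max(profiles, key=lambda name: cosine(profiles[name], target))
```

Here `profiles` would be a dict mapping a poster name to `char_ngrams(...)` of all their known posts joined together. It always returns somebody once the length cutoff is passed, so a similarity threshold would be needed before trusting the match.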

I think you'll find this interesting.

medium.com/@amuse/how-the-nsa-caught-satoshi-nakamoto-868affcef595

en.wikipedia.org/wiki/Stylometry

why the ugly whore?