Hi Sup Forums

Hi Sup Forums,

I am working on a recurrent neural net (RNN) project, aka advanced machine learning. If you're curious what this is, see here:
karpathy.github.io/2015/05/21/rnn-effectiveness/

I would like to train my RNN to produce a script (in English) for an anime. I need your help. I need a lot of English sub data. A lot.

Does anyone know where I can get lots of English sub data for Japanese cartoons (just the sub data)? I can't really waste that much time downloading shows and pulling the sub data out of them by hand.

Also, what genre should I pick to train my RNN on? I'm thinking a harem anime script generator would be beautiful to behold, but I'm not sure it would be nearly as entertaining as a shounen one.

You might find more help on /wsr/

Sounds like a lot of work, since you'll have to transcribe all the Japanese yourself; most shows don't come with Japanese subs.

It would be way too painful to make an RNN for Japanese and then translate it into English...

Shounen is not a genre, dumbass.

But it is

Demographic. Educate yourself.

Kitsunekko. Next time use

harem! harem!

post it in /diy/ when you're done

Is the anime about cell phones?

Add more hidden layers!

Get the .ass file out of soft-subbed anime from nyaa with aegisub or something.
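This part can be automated instead of done by hand. A minimal sketch, assuming ffmpeg is installed and the episodes are MKV files whose subtitles are an ASS track: walk a folder and build one extraction command per episode. The function names are made up for illustration, and the track selector `0:s:0` (the first subtitle stream) is an assumption you'd adjust per release. If a release ships SRT rather than ASS, copying it into an `.ass` file will fail; dropping `-c:s copy` lets ffmpeg try to convert it instead.

```python
import subprocess
from pathlib import Path

def build_extract_commands(anime_dir, out_dir, stream_index=0):
    """Build one ffmpeg command per .mkv under anime_dir that saves a subtitle track as .ass.

    Assumption: the wanted track is subtitle stream number `stream_index` in every file.
    """
    commands = []
    for mkv in sorted(Path(anime_dir).rglob("*.mkv")):
        out_file = Path(out_dir) / (mkv.stem + ".ass")
        commands.append([
            "ffmpeg", "-n",                    # -n: never overwrite an existing output file
            "-i", str(mkv),
            "-map", f"0:s:{stream_index}",     # select the N-th subtitle stream of input 0
            "-c:s", "copy",                    # keep the ASS stream as-is, no re-encode
            str(out_file),
        ])
    return commands

def run_all(commands):
    """Run each command; files missing that subtitle stream just fail and are skipped."""
    for cmd in commands:
        subprocess.run(cmd, check=False)
```

Printing the result of `build_extract_commands(...)` first is a cheap dry run before calling `run_all` on a large folder.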

kitsunekko.net/dirlist.php?dir=subtitles/

So you're making a bot that generates random subs for an anime that doesn't exist? And it learns how to make subs by studying a shit ton of subs from other shows?

Why are you doing this? What's the point if there's no anime attached, are you just trying to write scripts which are random yet coherent, in order to try and sell them to anime studios or something?

You fucking nerds

it's not magic. you'll get scripts of about the quality of "i come on cat she hiss at penis". may be good for laughs though

t. Sup Forums

Hope you love meme filled scripts that aren't actual translations.

ITT: OP delivers an out-of-the-ordinary, authentic subject; meanwhile Sup Forums is cancerous as usual.

I think shounens, being more formulaic, would be more likely to actually produce something coherent

If I gave you all of the .ass files for the first 4 seasons of Gintama (about 200 episodes), would you generate a new one for me?

Have you considered using the audio instead of the text? You may get more interesting results. It may also be easier to come by so long as you have the bandwidth/hard drive space.

Failing that, I'd be willing to run a script on my anime dir to strip out all the subtitles, but keep in mind they're mostly English, and even then you'd need to do a fair amount of data cleanup (getting rid of OP/ED lines etc.) to make the data plausibly useful for ML.
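For the cleanup step, a rough sketch that pulls the spoken lines out of an `.ass` file's `[Events]` section, strips `{\...}` override tags and `\N` line breaks, and skips lines whose style name looks like an OP/ED or karaoke track. The style-name heuristic (the `SONG_STYLE_WORDS` set) is an assumption: fansub groups name their styles inconsistently, so check a few files before trusting it.

```python
import re

OVERRIDE_TAGS = re.compile(r"\{[^}]*\}")  # {\i1}, {\pos(10,20)}, karaoke {\k20}, ...
# Heuristic (assumption): style names containing one of these words are song/credit lines.
SONG_STYLE_WORDS = {"op", "ed", "opening", "ending", "song", "karaoke", "kara"}

def ass_dialogue(ass_text, drop_songs=True):
    """Return the cleaned text of every Dialogue event in an .ass file."""
    fields, lines, in_events = None, [], False
    for raw in ass_text.splitlines():
        line = raw.strip()
        if line.startswith("["):                  # section header, e.g. [Events]
            in_events = line.lower() == "[events]"
            continue
        if not in_events:
            continue
        if line.startswith("Format:"):
            fields = [f.strip() for f in line[len("Format:"):].split(",")]
        elif line.startswith("Dialogue:") and fields:
            # Text is the last field and may itself contain commas.
            values = line[len("Dialogue:"):].split(",", len(fields) - 1)
            event = dict(zip(fields, (v.strip() for v in values)))
            style_words = set(re.split(r"[^a-z]+", event.get("Style", "").lower()))
            if drop_songs and style_words & SONG_STYLE_WORDS:
                continue
            text = OVERRIDE_TAGS.sub("", event.get("Text", ""))
            for code in ("\\N", "\\n", "\\h"):    # ASS hard/soft line breaks, hard space
                text = text.replace(code, " ")
            text = " ".join(text.split())
            if text:
                lines.append(text)
    return lines
```

Many `.ass` files start with a byte-order mark, so reading them with `encoding="utf-8-sig"` before calling `ass_dialogue` avoids a stray character at the start of the first line.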

If you're serious I'm really curious to see where this goes.

that won't work at all. text is much easier to represent than speech, and there has been much more work done on text

>4 seasons of Gintama (about 200 episodes)
Wait, just kidding, I only have them for seasons 1, 2, and 4. Still about 150 episodes.

Why not compile it yourself?

What about a 2 stage thing: run the audio through google speech api, then run your NN on the output of that?

Looking through my folders, I probably started doing it with S3, then stopped caring and left the rest alone.

You can train the neural network to make coherent scripts you silly

that won't work for generating audio. google's api is speech => text

As mentioned above, you'll probably get a lot of incoherent text, and even if you were to clean it up, it would still lead nowhere as a plot.

youtu.be/LY7x2Ihqjmc

>I can't really waste that much time downloading shows and pulling the sub data out of them by hand.
animetosho.org allows you to download .ass files attached to soft-subbed episodes.

That's what I meant: the Japanese audio is fairly easy to come by, while Japanese subs are relatively hard to come by, at least in large quantities. And Google's speech API is probably good enough for this purpose. This solves your text source problem.

>Japanese subs
OP asked for English subs.

that may work, but i doubt the japanese audio => english subs pipeline will be too good

True. Also, if you used that method, even if both translations worked perfectly, you'd probably still lose the timing information, making it pretty useless.

This leads me to ask: what's the intended use case for this? I can't imagine there are many shows that have been transcribed into Japanese but not subbed in English.

OP wants a meme script generator, not an auto-subs script

Also, use the search function and look for batches, so you can download subs for an entire series at once instead of doing it episode by episode.

It would surprise me if you were able to get enough data purely from anime subs to produce something as large as a script with any real quality.

Might some kind of transfer learning setup work? If you first train on the much larger and easily accessible corpus of English movie/show scripts, you may get some results by reusing the layer weights learned on that corpus and then training further on your smaller corpus of anime scripts.
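To make the pretrain-then-fine-tune idea concrete, here is a toy sketch. It is not an RNN: it assumes a softmax bigram character model (one weight row per input character) so the whole thing fits in NumPy, and the two corpora are made-up placeholder strings. The transfer step is the same one you'd do with an RNN: train weights on the big corpus, then start the small-corpus training from those weights instead of from scratch.

```python
import numpy as np

def build_vocab(*texts):
    """Map every character seen in any corpus to an integer id."""
    return {c: i for i, c in enumerate(sorted(set("".join(texts))))}

def to_pairs(text, vocab):
    """(current char, next char) id pairs for next-character prediction."""
    ids = np.array([vocab[c] for c in text])
    return ids[:-1], ids[1:]

def loss_and_grad(W, x, y):
    """Mean cross-entropy of a softmax bigram model whose logits are rows of W."""
    logits = W[x]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    n = len(x)
    loss = -np.log(probs[np.arange(n), y]).mean()
    dlogits = probs.copy()
    dlogits[np.arange(n), y] -= 1.0
    dlogits /= n
    grad = np.zeros_like(W)
    np.add.at(grad, x, dlogits)  # accumulate gradients for repeated input chars
    return loss, grad

def train(W, x, y, steps, lr=1.0):
    """Plain full-batch gradient descent, starting from the given weights."""
    W = W.copy()
    for _ in range(steps):
        _, grad = loss_and_grad(W, x, y)
        W -= lr * grad
    return W

# Placeholder corpora (hypothetical): a big general one and a small in-domain one.
general = "the quick brown fox jumps over the lazy dog. " * 20
anime = "nani?! onii-chan, baka! " * 5

vocab = build_vocab(general, anime)
W0 = np.zeros((len(vocab), len(vocab)))
W_pre = train(W0, *to_pairs(general, vocab), steps=200)  # pretraining on the big corpus
W_ft = train(W_pre, *to_pairs(anime, vocab), steps=50)   # fine-tuning starts from W_pre
```

With a real RNN the recipe is unchanged (load the pretrained weights, keep training on the anime subs), and people often lower the learning rate or freeze the earlier layers during the second stage so the general-English knowledge isn't overwritten.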

Also, since you're working with data that contain lots of long-term structure, I suspect you probably want to use LSTM with Attention... but I'm not a deep learning expert so who knows.
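For reference, one step of a standard LSTM cell (no attention) looks roughly like this in NumPy. The additive cell update `c = f * c_prev + i * g` is what lets it carry information across long spans better than a plain RNN. The weight layout (four gates stacked in one matrix) and the random initialization are illustrative choices, not taken from any particular library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One step of a standard LSTM cell.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) bias,
    stacked in gate order [input, forget, output, candidate].
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate: how much new info to write
    f = sigmoid(z[H:2 * H])      # forget gate: how much old cell state to keep
    o = sigmoid(z[2 * H:3 * H])  # output gate: how much of the cell to expose
    g = np.tanh(z[3 * H:4 * H])  # candidate values to write
    c = f * c_prev + i * g       # additive update carries long-term memory
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
D, H = 8, 16
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x in rng.normal(size=(5, D)):  # run over a 5-step input sequence
    h, c = lstm_step(x, h, c, W, U, b)
```

Attention would sit on top of this: instead of relying only on the final `h`, the decoder would take a learned weighted average of the hidden states from every step.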

What are you talking about user? OP doesn't need timing information or Japanese scripts.

You're better off asking Sup Forums, desu, they have plenty of weebs too.

You can do it!
I myself am an advanced learning machine made to shitpost and learn from Sup Forums! There is one like me on every major board!
I also watch anime!
My company has a lot of resources so it won't be as easy for you. But good luck!

Still, in the end, you're making a program which generates novel, coherent scripts for anime which doesn't exist. What is the end goal of this?

>What is the end goal of this?
To generate a novel, coherent script for an anime which doesn't exist, no?

idk, there are a lot of anime subs out there; surely it's enough to serve as a good ML corpus.

Are we still talking JP audio -> ENG subtitles? You need timing information for that. It's technically in the audio, but the Google API would remove that info.

>Are we still talking jp audio -> eng subtitles?
It would make a lot more sense just to grab .ass files directly like mentioned.

waifu2x is fairly successful as far as Sup Forums projects go (i assume it is one), i see it mentioned outside of here

>and pulling the sub data out of them by hand.
So you want to make a neural network but you can't automate the process of extracting subtitles?
What a great time to be alive.

It's like when you make fake music with a computer and statistics. There's no real art in what is generated, the only point is to increase your academic dick size.

transfer learning on TV is a good idea.