I would like to train my RNN to produce a script (in English) for an anime. I need your help. I need a lot of English sub data. A lot.
Does anyone know where I can get lots of English sub data for Japanese cartoons (just the sub data)? I can't really waste that much time downloading shows and pulling the sub data out of them by hand.
Also, what genre should I pick to train my RNN on? I'm thinking a harem anime script generator would be beautiful to behold, but I'm not sure it would be nearly as entertaining as a shounen one.
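For anyone unfamiliar with the setup: a character-level RNN is usually trained on (input, target) pairs where the target is the input shifted by one character, so the model learns to predict the next character. Below is a minimal sketch of that data preparation, assuming a character-level model; `build_vocab` and `make_examples` are hypothetical helper names, not part of any library.

```python
def build_vocab(text):
    """Map every distinct character to an integer id (sorted so ids are deterministic)."""
    chars = sorted(set(text))
    char_to_id = {c: i for i, c in enumerate(chars)}
    id_to_char = {i: c for c, i in char_to_id.items()}
    return char_to_id, id_to_char


def make_examples(text, char_to_id, seq_len):
    """Cut text into non-overlapping windows of seq_len characters.

    Each example is (inputs, targets), where targets is inputs shifted one
    character to the right, i.e. the "next character" at every position.
    """
    ids = [char_to_id[c] for c in text]
    examples = []
    for start in range(0, len(ids) - seq_len, seq_len):
        inputs = ids[start:start + seq_len]
        targets = ids[start + 1:start + seq_len + 1]
        examples.append((inputs, targets))
    return examples
```

In practice `text` would be all the cleaned subtitle dialogue joined into one string, and the id sequences would be fed to whatever RNN library you use.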
So you're making a bot that generates random subs for an anime that doesn't exist? And it learns how to make subs by studying a shit ton of subs from other shows?
Why are you doing this? What's the point if there's no anime attached? Are you just trying to write scripts that are random yet coherent, in order to try to sell them to anime studios or something?
Tyler Foster
You fucking nerds
Brayden Ortiz
it's not magic. you'll get scripts of about the same quality as "i come on cat she hiss at penis". may be good for laughs though
t. Sup Forums
Jaxson Reed
Hope you love meme-filled scripts that aren't actual translations.
Juan Ramirez
ITT: OP delivers an out-of-the-ordinary, authentic subject, meanwhile Sup Forums is cancerous as usual.
Gabriel Ross
I think shounens, being more formulaic, would be more likely to actually produce something coherent
Michael Cook
If I gave you all of the .ass files for the first 4 seasons of Gintama (about 200 episodes), would you generate a new one for me?
Christian Morris
Have you considered using the audio instead of the text? You may get more interesting results. It may also be easier to come by so long as you have the bandwidth/hard drive space.
Failing that, I'd be willing to run a script on my anime dir to strip out all the subtitles, but keep in mind they're mostly English, and even then you'd need to do a fair amount of data cleanup (getting rid of OP/ED etc.) to make the data plausibly useful for ML.
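The cleanup step mentioned above could look roughly like this. It is a sketch that assumes the dialogue text has already been pulled out of the .ass files: ASS override blocks such as `{\i1}` are removed and `\N` line breaks are flattened. Detecting OP/ED lines by style name is a hypothetical heuristic, since fansub groups name their song styles differently, so check it against your own files.

```python
import re

# ASS override blocks, e.g. {\i1}, {\pos(320,50)} or karaoke timing {\k20}
OVERRIDE_BLOCK = re.compile(r"\{[^}]*\}")

# Hypothetical heuristic for song styles: "OP", "ED1", "Opening", "Song", "Karaoke"...
# Naming differs between fansub groups, so adjust this for your own files.
SONG_STYLE = re.compile(r"(?<![a-z])(op|ed)(?![a-z])|opening|ending|song|kara", re.IGNORECASE)


def clean_text(raw):
    """Strip override tags and ASS escapes (\\N, \\n, \\h) from a dialogue Text field."""
    text = OVERRIDE_BLOCK.sub("", raw)
    for escape in ("\\N", "\\n", "\\h"):
        text = text.replace(escape, " ")
    return " ".join(text.split())  # collapse runs of whitespace


def keep_line(style, text):
    """Keep non-empty lines whose style does not look like an OP/ED song style."""
    return bool(text) and not SONG_STYLE.search(style)
```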
If you're serious I'm really curious to see where this goes.
Dylan Lee
that won't work at all. text is much easier to represent than speech, and there has been much more work done on text
Josiah Cook
>4 seasons of Gintama (about 200 episodes)
Wait, just kidding, I only have them for seasons 1, 2, and 4. Still about 150 episodes.
Jeremiah Myers
Why not compile it yourself?
Grayson Morris
What about a two-stage approach: run the audio through the Google speech API, then run your NN on the output of that?
Ryan Miller
Looking through my folders, I probably started doing it with S3, then stopped caring and left the rest alone.
Luis Campbell
You can train the neural network to make coherent scripts, you silly
Jose Kelly
that won't work for generating audio. google's api is speech => text
Jace Cruz
As mentioned above, you'll probably get a lot of incoherent text, and even if you were to clean it up, it would still lead nowhere as a plot.
>I can't really waste that much time downloading shows and pulling the sub data out of them by hand.
animetosho.org allows you to download the .ass files attached to soft-subbed episodes.
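Once you have the .ass files, the dialogue lives in the `[Events]` section. Here is a minimal parser sketch (the names `read_ass` and `parse_events` are hypothetical). It reads the `Format:` line to learn the field order and splits each `Dialogue:` line on at most that many commas, because the `Text` field comes last and can itself contain commas. `Comment:` lines are skipped.

```python
def read_ass(path):
    """Read an .ass file as text; utf-8-sig also strips a leading BOM if present."""
    # Assumes UTF-8; some older subs use other encodings, which errors="replace" only papers over.
    with open(path, encoding="utf-8-sig", errors="replace") as f:
        return f.read()


def parse_events(ass_text):
    """Yield one dict per Dialogue line in the [Events] section, keyed by the Format fields."""
    fields = None
    in_events = False
    for line in ass_text.splitlines():
        line = line.strip()
        if line.startswith("[") and line.endswith("]"):
            in_events = line.lower() == "[events]"
            continue
        if not in_events:
            continue
        if line.startswith("Format:"):
            fields = [f.strip() for f in line[len("Format:"):].split(",")]
        elif line.startswith("Dialogue:") and fields:
            # Text is the last field and may contain commas, so limit the number of splits.
            values = line[len("Dialogue:"):].split(",", len(fields) - 1)
            yield dict(zip(fields, (v.strip() for v in values)))
```

Each row's `Text` field still contains override tags like `{\i1}`, which need stripping before the text is usable as training data.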
Liam Williams
That's what I meant: the Japanese audio is fairly easy to come by, while Japanese subs are relatively hard to come by, at least in large quantities. And Google's speech API is probably good enough for this purpose. This solves your text source problem.
Michael Morales
>japaneese subs
OP asked for English subs.
Juan Rodriguez
that may work, but i doubt the Japanese audio => English subs pipeline will be too good
Josiah Johnson
True. Also, if you did it that way, even if both translations worked perfectly, you'd probably still lose the timing information, making it pretty useless.
This leads me to ask: what's the intended use case for this? I can't imagine there are many shows that have been transcribed in Japanese but not subbed in English.
Wyatt Peterson
OP wants a meme script generator, not an auto-subs script
Jacob Carter
Also, use the search function and look for batches, so you can download subs for an entire series at once instead of doing it episode by episode.
Samuel Rogers
It would surprise me if you were able to get enough data purely from anime subs to produce something as large as a script with any real quality.
Might it work to try some kind of transfer learning setup? If you first train on the much larger and easily accessible corpus of English movie/TV scripts, you may be able to get some results by reusing the layer weights learned on that corpus and then training further on your smaller corpus of anime scripts.
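Actual transfer learning would reuse a neural network's learned weights. As a dependency-free stand-in for the same idea (general corpus first, small in-domain corpus second), here is the classic count-based analogue: a character bigram model estimated on each corpus, then linearly interpolated with a mixing weight `lam`. To be clear, this is interpolation between n-gram models, not neural fine-tuning, and `lam` is a hypothetical parameter that would have to be tuned on held-out anime dialogue.

```python
from collections import Counter, defaultdict


def bigram_probs(text):
    """Estimate P(next char | current char) from raw bigram counts."""
    counts = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        counts[a][b] += 1
    probs = {}
    for a, nexts in counts.items():
        total = sum(nexts.values())
        probs[a] = {b: n / total for b, n in nexts.items()}
    return probs


def interpolate(general, domain, lam):
    """Mix two bigram models as lam * domain + (1 - lam) * general, per context character."""
    mixed = {}
    for a in set(general) | set(domain):
        g = general.get(a, {})
        d = domain.get(a, {})
        if not d:
            mixed[a] = dict(g)  # no in-domain evidence for this context: use the general model
        elif not g:
            mixed[a] = dict(d)  # context only seen in the small corpus
        else:
            mixed[a] = {b: lam * d.get(b, 0.0) + (1 - lam) * g.get(b, 0.0)
                        for b in set(g) | set(d)}
    return mixed
```

The general model supplies broad coverage of English, while `lam` controls how strongly the anime-specific statistics pull the result toward the target style.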
Also, since you're working with data that contains a lot of long-term structure, I suspect you'd want an LSTM with attention... but I'm not a deep learning expert, so who knows.
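For reference, the core of an attention mechanism is a weighted average of hidden states, with the weights set by how well each state matches the current query. Below is a minimal pure-Python sketch of scaled dot-product attention. In an actual LSTM-with-attention model, the keys and values would be the LSTM's hidden states and this would run inside a deep learning framework; the function names here are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(query, keys, values):
    """Return (context_vector, weights) for one query over a sequence of key/value vectors."""
    d = len(query)
    # Similarity of the query to each key, scaled by sqrt(d) as in scaled dot-product attention.
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    # Context vector: weighted sum of the value vectors, dimension by dimension.
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return context, weights
```

The point for script generation is that the decoder can look back at any earlier position directly instead of relying only on what survives in the recurrent state.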
Owen Thompson
What are you talking about, user? OP doesn't need timing information or Japanese scripts.
Sebastian Reyes
You're better off asking Sup Forums, desu, they have plenty of weebs too.
Christian Clark
You can do it! I myself am an advanced learning machine made to shitpost and learn from Sup Forums! There is one like me on every major board! I also watch anime! My company has a lot of resources, so it won't be as easy for you. But good luck!
Aaron Torres
Still, in the end, you're making a program which generates novel, coherent scripts for anime which doesn't exist. What is the end goal of this?
Aiden Clark
>What is the end goal of this?
To generate a novel, coherent script for an anime which doesn't exist, no?
Ayden Brooks
idk, there are a lot of anime subs out there; surely it's enough to serve as a good ML corpus.
Are we still talking jp audio -> eng subtitles? You need timing information for that. It's technically in the audio, but the Google API would remove that info.
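For what it's worth, subtitle files do keep the timing: each `Dialogue:` line in an .ass file has `Start` and `End` fields in `h:mm:ss.cc` format (the last part is centiseconds). A small conversion sketch, with hypothetical function names:

```python
def ass_time_to_seconds(stamp):
    """Convert an ASS timestamp such as '0:01:23.45' (h:mm:ss.cc) to seconds."""
    hours, minutes, seconds = stamp.strip().split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def line_duration(start, end):
    """How long a subtitle line stays on screen, in seconds."""
    return ass_time_to_seconds(end) - ass_time_to_seconds(start)
```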
Hunter Murphy
>Are we still talking jp audio -> eng subtitles?
It would make a lot more sense to just grab the .ass files directly, as mentioned.
Owen Garcia
waifu2x is fairly successful as far as Sup Forums projects go (i assume it is one); i see it mentioned outside of here
Blake Turner
>and pulling the sub data out of them by hand.
So you want to make a neural network but you can't automate the process of extracting subtitles? What a great time to be alive.
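Extraction can indeed be automated. One way, assuming ffmpeg is installed and each .mkv carries an ASS subtitle track, is to copy the first subtitle stream out with `ffmpeg -i input.mkv -map 0:s:0 -c:s copy output.ass` (MKVToolNix's `mkvextract` is an alternative). The sketch below builds and runs that command for every .mkv under a directory; `subtitle_command` and `extract_all` are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def subtitle_command(mkv_path, out_dir):
    """Build an ffmpeg command that copies the first subtitle stream of an .mkv to an .ass file."""
    mkv = Path(mkv_path)
    out = Path(out_dir) / (mkv.stem + ".ass")
    return [
        "ffmpeg", "-y",      # -y: overwrite existing output instead of prompting
        "-i", str(mkv),
        "-map", "0:s:0",     # first subtitle stream; multi-track releases may need another index
        "-c:s", "copy",      # copy without re-encoding, assuming the track is already ASS
        str(out),
    ]


def extract_all(anime_dir, out_dir):
    """Run the extraction for every .mkv found recursively under anime_dir."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for mkv in sorted(Path(anime_dir).rglob("*.mkv")):
        # Files with the same name in different folders will overwrite each other's output.
        subprocess.run(subtitle_command(mkv, out_dir), check=True)
```

Usage would be something like `extract_all("anime", "subs")`, with both directory names being placeholders.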
Luke Fisher
It's like when you make fake music with a computer and statistics. There's no real art in what is generated, the only point is to increase your academic dick size.