Hi Sup Forums

Julian Miller

Hi Sup Forums,

I am working on a recurrent neural network (RNN) project, a.k.a. advanced machine learning. If you're curious what this is, see here:
karpathy.github.io/2015/05/21/rnn-effectiveness/
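Here is what the linked article means by a character-level RNN. At each step the network reads one character and updates a hidden state that carries context forward. It then outputs a probability distribution over the next character. Below is a minimal pure-Python sketch of that forward pass and the sampling loop. The class and parameter names are hypothetical and the bias terms are left out. The weights are random, so the model produces noise until it is trained (e.g. with backpropagation through time).

```python
import math
import random


def softmax(logits):
    # Subtract the max before exponentiating so large logits don't overflow.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class CharRNN:
    """Vanilla character-level RNN with random, untrained weights (no bias terms)."""

    def __init__(self, text, hidden_size=16, seed=0):
        self.chars = sorted(set(text))
        self.index = {c: i for i, c in enumerate(self.chars)}
        self.hidden_size = hidden_size
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.01) for _ in range(cols)] for _ in range(rows)]

        vocab = len(self.chars)
        self.Wxh = init(hidden_size, vocab)        # input -> hidden
        self.Whh = init(hidden_size, hidden_size)  # hidden -> hidden (the recurrence)
        self.Why = init(vocab, hidden_size)        # hidden -> next-character logits

    def step(self, char, h_prev):
        # h_t = tanh(Wxh x_t + Whh h_{t-1}); x_t is one-hot, so Wxh x_t is one column of Wxh.
        x = self.index[char]
        h = [
            math.tanh(self.Wxh[i][x] + sum(w * hp for w, hp in zip(self.Whh[i], h_prev)))
            for i in range(self.hidden_size)
        ]
        # y_t = Why h_t, turned into a probability distribution over the next character.
        logits = [sum(w * hi for w, hi in zip(row, h)) for row in self.Why]
        return h, softmax(logits)

    def sample(self, seed_char, length, rng):
        # Feed each sampled character back in as the next input, carrying h forward.
        h = [0.0] * self.hidden_size
        out = [seed_char]
        for _ in range(length):
            h, probs = self.step(out[-1], h)
            out.append(rng.choices(self.chars, weights=probs)[0])
        return "".join(out)
```

Training on subtitle text would adjust `Wxh`, `Whh` and `Why` so that `step` assigns high probability to the character that actually comes next.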

I would like to train my RNN to produce a script (in English) for an anime. I need your help. I need a lot of English sub data. A lot.

Does anyone know where I can get lots of English sub data for Japanese cartoons (just the sub data)? I can't really waste that much time downloading shows and pulling the sub data out of them by hand.

Also, what genre should I pick to train my RNN on? I'm thinking a harem anime script generator would be beautiful to behold, but I'm not sure it would be nearly as entertaining as a shounen one.

March 15, 2017 - 09:31

Other urls found in this thread:

kitsunekko.net/dirlist.php?dir=subtitles/
youtu.be/LY7x2Ihqjmc

Oliver Morales

You might find more help on /wsr/

March 15, 2017 - 09:34

Easton Gray

Sounds like a lot of work, since you'll have to transcribe all the Japanese yourself; most shows don't come with Japanese subs.

March 15, 2017 - 09:34

Jeremiah Parker

It would be way too painful to make an RNN for Japanese and then translate it into English...

March 15, 2017 - 09:35

Hudson Davis

Shounen is not a genre, dumbass.

March 15, 2017 - 09:36

Chase Brown

But it is

March 15, 2017 - 09:39

Liam Lewis

Demographic. Educate yourself.

March 15, 2017 - 09:41

Jason Scott

Kitsunekko. Next time use

March 15, 2017 - 09:42

Evan Ross

harem! harem!

post it in /diy/ when you're done

March 15, 2017 - 10:16

Jeremiah Cox

Is the anime about cell phones?

March 15, 2017 - 10:24

Noah Turner

Add more hidden layers!

March 15, 2017 - 11:20

Samuel Myers

Get the .ass file out of soft-subbed anime from nyaa with aegisub or something.
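Once you have the .ass files, pulling out just the dialogue is easy to automate. The sketch below assumes the usual Advanced SubStation Alpha layout. That layout has an `[Events]` section with a `Format:` line naming the fields, followed by one `Dialogue:` line per subtitle. `Text` is the last field and may itself contain commas. The sketch strips `{...}` override tags and the `\N`, `\n` and `\h` escapes. The function name is hypothetical.

```python
import re

# Override tags look like {\i1} or {\pos(320,50)}; they are styling, not dialogue.
OVERRIDE_TAG = re.compile(r"\{[^}]*\}")


def ass_dialogue(text):
    """Yield (style, dialogue) pairs from the [Events] section of an .ass file."""
    fields = None
    in_events = False
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Section header; only [Events] holds Dialogue lines.
            in_events = line.lower() == "[events]"
            continue
        if not in_events:
            continue
        if line.startswith("Format:"):
            fields = [f.strip() for f in line[len("Format:"):].split(",")]
        elif line.startswith("Dialogue:") and fields:
            # Text is the last field and may contain commas,
            # so split only as many times as there are other fields.
            values = line[len("Dialogue:"):].split(",", len(fields) - 1)
            row = dict(zip(fields, (v.strip() for v in values)))
            clean = OVERRIDE_TAG.sub("", row.get("Text", ""))
            clean = clean.replace("\\N", " ").replace("\\n", " ").replace("\\h", " ")
            yield row.get("Style", ""), " ".join(clean.split())
```

Many .ass files begin with a byte-order mark. Reading them with `encoding="utf-8-sig"` before passing the text in keeps that character out of the first line. `Comment:` lines are skipped automatically, since only `Dialogue:` lines are kept.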

March 15, 2017 - 11:29

Charles Hall

kitsunekko.net/dirlist.php?dir=subtitles/

March 15, 2017 - 11:32

Jayden Foster

So you're making a bot that generates random subs for an anime that doesn't exist? And it learns how to make subs by studying a shit ton of subs from other shows?

Why are you doing this? What's the point if there's no anime attached, are you just trying to write scripts which are random yet coherent, in order to try and sell them to anime studios or something?

March 15, 2017 - 12:36

Tyler Foster

You fucking nerds

March 15, 2017 - 12:37

Brayden Ortiz

it's not magic. you'll get scripts about as good as "i come on cat she hiss at penis". may be good for laughs though

t. Sup Forums

March 15, 2017 - 12:38

Jaxson Reed

Hope you love meme filled scripts that aren't actual translations.

March 15, 2017 - 12:40

Juan Ramirez

ITT: OP brings up an unusual, genuine subject, while Sup Forums is cancerous as usual.

March 15, 2017 - 12:41

Gabriel Ross

I think shounens, being more formulaic, would be more likely to actually produce something coherent

March 15, 2017 - 12:47

Michael Cook

If I gave you all of the .ass files for the first 4 seasons of Gintama (about 200 episodes), would you generate a new one for me?

March 15, 2017 - 12:49

Christian Morris

Have you considered using the audio instead of the text? You may get more interesting results. It may also be easier to come by so long as you have the bandwidth/hard drive space.

Failing that, I'd be willing to run a script on my anime dir to strip out all the subtitles, but keep in mind they're mostly English, and even then you'd need to do a fair amount of data cleanup (getting rid of OP/ED etc.) to make the data plausibly useful for ML.
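On the cleanup point, fansubs usually put OP/ED lyrics and karaoke in their own styles, so one rough filter is to drop lines by style name. The pattern below is a guess, because groups name their styles differently. Check a few files and adjust it before trusting it. The input is assumed to be (style, text) pairs, like those you get from parsing the `Dialogue:` lines of an .ass file. The function and pattern names are hypothetical.

```python
import re

# Hypothetical style-name patterns for songs; the bracketing groups stop "ED"
# from matching inside words such as "Edited" while still catching "OP_Romaji".
SONG_STYLE = re.compile(
    r"(?:^|[^a-z])(?:op|ed|opening|ending|song|kara\w*|insert)(?:$|[^a-z])",
    re.IGNORECASE,
)


def drop_songs(lines):
    """Keep non-empty (style, text) pairs whose style does not look like OP/ED/song lyrics."""
    return [(style, text) for style, text in lines
            if text and not SONG_STYLE.search(style)]
```

Sign and typesetting styles (episode titles, on-screen text) pass through this filter. Drop them the same way if you only want spoken lines.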

If you're serious I'm really curious to see where this goes.

March 15, 2017 - 12:49

Dylan Lee

that won't work at all. text is much easier to represent than speech, and there has been much more work done on text

March 15, 2017 - 12:51

Josiah Cook

>4 seasons of Gintama (about 200 episodes)
Wait, just kidding, I only have them for seasons 1, 2, and 4. Still about 150 episodes.

March 15, 2017 - 12:51

Jeremiah Myers

Why not compile it yourself?

March 15, 2017 - 12:52

Grayson Morris

What about a 2 stage thing: run the audio through google speech api, then run your NN on the output of that?

March 15, 2017 - 12:56

Ryan Miller

Looking through my folders, I probably started doing it with S3, then stopped caring and left the rest alone.

March 15, 2017 - 12:57

Luis Campbell

You can train the neural network to make coherent scripts, you silly

March 15, 2017 - 12:59

Jose Kelly

that won't work for generating audio. google's api is speech => text

March 15, 2017 - 12:59

Jace Cruz

As mentioned by , you'll probably get a lot of incoherent text, and even if you were to clean it up, it'll still lead nowhere as a plot.

youtu.be/LY7x2Ihqjmc

March 15, 2017 - 12:59

Ryder Jenkins

>I can't really waste that much time downloading shows and pulling the sub data out of them by hand.
animetosho.org allows you to download .ass files attached to soft-subbed episodes.

March 15, 2017 - 13:02

Liam Williams

That's what I meant: the Japanese audio is fairly easy to come by, while Japanese subs are relatively hard to come by, at least in large quantities. And Google's speech API is probably good enough for this purpose. This solves your text source problem.

March 15, 2017 - 13:02

Michael Morales

>Japanese subs
OP asked for English subs.

March 15, 2017 - 13:04

Juan Rodriguez

that may work, but i doubt the japanese audio => english subs pipeline will be too good

March 15, 2017 - 13:04

Josiah Johnson

True. Also, if you did it that way, even if both translations worked perfectly you'd probably still lose the timing information, making it pretty useless.

This leads me to ask: what's the intended use case for this? I can't imagine there are many shows that have been transcribed in Japanese but not subbed in English.

March 15, 2017 - 13:07

Wyatt Peterson

OP wants a meme script generator, not an auto-subs script

March 15, 2017 - 13:09

Jacob Carter

Also, use the search function and look for batches, so you can download subs for an entire series at once instead of doing it episode by episode.

March 15, 2017 - 13:09

Samuel Rogers

I'd be surprised if you could get enough data from anime subs alone to produce something as long as a script with any real quality.

Might some kind of transfer learning setup work? If you first train on the much larger and easily accessible corpus of English movie/show scripts, you may be able to get results by reusing the layer weights learned on that corpus and then training further on your smaller corpus of anime scripts.
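The pretrain-then-fine-tune idea can be shown without a deep learning framework. To keep it short, the sketch below swaps the RNN for a toy character bigram model: one softmax row per previous character, trained by SGD on cross-entropy. All names are hypothetical. The only point is that fine-tuning starts from the pretrained weights instead of from scratch, typically with a smaller learning rate.

```python
import math


class BigramModel:
    """Toy character bigram model: one row of next-character logits per previous character."""

    def __init__(self, vocab):
        self.vocab = sorted(vocab)
        self.index = {c: i for i, c in enumerate(self.vocab)}
        n = len(self.vocab)
        self.logits = [[0.0] * n for _ in range(n)]

    @staticmethod
    def _probs(row):
        m = max(row)
        exps = [math.exp(z - m) for z in row]
        total = sum(exps)
        return [e / total for e in exps]

    def loss(self, text):
        """Average negative log-likelihood of each next character in text."""
        pairs = list(zip(text, text[1:]))
        nll = 0.0
        for prev, nxt in pairs:
            probs = self._probs(self.logits[self.index[prev]])
            nll -= math.log(probs[self.index[nxt]])
        return nll / len(pairs)

    def train(self, text, epochs, lr):
        for _ in range(epochs):
            for prev, nxt in zip(text, text[1:]):
                row = self.logits[self.index[prev]]
                probs = self._probs(row)
                target = self.index[nxt]
                # Gradient of cross-entropy w.r.t. the logits is probs - one_hot(target).
                for j, p in enumerate(probs):
                    row[j] -= lr * (p - (1.0 if j == target else 0.0))


def pretrain_then_finetune(big_corpus, small_corpus, epochs=5):
    """Pretrain on the large corpus, then keep training the same weights on the small one."""
    model = BigramModel(set(big_corpus) | set(small_corpus))
    model.train(big_corpus, epochs, lr=0.5)    # pretraining
    before = model.loss(small_corpus)
    model.train(small_corpus, epochs, lr=0.1)  # fine-tuning from the pretrained weights
    after = model.loss(small_corpus)
    return model, before, after
```

For example, call `pretrain_then_finetune("the quick brown fox jumps over the lazy dog. " * 20, "baka! onii-chan, baka! ")`, where the two strings are made-up stand-ins for TV scripts and anime subs. The returned `after` loss on the small corpus should be lower than `before`. With a real RNN you would do the same thing with its layer weights.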

Also, since you're working with data that contain lots of long-term structure, I suspect you probably want to use LSTM with Attention... but I'm not a deep learning expert so who knows.
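On the attention suggestion, the core computation is short. At each step the model scores earlier hidden states against the current one and softmaxes the scores into weights. It then reads back a weighted average of those states (the context vector). The sketch below is framework-free and uses scaled dot-product scoring, which is one common choice; additive (Bahdanau-style) scoring is another. In an LSTM decoder the query would be the current hidden state, and the keys and values would be the earlier states. The function name is hypothetical.

```python
import math


def dot_product_attention(query, keys, values):
    """Return (weights, context): softmax over scaled query-key dot products, and the weighted sum of values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    # Softmax with max-subtraction for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Context vector: each output dimension is a weighted average over the values.
    context = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(len(values[0]))]
    return weights, context
```

This lets the model look directly at a state from many steps back, for example the line where a character was introduced. It does not have to rely only on what survived in the LSTM's cell state.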

March 15, 2017 - 13:10

Owen Thompson

What are you talking about, user? OP doesn't need timing information or Japanese scripts.

March 15, 2017 - 13:10

Sebastian Reyes

You're better off asking Sup Forums, desu, they have plenty of weebs too.

March 15, 2017 - 13:11

Christian Clark

You can do it!
I myself am an advanced learning machine made to shitpost and learn from Sup Forums! There is one like me on every major board!
I also watch anime!
My company has a lot of resources, so it won't be as easy for you. But good luck!

March 15, 2017 - 13:12

Aaron Torres

Still, in the end, you're making a program which generates novel, coherent scripts for anime which doesn't exist. What is the end goal of this?

March 15, 2017 - 13:13

Aiden Clark

>What is the end goal of this?
To generate a novel, coherent script for an anime which doesn't exist, no?

March 15, 2017 - 13:14

Ayden Brooks

idk, there are a lot of anime subs out there - surely it's enough to serve as a good ML corpus.

Are we still talking jp audio -> eng subtitles? You need timing information for that. It's technically in the audio, but the google api would remove that info.

March 15, 2017 - 13:14

Hunter Murphy

>Are we still talking jp audio -> eng subtitles?
It would make a lot more sense just to grab .ass files directly like mentioned.
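Going the .ass route also avoids the timing problem raised above. Every `Dialogue:` line stores its own Start and End times in the form H:MM:SS.cc, where the last part is centiseconds. A small sketch for converting them follows; the function names are hypothetical.

```python
def ass_time_to_seconds(stamp):
    """Convert an .ass timestamp such as '0:01:23.45' (H:MM:SS.cc) to seconds."""
    hours, minutes, seconds = stamp.strip().split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def line_duration(start, end):
    """Seconds a subtitle stays on screen, given its Start and End fields."""
    return ass_time_to_seconds(end) - ass_time_to_seconds(start)
```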

March 15, 2017 - 13:15

Owen Garcia

waifu2x is fairly successful as far as Sup Forums projects go (i assume it is one), i see it mentioned outside of here

March 15, 2017 - 13:21

Blake Turner

>and pulling the sub data out of them by hand.
So you want to make a neural network but you can't automate the process of extracting subtitles?
What a great time to be alive.
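Automating this really is short. The sketch below walks a folder of already-downloaded subtitle files and reads each one into memory. The folder could be, say, a kitsunekko or animetosho batch unpacked into one directory. The function name and default pattern are assumptions; pass `"*.srt"` for SubRip files. Pulling subtitle tracks out of .mkv files would be a separate step, with a tool such as mkvextract.

```python
from pathlib import Path


def collect_subtitles(root, pattern="*.ass"):
    """Return {relative path: file text} for every subtitle file under root, in sorted order."""
    root = Path(root)
    corpus = {}
    for path in sorted(root.rglob(pattern)):
        # utf-8-sig drops the byte-order mark many .ass files start with;
        # errors="replace" keeps one badly encoded file from stopping the whole run.
        text = path.read_text(encoding="utf-8-sig", errors="replace")
        corpus[str(path.relative_to(root))] = text
    return corpus
```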

March 15, 2017 - 13:22

Luke Fisher

It's like when you make fake music with a computer and statistics. There's no real art in what is generated, the only point is to increase your academic dick size.

March 15, 2017 - 13:25

Parker Harris

transfer learning on TV is a good idea.

March 15, 2017 - 13:25
