I'm trying to get a textbook; the only copy I found online is on a Chinese site with a download paywall (but free viewing).
What I've done so far: I inspected the source and made the script in the pic to generate 500+ links, one for every page.
What I think I need to do: automate Chrome to do this for me (load a page, wait 5-10 seconds for it to load, then use the "Print to PDF" functionality and number each file 1.pdf, 2.pdf, etc.). The problem is I don't know how to actually do this (I don't know JS; if I need it, what would I have to learn?).
That's what I think I need to do; any other ideas maybe?
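From what I've read, headless Chrome can apparently print a page to PDF straight from the command line without any JS, so maybe something close to this would do it (completely untested; links.txt would be the list of links my script generates):

#!/bin/bash
# untested sketch: assumes a Chrome/Chromium build with headless mode
# and a links.txt file with one page URL per line
n=1
while read -r url; do
    # --print-to-pdf loads the page and writes it straight to a PDF
    google-chrome --headless --disable-gpu --print-to-pdf="$n.pdf" "$url"
    sleep 10   # crude pause between pages
    n=$((n+1))
done < links.txt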
If you can get it in plain text or HTML, why not use something like Pandoc to convert it to PDF?
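Something along these lines, assuming you can grab the pages as HTML and have a LaTeX engine installed for the PDF step (untested):

pandoc page.html -o page.pdf
pandoc 1.html 2.html 3.html -o book.pdf   # multiple inputs get concatenated into one PDF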
Can't check the original site, it doesn't load for me.
Aaron Rogers
the thing is, it's not plaintext, it's something weird... (Flash)
also, the 2nd link should work..
tried running it; seems like I'll have to clean it up manually first: Package inputenc Error: Unicode char 智 (U+667A) (inputenc) not set up for use with LaTeX.
Jason Campbell
Perhaps Flash to HTML5 to PDF? There should be some conversion tools around.
Colton Morgan
okay, I cleaned it up and it kinda works; the only thing is that when you save the source HTML, Google Chrome doesn't save everything.
The content is saved in "unnamed" files (with no extension), and Chrome only saves like 7 or 8 of them.
How do I download everything after waiting for it all to load?
Liam Wilson
example:
Brandon Smith
I wonder if gnash could render these SWFs correctly; that would save a lot of time...
Isaac Jackson
Why do you want the textbook split into 500 different PDFs? Why not just one PDF with 500 pages?
I guess there are separate programs that can append them together later...
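pdfunite from poppler-utils should do it, for example:

pdfunite 1.pdf 2.pdf 3.pdf book.pdf   # last argument is the output file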
Adrian Barnes
in bash
for i in {1..500}; do wget "docread.mbalib.com/read/768f505c1c5ed5f21c5bc0874b711328?num="$i"&code=35bd670302aea685074b3f8e29015d52&max=0"; done
I think that should grab the whole page. It's behind that Flash embed, but I think it's just the PDF content, which you should be able to merge with some PDF tool.
I'm not on a Linux box, so I can't test it though.
Levi Cook
That way you'll end up with 500 SWFs, which is pretty disgusting. We just need a way to render them to PDF.
Brandon Scott
ayyyyy, gnash renders the files fine. For some reason the site blocks the wget user-agent, so you'll need to use wget -U "Mozilla" ""
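e.g. for one page, filling in the URL from the loop above:

wget -U "Mozilla" "docread.mbalib.com/read/768f505c1c5ed5f21c5bc0874b711328?num=1&code=35bd670302aea685074b3f8e29015d52&max=0" -O 1.swf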
Grayson Wood
That was the plan, yeah, but ultimately I want one PDF.
Not OP, but I'll finish writing a ripper for this mbalib website because it could be useful in the future.
Isaiah Long
thank you!!
If you don't mind, how did you do the ripping & merging to pdf? (assuming you ripped and uploaded it)
Hudson Gutierrez
I didn't. There's a magical website called libgen (gen.lib.rus.ec) with all the books you'll ever need for college.
Isaac Allen
They're compressed SWF files ("cwf"), so you'll need to unzip them. Fuck knows how to parse SWF files though.
But yeah, OP, are you sure you can't get the book elsewhere?
Gavin Perry
easy, just use swfrender:
swfrender page.swf -X 1024 -o page.png
Jackson Myers
damn, I could swear I searched libgen and bookzz.org before doing this, apparently not enough
someone posted the link here, so it's done
That would be handy to have; if you don't mind, could you send it (or a GitHub link or something) to [email protected]?
David Brooks
OP here. I made a script (from info on here & on the internet). It's slow as fuck (any idea how to do this in C++?)
#!/bin/bash
mkdir scripty
cd scripty
for i in {1..553}
do
    echo "Downloading file "$i"..."
    wget -U "Mozilla" \
        "docread.mbalib.com/read/768f505c1c5ed5f21c5bc0874b711328?num=$i&code=35bd670302aea685074b3f8e29015d52&max=0" -O "$i.cwf"
    swfrender "$i.cwf" -X 1024 -o "$i.png"
    rm "$i.cwf"
done
convert $(find -maxdepth 1 -type f -name '*.png' | sort -n | paste -sd\ ) output.pdf
mv output.pdf ..  ## moves output.pdf to parent directory
cd ..
rm -r scripty
echo "done"
Isaac Ross
The fact that it's bash isn't what's slowing you down -- it's that you're grabbing 553 SWF files and parsing them. I'll take a look in a minute; how slow are we talking? You could probably do this async or something.
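Something like this might speed up the download part (untested sketch, GNU xargs assumed; the swfrender/convert steps would stay the same as in your script):

# grab pages 1..553 with up to 8 wget processes running at once
seq 1 553 | xargs -P 8 -I{} wget -q -U "Mozilla" \
    "docread.mbalib.com/read/768f505c1c5ed5f21c5bc0874b711328?num={}&code=35bd670302aea685074b3f8e29015d52&max=0" -O "{}.cwf"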
Cooper Thompson
20 mins slow
also the page order in the pdf was fucked up
also it gave errors (probably some info from the swf/cwf getting lost). Is there a way to directly make a PDF from the cwf files without converting them to PNG?
Daniel Martinez
>20 mins slow
Yeah, this is to be expected if you're scraping 500 SWF files. I'll try to improve on it a bit...
>is there a way to directly make a pdf of cwf without converting them to png?
Not with swfrender, but there might be alternative and better tools out there.