Sup Forums full text search

I made something for you, Sup Forums. It's a simple script to search how often a piece of text occurs on a Sup Forums board. More accurately, it's two scripts: one to download all the posts from a board and one to search them. To run the scripts you need jq 1.5 and curl, which are probably both in your distro's repos.

download.sh
#!/bin/sh
# Usage: ./download.sh BOARD   (e.g. ./download.sh g)
set -e
board="${1:?usage: $0 BOARD}"

echo "Downloading catalog"
curl --fail --silent "https://a.4cdn.org/$board/catalog.json" > catalog.json

mkdir -p threads
# Quote the filter so the shell never tries to glob the [] in it.
jq '.[].threads[].no' catalog.json | while read -r thread
do
    echo "Downloading $thread"
    curl --fail --silent "https://a.4cdn.org/$board/thread/$thread.json" > "threads/$thread.json"
    sleep 1 # Avoid ban
done

search.sh
#!/bin/sh
jq --arg search "$1" --raw-output --slurp '
map(.posts[].com) |
("total posts: " + (length | tostring),
"matching: " + (map(strings | match($search; "i")) | length | tostring))
' threads/*.json

example
./download.sh g && ./search.sh wife
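
To try the search filter without downloading anything, a hand-made thread file is enough. This is only a sketch, assuming jq 1.5 or newer is on the PATH; the function names, the file name 1.json and the posts in it are made up for illustration, and the jq program is the one from search.sh.

```shell
#!/bin/sh
# Sketch: exercise the search.sh filter on made-up data (assumes jq >= 1.5).

make_sample() {
    # $1: directory to write one fake thread file into (hypothetical data)
    mkdir -p "$1"
    cat > "$1/1.json" <<'EOF'
{"posts": [
  {"no": 1, "com": "My wife left me"},
  {"no": 2, "com": "install gentoo"},
  {"no": 3, "tim": 123, "ext": ".png"},
  {"no": 4, "com": "WIFE status?"}
]}
EOF
}

search_dir() {
    # $1: regex, $2: directory holding thread JSON files
    jq --arg search "$1" --raw-output --slurp '
        map(.posts[].com) |
        ("total posts: " + (length | tostring),
         "matching: " + (map(strings | match($search; "i")) | length | tostring))
    ' "$2"/*.json
}
```

Running `make_sample /tmp/sample && search_dir wife /tmp/sample` should print "total posts: 4" and "matching: 2": the image-only post has no com, so it counts as a post but can never match, and the "i" flag makes WIFE match too.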

If you want to see the actual matches, run this:
jq --arg search soy '.posts[] | select(.com // "" | match($search; "i"))' threads/*.json
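
If the full post objects are too noisy, the same idea can print one line per match. A sketch only: show_matches is a made-up name, and it swaps match for test because all that's needed here is a yes/no per post.

```shell
#!/bin/sh
# Sketch: print "<post number>: <comment>" for every matching post (assumes jq >= 1.5).

show_matches() {
    # $1: regex, remaining arguments: thread JSON files
    search=$1
    shift
    jq --arg search "$search" --raw-output '
        .posts[]
        | select(.com // "" | test($search; "i"))
        | "\(.no): \(.com)"
    ' "$@"
}
```

Usage would be something like `show_matches soy threads/*.json`. The com field comes back as HTML, so expect markup such as <br> in the output.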

Please excuse the typos. I fucked up when editing the OP post.

Are you aware of the catalog, friend?

Yes. You'll notice the script uses it.

interesting, i have something like this too at news.coffee/keywords

You should definitely extend it to Sup Forums.

If you don't actually parse the json, it's trash.

I do. It's what jq is for.

Then it's good.

Thanks, user.

Now show me how to do this on an operating system with more than 2% market share on the desktop.

I don't have a Mac.

He was talking about Windows, you cheeky fuck.

install msystoo

jq is awesome

Yeah.

What do you guys think of rq?

github.com/dflemstr/rq/

reposting this for the jq lovers
it needs to be edited to not use printf, but it downloads an entire thread from url.
4dl() {
board="$(printf -- '%s' "${1:?}" | cut -d '/' -f4)"
thread="$(printf -- '%s' "${1:?}" | cut -d '/' -f6)"
wget -qO- "a.4cdn.org/${board}/thread/${thread}.json" | jq -r '
.posts
| map(select(.tim != null))
| map((.tim | tostring) + .ext)
| map("

You can simplify the jq code to
.posts[]
| select(.tim != null)
| (.tim | tostring) + .ext
| "

Nice. This is just a repost of something I found online, but I'll update my version at home.
I also wrote versions for 8ch and lain based on it; I'll share if anyone's interested and the thread is alive when I get home.

>I'll share if anyone's interested
Sure.
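
On the printf point in 4dl above: the board and thread number can be pulled out of the URL with plain parameter expansion, no printf | cut pipelines needed. A minimal sketch, assuming the URL looks like scheme://host/board/thread/number (which is what the cut fields 4 and 6 imply); split_thread_url is a made-up name.

```shell
#!/bin/sh
# Sketch: split a thread URL into board and thread number using only
# POSIX parameter expansion. The URL shape is an assumption.

split_thread_url() {
    # $1: thread URL, e.g. https://boards.4chan.org/g/thread/12345
    # Prints "<board> <thread>" on one line.
    url=${1:?usage: split_thread_url URL}
    rest=${url#*://}           # drop the scheme, if any
    rest=${rest#*/}            # drop the host
    board=${rest%%/*}          # first path component
    thread=${rest#*/thread/}   # everything after "/thread/"
    thread=${thread%%[!0-9]*}  # keep leading digits (drops "/slug" or "#p123")
    echo "$board $thread"
}
```

With that, `split_thread_url https://boards.4chan.org/g/thread/12345` prints "g 12345", and a trailing slug or #p anchor after the number gets dropped.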

so only downloading a json file and sanitizing it?

Almost. It downloads several files and searches them for a regex.

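fookadl() {
    # $1: archive thread URL of the form <host>/<board>/thread/<num>
    url=${1%/}    # tolerate a trailing slash, which would break the splitting below
    hostname=${url%*/*/*/*}
    bt=${url#"$hostname"/*}
    thread=${bt#*/*/}
    board=${bt%*/*/*}
    json="${hostname}/_/api/chan/thread/?board=${board}&num=${thread}"
    # First pass: media links two levels into the response
    wget -qO- "${json}" | jq -r '
        .[]
        | .[]
        | .media?
        | .media_link?
    ' | grep -v null | xargs wget -U 'Mozilla/5.0' -nv
    # Second pass: one level deeper; -nc skips files that already exist
    wget -qO- "${json}" | jq -r '
        .[]
        | .[]
        | .[]
        | .media?
        | .media_link?
    ' | grep -v null | xargs wget -U 'Mozilla/5.0' -nv -nc
}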