Tip for anyone planning to parse website output:

Use an API if you can.

Following up on this: I'm saying it because I just had to fix a raw HTML parser that kept failing for some reason.

The reason? The site in question had added another table row for formatting, thereby silently breaking the output.
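For anyone who hasn't been bitten by this yet, here's a rough sketch of how that kind of breakage can happen. Everything in it is made up for illustration (the `OLD`/`NEW` markup, the "Price" label, the helper names); it is not the actual site or parser from above. It only uses Python's standard `html.parser`.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th") and self.rows:
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def parse_rows(html):
    collector = RowCollector()
    collector.feed(html)
    return collector.rows


def price_by_position(html):
    # Fragile: assumes the price always lives in the second row.
    return parse_rows(html)[1][1]


def price_by_label(html):
    # A bit sturdier: look for the row whose first cell is the label.
    for row in parse_rows(html):
        if len(row) >= 2 and row[0] == "Price":
            return row[1]
    return None


# Hypothetical page before and after the site added an empty formatting row.
OLD = (
    "<table><tr><td>Name</td><td>Widget</td></tr>"
    "<tr><td>Price</td><td>9.99</td></tr></table>"
)
NEW = (
    "<table><tr><td>Name</td><td>Widget</td></tr>"
    "<tr><td></td><td></td></tr>"
    "<tr><td>Price</td><td>9.99</td></tr></table>"
)
```

The positional version doesn't crash on `NEW`, it just hands back an empty string, which is exactly the "silently breaking" part. Keying on the label survives this particular change, but it's still scraping, so the next redesign can break it too.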

That said, most sites nowadays that aren't fucking coded in PHP (which sadly still makes up a bigger part of the internet than I'd like) provide reasonably sane APIs.



To close this one off, I know documentation is hard (in fact, I hate documentation myself, go figure!).

That said, even if you just give me the list of every parameter that's valid at an API endpoint, that already saves me soooo much time. With that list I can poke at the endpoint with something like http-prompt and work out the rest myself, but at the very least provide this basic thing.

Final reply to this chain (I SWEAR I DIDN'T INTEND FOR IT TO BE THIS LONG!)

Also, if your site is having overload issues, don't be afraid to make your API toss results in a queue (looking at you, archive.is!)

What is not acceptable is simply timing out the connection because the site is overloaded.
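To make the queue idea concrete, here's a minimal toy model of what I mean. It is an in-memory sketch, not how archive.is or any real service works: `QueuedApi`, `submit`, `status`, and `tick` are all hypothetical names, and the only real-world thing borrowed is the HTTP convention of answering `202 Accepted` for work that has been taken on but isn't finished yet.

```python
import uuid
from collections import deque


class QueuedApi:
    """Toy model of an endpoint that queues work instead of timing out."""

    def __init__(self, max_per_tick=1):
        self.max_per_tick = max_per_tick
        self.pending = deque()
        self.results = {}

    def submit(self, url):
        # Always answer right away: 202 Accepted plus a job id to poll.
        job_id = uuid.uuid4().hex
        self.pending.append((job_id, url))
        return 202, {"job": job_id, "position": len(self.pending)}

    def status(self, job_id):
        if job_id in self.results:
            return 200, {"state": "done", "result": self.results[job_id]}
        for position, (queued_id, _) in enumerate(self.pending, start=1):
            if queued_id == job_id:
                return 202, {"state": "queued", "position": position}
        return 404, {"state": "unknown"}

    def tick(self):
        # Stand-in for a background worker draining the queue at its own pace.
        for _ in range(min(self.max_per_tick, len(self.pending))):
            job_id, url = self.pending.popleft()
            self.results[job_id] = f"archived {url}"
```

The point is that the client always gets an answer it can act on: "queued, you're number 2" beats a dropped connection every time, and the server gets to work through the backlog as fast as it actually can.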
