How to make GIFs of sites using WayBackMachine

So… I like following fivethirtyeight’s interesting 2016 Election Prediction page. It shows the ups and downs and the general mood of the election. I’ve been staring at it for so long that I wanted to collect the daily changes and make a nice GIF. I know the Internet Archive’s WayBack Machine collects archives of popular websites, so I went there and found that the Election Prediction page is on there too.

So, I started looking for ways to make a GIF from the WayBack machine. There were some node and ruby scripts and applications which didn’t really work. But then I landed on waybacklapse. Its developer – Kyle Purdon – works for bitly and has built two versions of waybacklapse. The older one is python, node, imagemagick and then some. The newer one is python3 and docker. Eww. I followed the steps of the tutorial for the older version, with a few notable exceptions –

  1. The tutorial is for OS X and is a little dated. What I have on hand is an Ubuntu 15.04 VM, so I went ahead and used apt-get install instead of brew
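  2. The tut tells you to use the command “git checkout -t v1.1.0”, but I used “git checkout -b v1.1.0” instead. Technically v1.1.0 is a tag, not a branch, so -b just creates a new branch with that name from whatever is currently checked out, and plain “git checkout v1.1.0” is what checks out the tag itself. I didn’t know that and just used -b, which worked, so why mess with a good thing, amiright?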
  3. You need to have node installed, but not the new node. Install old node with “apt-get install nodejs-legacy” and use the command “nodejs app.js” when you’re running screenshot-as-a-service
  4. The tut doesn’t mention that you need to actually *run* screenshot-as-a-service. I went to the github page for the service and found out that I need to run the above “nodejs app.js” command in order to run a server on the localhost. Technically, waybacklapse has code in it to warn you that the server isn’t running. But that didn’t work so well for me.
  5. The user prompts for waybacklapse only allow for monthly or yearly snapshots. But fivethirtyeight has only been running the site for about 3 months, with daily updates, so those didn’t make sense to me. I wanted to get all the changes. So, after installing waybacklapse with pip, I went ahead and modified the code inside /usr/local/lib/python2.7/dist-packages/waybacklapse/waybacklapse.py with one small change to get all the screenshots instead of just monthly or yearly ones –
    1. In the create_payload function, I commented out the collapse variable as follows –
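
The snippet that originally followed is not reproduced here. As a rough sketch only: the Wayback Machine's CDX API accepts a collapse parameter that thins the results down to one capture per timestamp prefix (6 digits for monthly, 4 for yearly), and leaving it out returns every capture. The names build_cdx_params and build_cdx_url and their arguments are made up for illustration and are not waybacklapse's actual code.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "http://web.archive.org/cdx/search/cdx"


def build_cdx_params(url, start, end, collapse=None):
    """Build query parameters for a Wayback CDX search (illustrative helper)."""
    params = {
        "url": url,
        "from": start,   # e.g. "20160601"
        "to": end,       # e.g. "20160831"
        "output": "json",
        "fl": "timestamp,original",
    }
    # "timestamp:6" keeps one capture per month, "timestamp:4" one per year.
    # Leaving collapse out (the equivalent of commenting it out) keeps them all.
    if collapse is not None:
        params["collapse"] = collapse
    return params


def build_cdx_url(url, start, end, collapse=None):
    """Return the full CDX query URL for the given parameters."""
    return CDX_ENDPOINT + "?" + urlencode(build_cdx_params(url, start, end, collapse))
```

Calling build_cdx_url with no collapse argument gives the "all the screenshots" behaviour described above, at the cost of a lot more captures to render.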

All was well and good, but not really. Turns out, screenshot-as-a-service pulls a screenshot of the entire page, not just above the fold. Which is great, and not so much. I was looking at a GIF that was way too long to be palatable. So, I needed a way to extract parts of the screenshots so I could make a nice, clean and small-ish GIF. Luckily, waybacklapse made me install imagemagick. So I looked around and made the following script.

It must sit inside the screenshot folder. It parses through the screenshots and converts them into smaller versions of themselves. Finally, I found the command inside waybacklapse which creates the GIF. I modified it a bit and used it to recreate the GIF.

convert -delay 30 /root/fivethirtyeight/2016081011081470853418/final-*.png /root/fivethirtyeight/2016081011081470853418/timelapse/2016electionforecastss.gif
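
The cropping script itself is not reproduced above. A minimal sketch of what such a step could look like, assuming ImageMagick's convert is on the PATH and that the top 1024x768 pixels are the part worth keeping (both the helper names and the dimensions are my assumptions, only the final- prefix comes from the command above):

```python
import glob
import os
import subprocess


def build_crop_command(src, dst, width, height, x=0, y=0):
    """Return the ImageMagick command that crops src to a width x height box."""
    geometry = "{}x{}+{}+{}".format(width, height, x, y)
    # +repage resets the virtual canvas so the GIF frames line up afterwards.
    return ["convert", src, "-crop", geometry, "+repage", dst]


def crop_all(folder=".", width=1024, height=768):
    """Crop every PNG in folder, writing the results as final-<name>.png."""
    for src in sorted(glob.glob(os.path.join(folder, "*.png"))):
        name = os.path.basename(src)
        if name.startswith("final-"):
            continue  # skip files produced by an earlier run
        dst = os.path.join(folder, "final-" + name)
        subprocess.check_call(build_crop_command(src, dst, width, height))
```

Running crop_all() from inside the screenshot folder would leave the originals alone and produce the final-*.png files that the convert -delay command picks up.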

Now, I could go about changing waybacklapse and submitting the code to the author, but he’s moved on to docker and in-house solutions for the dependencies, so I doubt it’ll be a benefit to anyone. Instead, I’ll just leave these notes here so I can reference them in the future. If they helped you, shout out in the comments section. Oh, and I’ll leave you with the GIF I made. –

FiveThirtyEight's Election Forecast in a GIF

Here’s some love for LinkedIn Users

Just tap that button

Some time ago, my brother came to me with a problem. He loves LinkedIn. It’s a great service. But as much as he loves connecting with people on that professional network, there are some glaring inefficiencies that he does not appreciate. He wasn’t interested in removing ads or making it look nicer. He just wanted to see the information that people intend on displaying on the site. You see, there’s a plethora of information available on LinkedIn, but it’s mostly hidden.

For some reason, if you’re landing on a user’s profile from LinkedIn’s user search, or from a Google search, you end up seeing this –

But what you should really be seeing is, at least, the user’s name, a little bit about their history and experience. Essentially, you should be seeing something like this –

LinkedIn’s been around for some time now, but they haven’t fixed this weird issue, and so your LinkedIn experience is often curtailed by what can only be called a minor bug.

Not any more. Today, NiKhCo. has launched a new tool, “LinkedIn Reveal”, which will solve this absurdest of LinkedIn woes. It enables you to explore LinkedIn with the depth you never thought possible. We’re not trying to build something that changes the way LinkedIn displays information or makes things look fancy. We’re just building something that lets you see LinkedIn as it truly should be – a beautiful, open, professional network with all the information you need about people, companies, jobs and connections.

LinkedIn Reveal is now available in the Google Chrome Web Store. Do check it out. It’s valuable for everyone who uses LinkedIn. Also, here’s a screenshot, because pictures somethingsomething thousand words somethingsomething. :)

Fixing Jetpack’s Stats module

Despite the hate that Jetpack gets for being a bloatware plugin, it is one of my favorites and the first thing I install whenever I set up a new WordPress site. However, Jetpack does have a few irritating habits that I cannot overlook. One of these is the stats module. The module actually does pretty well, posting data to the wordpress.com dashboard and making it easy for me to quickly glance at the number of visitors I’ve had for the day.

However, every so often the module craps out and logs a large number of visits from crawlers, bots and spiders as legitimate hits, since those are not in the official list of crawlers, bots and spiders to look out for. To fix this, I went out to look for the list and to add to it. One quick GitHub code search later, I found that the file class.jetpack-user-agent.php is responsible for hosting the list of non-humans to look out for. What I found inside was actually a pretty comprehensive list of software, but one that definitely needed extending.

If you want to do what I did, find the file in your WP installation at –
/wp-content/plugins/jetpack/class.jetpack-user-agent.php

Inside the file, look for the following array variable –
$bot_agents

You’ll see that the array already contains common bots like alexa, googlebot, baiduspider and so on. However, I deepdived (meaning did a sublime text search) into my access.log files and found some more. To extend the array, simply look for the last element (which should be yammybot) and extend it as follows –
'yammybot', 'ahrefsbot', 'pingdom.com_bot', 'kraken', 'yandexbot', 'twitterbot', 'tweetmemebot', 'openhosebot', 'queryseekerspider', 'linkdexbot', 'grokkit-crawler', 'livelapbot', 'germcrawler', 'domaintunocrawler', 'grapeshotcrawler', 'cloudflare-alwaysonline',

Note that you want to leave in the last comma, and you want all the entries in lower case. The lower case doesn’t actually matter, because the PHP function that does the string compare is case-insensitive, but it just looks neater. You’ll also notice that I’ve added the precise names of the bots, like ‘grokkit-crawler’ and ‘cloudflare-alwaysonline’, but you can be less specific and save yourself some pain. This will, however, affect your final stats outcome.
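
To make the matching idea concrete, here is a small Python sketch of a case-insensitive substring check against a bot list. It is not Jetpack's PHP, the list is shortened, and the user agent strings in the usage note are only illustrative.

```python
# A shortened stand-in for the $bot_agents array, all in lower case.
BOT_AGENTS = [
    "googlebot",
    "baiduspider",
    "yammybot",
    "ahrefsbot",
    "twitterbot",
    "cloudflare-alwaysonline",
]


def is_bot(user_agent, bot_agents=BOT_AGENTS):
    """Return True if any bot name appears anywhere in the user agent string."""
    ua = user_agent.lower()  # lower-casing both sides makes the match case-insensitive
    return any(agent.lower() in ua for agent in bot_agents)
```

With this, is_bot("Mozilla/5.0 (compatible; AhrefsBot/5.0)") is True, while a regular browser string is not matched. Passing a less specific list such as ["crawler"] catches more bots with fewer entries, but it also risks flagging anything that happens to contain that word, which is the stats trade-off mentioned above.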

Notes –

  1. Some of the bots are pretty interesting. I saw tweetmemebot, which is from a company called datasift, which seems to be in the business of trawling all social networks for interesting links and providing meaningful insights into them. Another was twitterbot. Why the heck does twitter need to send out a bot? We submit our links to it willingly! Also interesting were livelapbot, germcrawler and kraken. I have no idea why they’re looking at my site.
  2. Although Jetpack does not have a comprehensive list of bots, it still does a pretty good job. I found the main culprit of the stats mess in my case. Turns out, CloudFlare, in an effort to provide their AlwaysOnline service (which is enabled for my site), looks at all our pages frequently and this doesn’t sit well with Jetpack. I hope this tweak will fix this now.
  3. Although this fix is currently in place, every time the Jetpack plugin gets updated, all these entries will disappear. That’s why this blog post is both a tutorial for you all and a reminder and diary entry for me to make this change every time I run a Jetpack update. However, if someone can tell me a way to permanently extend Jetpack, or if someone can reach out to the Jetpack team (hey Nitin, why don’t you file a GitHub issue against this?) it’ll be awesome and I’ll be super thankful!
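
The hunt through access.log that turned up these bots could also be scripted instead of done by text search. A rough sketch, assuming the common "combined" log format where the user agent is the last quoted field on each line (the function name and the bot|crawl|spider hint are my own choices):

```python
import re
from collections import Counter

# In the "combined" log format the user agent is the last quoted field.
UA_PATTERN = re.compile(r'"([^"]*)"\s*$')
# Words that usually give a non-human visitor away.
BOT_HINT = re.compile(r"bot|crawl|spider", re.IGNORECASE)


def count_bot_like_agents(lines):
    """Count user agents that look like bots in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = UA_PATTERN.search(line)
        if match and BOT_HINT.search(match.group(1)):
            counts[match.group(1)] += 1
    return counts
```

Feeding it an open access.log file and printing counts.most_common(20) gives a shortlist of candidates. Agents that don't announce themselves with one of the hint words, like kraken, still need the manual look.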

Update – I was trying to be hip and did a fork of Jetpack on GitHub, made the changes and then tried to make a pull request. Turns out, I don’t know how to do that, so I opened an issue instead. It sits here.
 

Deleting Duplicate items in Fever RSS

My Fever RSS setup has a lot of feeds that often duplicate items. There are feeds from news sites such as The Times of India corresponding to National, International and Governmental news, as well as feeds from tech sites that often repeat things. The end result is that I often see the same title, the same post and thus repetitive news many times during the day. I found the following script to be an excellent way to remove duplicate items from Fever. This works on the MySQL level and so you should be careful when using it, lest you delete everything because of some coding error on my part (though I’ve checked and this works).
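
The script itself is not part of this excerpt. To illustrate the general technique only, here is a sketch that deletes rows sharing a title while keeping the oldest one. It runs against an in-memory SQLite database with a made-up items table (id, title), which is an assumption for the demo and not Fever's real MySQL schema.

```python
import sqlite3


def make_demo_db(titles):
    """Create an in-memory database with a hypothetical items(id, title) table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT INTO items (title) VALUES (?)", [(t,) for t in titles])
    conn.commit()
    return conn


def delete_duplicate_titles(conn):
    """Keep the lowest id for each title, delete the rest, return rows removed."""
    cur = conn.execute(
        "DELETE FROM items "
        "WHERE id NOT IN (SELECT MIN(id) FROM items GROUP BY title)"
    )
    conn.commit()
    return cur.rowcount
```

One thing to watch for when moving this idea to MySQL: it refuses a subquery that reads from the same table you are deleting from (error 1093) unless the subquery is wrapped in a derived table. Either way, take a backup before running anything like this against a live database.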

Notes for Week 2 of 2014

So, it’s been an interesting week. Some observations –

Social

Found this gem of a Difference between Facebook and Twitter –

Facebook – 

“Best Practices

Making API calls directly to Facebook can improve the performance of your app, rather than proxying them through your own server.”

Twitter – 

“Caching

Store API responses in your application or on your site if you expect a lot of use. For example, don’t try to call the Twitter API on every page load of your website landing page. Instead, call the API infrequently and load the response into a local cache. When users hit your website load the cached version of the results.”

Turns out, when not losing market share to a third-party app, Facebook is actually quite nice to developers as compared to Twitter. To be fair, tweets constitute a lot more volume and processing, so it would make sense for Twitter to want the devs to cache their data. Also, even ADN has rate limits, but at least their limits are more generous than Twitter’s.

Seriously though, twitter has millions of dollars for servers and all I have is a 128MB VPS. What the heck, Twitter?
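
For what it's worth, the caching Twitter asks for in the quote above doesn't take much code. A minimal sketch, assuming a single API call whose result can be reused for a fixed number of seconds (the class name and the fetch callable are placeholders, and nothing here talks to Twitter's actual API):

```python
import time


class CachedCall:
    """Remember the result of fetch() and reuse it until ttl_seconds have passed."""

    def __init__(self, fetch, ttl_seconds, clock=time.time):
        self._fetch = fetch          # the expensive call, e.g. an API request
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the cache is easy to test
        self._value = None
        self._expires_at = None      # None means nothing has been cached yet

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cache is empty or stale: call the API once and remember the result.
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value
```

A page handler would call get() on every page load, and only one load every ttl_seconds actually reaches the API. The cache lives in process memory, so it fits in a small VPS as long as the responses are small.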

Google(+)

Google is no longer Google. It’s Google(+). Everything we love about Google and its services is being slowly replaced by Google+ and the latest victim is GMail. Now anyone on Google+ can email you without knowing your email ID. As a communication tool, this makes GMail more open. But that’s exactly what people don’t use GMail for. They use it for Email. Big difference there, Google. You can opt out, but what’s the bet that option will be going away soon?

What Google should actually do –

Google understands one thing and one thing alone – Search. Pushing Google+ isn’t going to help them overcome the social networks of the world. But there is one thing I covet – the Search API. Seriously, why don’t we see third-party Search apps that innovate the way we see our Search results? That’s one data stream we’ve not targeted yet. Google needs to let people in, let them do their thing, and pretty soon we’ll see people integrating Search with social platforms. Oh, you wanna see which of your Facebook friends searched for the latest Tom Hanks movie and then clicked on IMDB? Here’s the data for that. Seriously Google, stop letting one segment of the business take over the other, especially since we know you’ll kill Google+ a couple of years from now.

Advertising

Ah, advertising! The bane of binge-watchers. Advertising has slowly crept in everywhere on the Internet, from YouTube to Hulu. For YouTube, go find YouTube5. It’s an extension that replaces the usual YouTube player with a cool HTML5 one and kills all ads in the process. Enjoy.

To Hulu, I say, well, get rid of the “Brandon Switched to Ford” ad. Seriously. It’s a stupid ad, I’ve seen all too much of it and Brandon looks like a total douche for being the black sheep who abandoned the family tradition and switched from a Honda to a Ford. If ever Hulu fails, it’ll be because they keep repeating the same ads over and over again. I do not want to be bored by ads, I want them to be innovative and interesting. (Coincidentally, Samuel L Jackson staring in my face is not innovative. I’m looking at you, Capital One.)

I finally also saw the KFC ads that look like some woman with a video camera uploaded them to YouTube. That’s supposed to be innovative? Nope. She looks drunk/high/both and you’re not fooling anyone with these ads, KFC. Those are scripted (or worse, they’re not!).

Finally, I saw a teeth whitening strips ad on Hulu that said, very specifically, “If your teeth are not getting white, they’re getting yellow”. Ok, first of all, yellow teeth are perfectly normal and more an indication of stomach trouble than a medical emergency. Second, the ad targets women who drink coffee. First it was guys who smoke who were targeted and now this. Finally, that text up there. That’s a scare tactic. Pretty soon, they’ll come up with a white paper saying that yes, your teeth getting yellow is a medical problem and you need to use teeth whitening strips in conjunction with toothpaste. All of this will be driven by only one thing – Sales telling the Marketing team to get innovative with the ads. There’s no real medical issue that they’ve tried to resolve.

That concludes the rant session on advertising.

Clients from Heaven

I’ve been building a web app for my brother and he mentioned that the text on the screen doesn’t ‘look black’. For a second, I tried hard not to wonder if my brother is a typical MBA Client from Hell, but as it turns out, he was right: the text was actually #2C3E50, which is a weird dark blue. Thanks, Bootstrap, for making me look bad in front of my brother!
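
If you want to check a colour like that without squinting at the screen, splitting the hex code into its red, green and blue parts settles it. A tiny sketch (the helper name is mine):

```python
def hex_to_rgb(color):
    """Convert a colour like "#2C3E50" into an (r, g, b) tuple of ints."""
    color = color.lstrip("#")
    # Each pair of hex digits is one channel: red, green, then blue.
    return tuple(int(color[i:i + 2], 16) for i in (0, 2, 4))
```

hex_to_rgb("#2C3E50") comes out as (44, 62, 80). True black would be (0, 0, 0), and here blue is the largest channel, so the text really does lean blue.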

WordPress

It was an exciting week to be a WordPress user. Snaplive, a front-end text editing solution, was showcased to a few who had signed up for updates. It seems to work really well with WordPress, so I’m expecting some really good things in the future.

Ghost had promised to revolutionize WordPress, but instead it went and set up shop elsewhere. That’s ok, since we have Gust, which is a plugin that ports the awesome Ghost Admin panel functionality to WordPress. Mind you, this was just released, so if you’re not ready for bugs (which software doesn’t have bugs?), don’t install this yet.

Finally, a shout out to whatweekisit.com, which I used to, umm, calculate which week of 2014 we’re in. Yeah, I should have just looked at a calendar.
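
For the record, the week number is also one call away in Python. A small sketch using the ISO 8601 week numbering from the standard library (the site may count weeks differently, and the date in the usage note is just an example):

```python
from datetime import date


def iso_week(day):
    """Return the ISO 8601 week number (1 to 53) for a date."""
    # isocalendar() gives (ISO year, ISO week, ISO weekday); we want the week.
    return day.isocalendar()[1]
```

iso_week(date(2014, 1, 8)) returns 2, and iso_week(date.today()) answers the question for whenever you happen to be reading this.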

Auto-refresh for Fever on AppFog

Today, I got asked something about my “Installing Fever on AppFog” tutorial. Fever has an inbuilt module to refresh your RSS feeds periodically, but this module doesn’t work on all types of servers and it certainly doesn’t work on AppFog. Shaun, being the good guy that he is, lists out a way to set up a curl command with a cron job to refresh the feeds automatically. Unfortunately, AppFog doesn’t support crontab directly either. So, I got asked if there’s a solution for this. After a little bit of Googling and finding this solution on stackoverflow, I built up a working solution specific to Fever on AppFog. The details follow –
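
The AppFog-specific solution is not part of this excerpt. As a generic illustration of the idea of replacing cron with something you control, here is a sketch of a loop that calls a refresh URL at a fixed interval. The URL below is a placeholder and an assumption, as are the helper names, so substitute whatever refresh address your own Fever install uses.

```python
import time
import urllib.request

# Assumption: placeholder address, not a real Fever install.
REFRESH_URL = "http://example.com/fever/?refresh"


def hit_url(url=REFRESH_URL):
    """Request the URL once and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.getcode()


def run_periodically(task, interval_seconds, runs, sleep=time.sleep):
    """Call task() the given number of times, pausing between calls."""
    results = []
    for i in range(runs):
        results.append(task())
        if i < runs - 1:
            sleep(interval_seconds)  # wait before the next refresh
    return results
```

Something like run_periodically(hit_url, 15 * 60, runs=96) would refresh every 15 minutes for a day. It needs a process that stays alive somewhere, which is exactly what a cron-less host makes awkward, so treat it as a stand-in and not a recommendation.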

Pythonista + Fever + Instapaper = Quick RSS Magic

I love Python. It’s a simple, easy and quick to learn language. Before learning Python, the major language I knew was Java and believe me, that’s a pain! Seeing Python grow from a simple scripting language to a major platform is also a great feeling. The most recent bit of Python awesomeness I discovered was Pythonista for iOS. It’s a wonderful app that allows you to run Python scripts of varying complexity on your iPhone or iPad without worrying about silly things like Objective C. Of course, it’s not the perfect app, there are limitations to the libraries and you can’t easily transfer scripts to the app from your desktop. But hey, as long as it’s Python, right?