Fixing Jetpack’s Stats module

Despite the hate that Jetpack gets for being a bloatware plugin, it is one of my favorites, and installing it is the first step whenever I set up a new WordPress install. However, Jetpack does have a few irritating habits that I cannot overlook. One of these is the stats module. The module actually does pretty well, posting data to the wordpress.com dashboard and making it easy for me to quickly glance at the number of visitors I’ve had for the day.

However, every so often the module craps out and logs a large number of visits from crawlers, bots and spiders as legitimate hits, since those are not in the official list of crawlers, bots and spiders to look out for. To fix this, I went looking for the list so that I could add to it. One quick GitHub code search later, I found that the file class.jetpack-user-agent.php is responsible for hosting the list of non-humans to look out for. What I found inside was actually a pretty comprehensive list of software, but one that definitely needed extending.

If you want to do what I did, find the file in your WP installation at –
/wp-content/plugins/jetpack/class.jetpack-user-agent.php

Inside the file, look for the following array variable –
$bot_agents

You’ll see that the array already contains common bots like alexa, googlebot, baiduspider and so on. However, I dove deep (meaning I did a Sublime Text search) into my access.log files and found some more. To extend the array, simply look for the last element (which should be yammybot) and extend it as follows –
'yammybot', 'ahrefsbot', 'pingdom.com_bot', 'kraken', 'yandexbot', 'twitterbot', 'tweetmemebot', 'openhosebot', 'queryseekerspider', 'linkdexbot', 'grokkit-crawler', 'livelapbot', 'germcrawler', 'domaintunocrawler', 'grapeshotcrawler', 'cloudflare-alwaysonline',

Note that you want to leave in the last comma, and you want all the entries in lower case. The lower case doesn’t actually matter, because the PHP function that does the string compare is case-insensitive, but it just looks neater. You’ll also notice that I’ve added the precise names of the bots, like ‘grokkit-crawler’ and ‘cloudflare-alwaysonline’, but you can be less specific and save yourself some pain. A less specific entry will, however, match more user agents and so affect your final stats outcome.
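The access.log search described above can also be scripted. What follows is only a rough sketch in Python, not Jetpack’s code. It assumes the log uses the common "combined" format (user agent as the last quoted field), that matching is a case-insensitive substring check, and that anything containing "bot", "crawler" or "spider" is worth a look. KNOWN_BOTS is a tiny sample of the tokens mentioned in this post, and all the names here are made up for illustration.

```python
import re
from collections import Counter

# Assumption: a small sample of tokens already present in $bot_agents.
KNOWN_BOTS = ["alexa", "googlebot", "baiduspider", "yammybot"]

# Assumption: words that hint at a non-human visitor.
BOT_HINTS = ("bot", "crawler", "spider")

# Assumption: combined log format, where the user agent is the last quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')


def is_known_bot(user_agent, known=KNOWN_BOTS):
    """Case-insensitive substring match against the known tokens."""
    ua = user_agent.lower()
    return any(token.lower() in ua for token in known)


def unknown_bot_agents(lines, known=KNOWN_BOTS):
    """Count user agents that look like bots but match no known token."""
    counts = Counter()
    for line in lines:
        match = USER_AGENT_RE.search(line)
        if not match:
            continue  # line is not in the expected format
        ua = match.group(1)
        looks_like_bot = any(hint in ua.lower() for hint in BOT_HINTS)
        if looks_like_bot and not is_known_bot(ua, known):
            counts[ua] += 1
    return counts
```

To try it, pass an open file object for your access.log to unknown_bot_agents() and print the result of .most_common(20). The agents at the top of that list are the candidates for new entries. A broad token such as "crawler" in the known list would hide every crawler at once, which is the trade-off between specific and less specific entries.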

Notes –

  1. Some of the bots are pretty interesting. I saw tweetmemebot, which is from a company called DataSift, which seems to be in the business of trawling all social networks for interesting links and providing meaningful insights into them. Another was twitterbot. Why the heck does Twitter need to send out a bot? We submit our links to it willingly! Also interesting were livelapbot, germcrawler and kraken. I have no idea why they’re looking at my site.
  2. Although Jetpack does not have a comprehensive list of bots, it still does a pretty good job. I found the main culprit of the stats mess in my case. Turns out, CloudFlare, in an effort to provide their AlwaysOnline service (which is enabled for my site), looks at all our pages frequently and this doesn’t sit well with Jetpack. I hope this tweak will fix this now.
  3. Although this fix is currently in place, every time the Jetpack plugin gets updated, all these entries will disappear. That’s why this blog post is both a tutorial for you all and a reminder and diary entry for me to make this change every time I run a Jetpack update. However, if someone can tell me a way to permanently extend Jetpack, or if someone can reach out to the Jetpack team (hey Nitin, why don’t you file a GitHub issue against this?) it’ll be awesome and I’ll be super thankful!

Update – I was trying to be hip, so I forked Jetpack on GitHub, made the changes and then tried to make a pull request. Turns out, I don’t know how to do that, so I opened an issue instead. It sits here.
 

Word of the day: rubric

According to TheFreeDictionary, rubric means a title, class or category. It’s also used when referring to a subheading or the full title of a file/post or page. Nieman Journalism Lab used it as follows –

The Brief, a tailored summary of business and international news under the rubric of “Your world right now.”

Source: Maybe the homepage is alive after all: Quartz is trying a new twist on the traditional website front door » Nieman Journalism Lab


How I Follow Blogs on the Open Internet

Colin Devroe’s post about Fred Wilson’s post about how hard it is to follow blogs on the Open Internet is interesting to me.

Ok, before we go any further, yes, this is very meta. Yes, I could have written this entire thing as comments on Colin’s blog (no, it doesn’t support comments) or Fred’s blog (has nice Disqus comments) but I didn’t because that’s the point of blogging. I can write this ‘commentary’ on my blog. Sort of like Greek philosophers writing entire books just discussing each other’s books. Very meta indeed. Continue reading

You Won’t Finish This Article Either

Just today, I was having a discussion on ADN about how there’s too much noise on the Internet and if I had the choice of a broadcast medium, I’d go with newspapers. Some time after that, I noticed the link to an interesting article on Slate about how people are not reading entire articles on the Internet and are just skimming through, or even just reading the headline, and tweeting the link if they like the headline or an eye-catching photo.

At this point, it’s my duty to inform you that this is a post about social media, sharing and reading on the Internet, and is a bit of a rant, so if you’re not interested, you’ve already left the article. I’d also like to tell you that I wanted to name the article – “Dealing with loss, of Readers” but that seemed rather grim and I wanted to mimic the Slate headline, because it’s just that good. There’s another reason that I’ll tell you about later. Continue reading

Ghost: My comments

Ghost showed up on Kickstarter yesterday and like any good blogging platform, it’ll be judged, commented on, loved and hated. So let me start early. I don’t like it. I love the idea, I loved the beginning, I just don’t like the execution. Here are the two reasons why –

  1. NodeJS? Really?

NodeJS is all the rage right now. Every developer is discovering the strange and amazing things you can do with, of all the things, JavaScript and is running from pillar to post to launch a real-time, fast and easily scalable app as soon as possible. Of course, this means that there are some really nice apps out there. But is NodeJS ready?

Well, define ready.

Of course. Ready means that the next time some layman decides to set up a blog on the Internet, can (s)he purchase a simple hosting plan, upload a couple of NodeJS files and be up in 5 minutes? No. You have to rent a VPS or invest in Amazon AWS, upload files via git and then know how to develop locally and push out changes to the repo in the cloud. (Notice all those keywords I threw in there, developer?) In other words, you better be a developer and please don’t expect every Tom, Dick and Thorsten to be able to use this technology.

The Ghost blog tries hard to defend its decision to go with JS based on the argument that it’s the future and is robust and allows innovation. It leaves out the fact that until the GoDaddies of the web hosting world come out with NodeJS support in their basic plans, you’re not going anywhere with this blogging platform other than the few platforms that specifically support this technology. Oh, and your own computer.

  2. What about WordPress?

When Ghost was first introduced, O’Nolan talked about how WP changed his life and how it was awesome and awful at the same time and how his plan is to take the WP Core and rewrite parts of it to make it awesome-awesome. He meant it. He was going to fix WordPress with just a plugin. But then he didn’t. He’s going to keep the WP format, so that themes and plugins can be easily converted. He’s going to make tools to import from WP so that people can shift to Ghost ASAP. He’s going to take from WP and literally give nothing back. Ever.

I did not expect this. Well, the folks at WordPress probably did. They understand that WP is open source and people can easily add or take as they want. But I did not expect that instead of solidifying and giving better direction to WP, John would just steal from WP so blatantly and try to replace one good platform with another. He could have worked on the Core, he could have made it so much better as to force Automattic to consider his direction as the right path forward. He could have influenced the lives of so many WP lovers in such a positive way, but instead he chose to give up all that just because it would be a little more difficult to make the same stuff in PHP than it is in NodeJS. He gave up on the entire idea and instead focussed himself on getting people to drop WP and come to Ghost, leaving behind the entire essence of the platform that he’s clearly got a lot to thank for.

I’m a big proponent of WordPress. When friends come to me with even a semi-serious resolve to start a blog, I tell them of the cheap and easy hosting plans out there, how they can just upload a bunch of files, run an install script by opening a link in a browser, search for and edit plugins and themes right from inside the web app, and be running a blog in 5 minutes flat.

Now, when people ask me about Ghost, the “better WordPress”, I’m just going to tell them that it’s not worth the effort and that it’s not ready for prime time. That’s because, NodeJS being such a nascent technology, we can’t expect to see large-scale adoption of the platform any time soon. We won’t see people being enabled to quickly set up a blog without too much hassle and we won’t see Ghost being the de facto standard for someone just stepping into the world of blogging. You thought App.net was a country club? Wait till Ghost comes out.

 

This whole thing seems too much like a rant? As O’Nolan says, “Haters gonna hate.”

Experimenting with a new way of microblogging

Today, someone pointed out to me that my live blog – live.nitinkhanna.com – wasn’t truly a micro blog because there was no way for people to reply to me. This got me thinking. Following the tenets of what a micro blog is from my recent post, I believe that what really defines a micro blog is a post-and-reply model, with no character limit on the post other than the author’s discretion, the ability to include multimedia in the post, and the ability to host it on your own server.

Towards that, here’s an experiment – Disqus, the famous commenting system, has all of the above features. Though I do not, in the end, control the database of the posts, I can host a Disqus plugin just about anywhere. This is where I choose to do it. This is now a micro blog. Anyone can come and comment here. This allows for guest replies, mentions, multimedia attachments, moderation and links in the comments. There is even a mobile theme which will work if you visit this page from your smartphone.

This is just an experiment. I will post here only if people start posting here. My primary personal micro blog will still be on live.nitinkhanna.com and if anyone wants to reply there, you can do so on the Disqus comments at the bottom of that page.

Save yourself from the Ephemeral

As users of the Internet, we change a lot. We move email IDs, we jump from one social networking fad to another, we change bookmarking and read-it-later sites and even crash, delete or just forget blogs that we write on.

Most of the stuff I’ve done in the past 10 years or so on the Internet has been pretty personal. Emails, Orkut or Facebook where privacy settings allowed me to block external users, or bookmarking sites that were private by default. But recently, most of my contribution to the Internet has been public – Twitter and App.net, my blogs and even my bookmarking have been public. The same is true for most of us out there. With the shift in social networks’ view of what data should be totally private, there’s a lot of data that’s in the public domain. This also means that there’s just as much data that can be lost or can stagnate when an eventuality occurs – a web service shuts down because of acquisition or drying up of funds, your blog crashes and you have to start from scratch, you leave a social network and even though you download all your data and invite all your connections to the new one, some don’t join or you can’t upload any of that data anywhere else (how many social networks out there are interchangeable? None.) or maybe you just stop using a site or service and that data just sits there, alone and forgotten (just ask my bookmarks on del.icio.us). Continue reading

Another look at Distributed Social Networks

I’ve been reading a lot about App.net online and only a few voices are truly against the idea. Most of them seem to accept that a social network without ads would be a great idea. But some talk about not just privacy from ads but total ownership of your data. How is that possible? Simple: to own your data, you should own the platform. Which means what? It means that I should be able to download a software package, upload it to my own server and soon, anything I post on it would be owned just by me, giving me absolute control over who sees it and who doesn’t. App.net could just as easily have been that PHP-MySQL based software, but there are a few problems it would have to face – Continue reading

Back to Instapaper

I’ve recently returned to Instapaper.

Why? Because it’s neat. I use Fever exclusively for my RSS consumption but the feed view in Fever is pretty bad. So, almost always, I found myself looking for a way to read interesting articles without visiting their ad-filled websites. Instapaper was embedded in Fever, but I discovered that if I don’t just want to save the article to Instapaper, I want to read it right then, I could easily integrate the “Instapaper Text” bookmarklet into Fever and go from there. Further, Instapaper’s code is smart enough to parse 99% of the sites I want to read cleanly (including, amusingly, Google.com :D)
