Fun with micropayments and microcharges

Jeffrey Friedl accepts donations as low as $0.01 for his Lightroom plugins. He says that for donations that low, PayPal takes all of it as a processing fee and he gets nothing, but he’s cool with that. I found this most intriguing and had to try it. PayPal lets me send money in US dollars or rupees, but my rupee transactions started failing after Verified by Visa and Mastercard SecureCode were made mandatory in August, so I paid in dollars. $0.01. It went through. Nifty! (To be fair, I was registering three plugins and didn’t have the heart to do this all three times.)

Minutes later, HDFC Bank sent me an SMS receipt for the payment. They charged me Re 1. One rupee is roughly two cents, but PayPal had asked for and received only one cent.

Question: Where did the other cent go? I know it’s part of the processing costs and is too small for anyone to be bothered about, but that’s not the point. How is this missing cent accounted for? PayPal didn’t receive it, and I sent it, so it’s somewhere on HDFC Bank’s books. In a physical cash transaction, it’s normal to round off to the nearest rupee before it goes on the books, but this transaction was handled by software that rounds off to Rs 0.01, the minimum unit of currency your bank recognises. Where’s my 50 paise?
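To put numbers on the missing paise, here is a small Python sketch using the standard decimal module. It assumes Rs 50 to the dollar, taken from the rough "one rupee is roughly two cents" above rather than from whatever rate HDFC Bank actually applied, and the round-up-to-a-rupee step is only a guess at how Re 1 ended up on the receipt.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_HALF_UP

# Assumption: Rs 50 to the dollar, from "one rupee is roughly two cents".
# This is not the rate HDFC Bank actually applied.
RATE_INR_PER_USD = Decimal("50")

payment_usd = Decimal("0.01")
exact_inr = payment_usd * RATE_INR_PER_USD  # Rs 0.50

# Rounded to the paisa (Rs 0.01), the smallest unit the bank's software tracks.
to_paisa = exact_inr.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Rounded up to a whole rupee: one guess at how Re 1 ended up on the receipt.
to_rupee = exact_inr.quantize(Decimal("1"), rounding=ROUND_CEILING)

gap = to_rupee - to_paisa
print(f"expected Rs {to_paisa}, charged Rs {to_rupee}, missing Rs {gap}")
```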

Here’s the best part: HDFC Bank tacks three additional charges on every foreign currency transaction: a currency conversion charge, service tax on the currency conversion, and education cess on the service tax (tip of the hat to our world-famous Indian bureaucracy). What’s your guess on what these charges are going to be for the $0.01 transaction? Remember, they’ve already charged me 100% over the actual payment.

Open source as infrastructure

On the last day of FOSS.in 2009, some of us gathered in the speakers’ hotel to hang on to that sense of wonder for just a bit longer. Ramkumar Ramachandra and I ended up discussing open source philosophy late into the night. Ram’s consolidated his thoughts from that evening into a pair of posts on open source as infrastructure and community and business interaction. Both were posted earlier this month, but I somehow missed them.

My own understanding of the infrastructure angle to open source comes from Doc Searls’s writing around 2001. Doc has a more recent write-up on understanding infrastructure (Apr 2008) that’s well worth reading.

Budget for Dharamsala trip

Deepak asked if he could see some numbers for my sabbatical year. I haven’t sorted out my accounts yet, but for a taste, here’s the budget and actual costs for my trip to Dharamsala in August. I had more time than money at this point, so I went slow and watched every rupee.

Rows with blank estimate columns are heads I hadn’t budgeted for.

Item                            Date   Per Unit  Count  Est. Cost  Actuals
Transport to railway station    19/08                                  120
Train: Bangalore to Delhi       19/08       729      1        729      729
Food on train                   19/08        50      4        200      200
Food in Delhi                   21/08                                  221
Train: Delhi to Pathankot       21/08       222      1        222      222
Bus: Pathankot to Dharamsala    22/08       350      1        350      166
Accommodation in Dharamsala     22/08       350      5       1750      450
Food in Dharamsala              22/08        50     15        750      665
Bus: Dharamsala to Delhi        26/08       700      1        700      450
Food in Delhi                   27/08       100      2        200      475
Train: Delhi to Bangalore       28/08      1546      1       1546      729
Food on train                   28/08        50      5        250      182
Transport from railway station  29/08                                   20
Total                                                     Rs 6697  Rs 4609
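A quick way to check the totals is to re-add the columns. The rows below are transcribed from the table, with None standing in for a blank estimate. The estimates sum to the Rs 6697 shown; the actuals come to Rs 4629, which is Rs 20 more than the Rs 4609 in the totals row and the same amount as the last row, so that row may simply have been left out of the total.

```python
# Rows transcribed from the budget table: (item, estimate, actual).
# None marks a head that wasn't budgeted for.
ROWS = [
    ("Transport to railway station", None, 120),
    ("Train: Bangalore to Delhi", 729, 729),
    ("Food on train", 200, 200),
    ("Food in Delhi", None, 221),
    ("Train: Delhi to Pathankot", 222, 222),
    ("Bus: Pathankot to Dharamsala", 350, 166),
    ("Accommodation in Dharamsala", 1750, 450),
    ("Food in Dharamsala", 750, 665),
    ("Bus: Dharamsala to Delhi", 700, 450),
    ("Food in Delhi", 200, 475),
    ("Train: Delhi to Bangalore", 1546, 729),
    ("Food on train", 250, 182),
    ("Transport from railway station", None, 20),
]

# Sum only the rows that had an estimate; every row has an actual.
est_total = sum(est for _, est, _ in ROWS if est is not None)
actual_total = sum(actual for _, _, actual in ROWS)
unbudgeted = sum(actual for _, est, actual in ROWS if est is None)

print(est_total, actual_total, unbudgeted)
```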

We were a group of four. The others started from Delhi and had already booked train tickets to Pathankot, so I tagged along, even though a direct bus between Delhi and Dharamsala made more sense. I took a sleeper class train coach from Bangalore to Delhi. I wasn’t sure it would be comfortable—36 hours in a metal box without air-conditioning—and budgeted for a three-tier AC ticket on the way back, but found I liked it and returned the same way.

Accommodation in Delhi was free, courtesy friends. In Mcleodganj, we found a guest house with a good view of the valley for Rs 450, split between two occupants over two nights. I wanted to stay longer but the others had work to return to. I had no expenses other than food and transport.

Concerns with Apple’s business model

When Apple debuted the iTunes Music Store in 2003, I enthusiastically signed up and downloaded music. I had a check card with a US billing address that I made gleeful use of. I loved the store. And yet, something didn’t feel right. It took me a while to articulate what.

I was no longer in the US at that time and my checking account was rapidly depleting. Apple wouldn’t accept an international card. Their licensing terms with the music labels only allowed selling music within the US, they said. Fair enough, but something still nagged.

Apple made (and still does make) excellent computers and iPods, but selling music was a different game. It was no longer a one-time transaction for the hardware, but a regular, sustained interaction for your content fix. And US only. iTunes updates now came thick and fast, but my new Indian billing address was no longer welcome. I could only sit by and watch what I could have had access to. Meanwhile, the rest of the iLife suite and Mac OS X felt ignored while all attention went to iTunes.

I knew what was bothering me then. Apple was seeking a tighter and more direct, long-term relationship with their customers, but in the process ignoring anyone in a market where it was too much effort to set up a relationship. This wasn’t how it was with a Mac. Apple’s computers were heavily marked up in India at the time, but you could get one abroad or pay extra and get it locally. Beyond the barrier of price, you would get the exact same Mac experience as anyone else anywhere in the world. All the software there was for the Mac was available to you too.

This would have been a trivial ’plaint about music licensing, but 2007 rolled around and with it the iPhone, sold locked with a carrier contract, US only. Apple was once again not just selling a fantastic device, but making the business deals that ensured a great user experience. Where they had no deals, you got no device.

As of Jan 2010, you still can’t buy music from the iTunes Store or buy an iPhone 3G S in India. Apple can’t work out suitable deals, so you as a customer are irrelevant to them. Meanwhile, you can still buy a Mac at a price that is now nearly the same as in the US, and all the apps you want for it are still available. It seems like Apple will have you as a customer only if (a) they can guarantee the quality of the all-round experience, or (b) they are willing to abdicate that responsibility. There’s no middle ground.

And this is the crux of it. As Indians, we’re used to technology that isn’t quite right for us, whether it’s the address book that insists you split your initials into “first name, last name”, the app that wants dates in MM/DD/YYYY format despite your locale settings asking for DD/MM/YYYY, or in general software that is overpriced in US dollars, compelling everyone to use a pirated copy. It isn’t for us, but we use it anyway and step around the quirks. We’re cool with that. Now here’s an entity that essentially says, “this is very cool, but it’s not for you and we don’t know when it will be, so you’re not getting any of it.” That’s plain arrogance.

Apple has spectacularly bungled the iTunes Store and iPhone’s presence in India. Everyone expects them to launch a tablet later this month that will be more of the same, with the device’s experience tightly bound to content distribution. I bet they will bungle this too in India.

If there’s a weak spot in Apple’s business model for a competitor to take a stab at, this is it. But Nokia, that elephant in the room, has lost its mojo. If only Google regarded Android as anything more than an engineering wet dream…

Being an outsider

Last evening I sat across from a physicist and a mathematician and watched them discuss clusterings of Wikipedia editors based on edit behaviour. Snatches of familiar but meaningless phrases hit my ears. Markov chains. Undirected graphs. Distances. Eventually the physicist squealed in delight and said she had won a bet with the mathematician. I nodded. Then they said “computationally expensive” and I took my cue and pointed out that for an extended period of revision history, one could take a given revision and consider that editor’s other edits only within a small window rather than across the entire period. That would cut clutter from the dataset and allow long-term analysis. We only need to agree on what the window’s size should be. We could even come up with a way to identify a pair of editors responding to each other, as opposed to working independently to contribute new material or clean up a page.
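The windowing idea fits in a few lines of Python. This is an illustration under assumptions, not what anyone actually ran: revisions are assumed to be (timestamp, editor) pairs, the function names are hypothetical, and the window size is left as a parameter, since that's the part still to be agreed on.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed input shape: one (timestamp, editor) pair per revision of an article.
Revision = tuple[datetime, str]


def index_by_editor(revisions: list[Revision]) -> dict[str, list[datetime]]:
    """Collect each editor's edit timestamps, sorted, for fast range lookups."""
    by_editor: dict[str, list[datetime]] = defaultdict(list)
    for ts, editor in revisions:
        by_editor[editor].append(ts)
    for stamps in by_editor.values():
        stamps.sort()
    return dict(by_editor)


def edits_in_window(by_editor: dict[str, list[datetime]],
                    revision: Revision, window: timedelta) -> list[datetime]:
    """The same editor's other edits within +/- window of this revision."""
    ts, editor = revision
    stamps = by_editor.get(editor, [])
    lo = bisect_left(stamps, ts - window)
    hi = bisect_right(stamps, ts + window)
    nearby = stamps[lo:hi]      # a copy, so the index isn't modified
    if ts in nearby:
        nearby.remove(ts)       # drop the revision itself (one occurrence)
    return nearby


# Made-up sample data, purely for illustration.
revs = [
    (datetime(2009, 7, 1, 10, 0), "Alice"),
    (datetime(2009, 7, 1, 10, 5), "Bob"),
    (datetime(2009, 7, 1, 10, 20), "Alice"),
    (datetime(2009, 7, 3, 9, 0), "Alice"),
]
idx = index_by_editor(revs)
print(edits_in_window(idx, revs[0], timedelta(hours=1)))
# prints [datetime.datetime(2009, 7, 1, 10, 20)]
```

Because each editor's timestamps are sorted once, every lookup is a pair of binary searches, so a long revision history doesn't have to be rescanned per revision.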

And thereby having said something intelligent, I sat back and watched their faces again, slipping back into incomprehension. We parted agreeing to keep in touch on the new ideas, but I’m at a loss to tell you exactly what the new ideas are. Their math makes no sense to me, for I’m an outsider: the chap butting his way into a discipline claiming to have some solutions, but with no understanding of the fundamentals.

The previous day I had a most fascinating conversation with one of the presenters at WikiWars, the significance of whose insight was again wasted on me. He talked of Edward Said and Satyajit Ray, of the latter’s biography on Wikipedia, the trouble with too many of the citations referring to a single biographer, and of how that could be understood in the context of Said’s work. He recommends Said’s Culture and Imperialism. I can feel the warmth from a dim bulb glowing somewhere.

He asked about me. I said I’ve spent the last few years in the rural development space. “Fooled around,” is more like it, for I went into the space armed with claims of pioneering web development experience and programming prowess, and found the most intense technical task they had was to install an operating system, open a web browser, point it at a government website, and explain to all parties concerned whose fault it was that the page wasn’t loading. Day-to-day life revolved around the size of the cash float, which investor was willing to fund it, scheduling meetings with the ISP for CEO-to-CEO face-offs on how a screenshot of our bandwidth consumption was insufficient, and visiting the very abrasive government bureaucrat to assure him that I did indeed have top-notch programmers working full time to bring him his daily report. Stick some Python in there to make it all better, will ya?

Which is why, when I met the geeky young man working towards a PhD in agriculture, I begged him to recommend a book that explained all this. There has to be some intelligence in this chaos, but I’m too much of an outsider to spot it.

I’m a programmer, I keep telling myself. I write code. Good code. Fast code. All these people waving their arms and speaking a strange dialect of English need me because, on the internet, code talks like nothing else. I can sit cluelessly around them, bewildered even, knowing that in the end someone will turn to me and ask if I can help.

Conversations move on. An hour later, at another location, the physicist says she’s working on a doctoral thesis. I say that nearly everyone in my life has a PhD or is working on one. I would have been too, if it wasn’t such a long, circuitous route. How am I going to justify trekking all the way through undergrad at this age just to get to the interesting bits? In academia, I’m the ultimate outsider. I’ve never been through any of their systems, turn up as this chap that no one is quite sure how to engage with, and yet have gained entry to more than one of their circuits and even published papers. The geek hat does carry one far.

The geek hat is also suspected. Bangalore’s ruined by the techies, they wail. I’ve been to endless meetings on problems that wouldn’t exist if they used Firefox instead of Internet Explorer, or something equally trivial, except the Mozilla Foundation isn’t making an offer to fund a major e-governance project. I keep my mouth shut. People in the habit of routinely shooting at feet will eventually shoot their own, and then they won’t turn up at the next meeting. Suspicion of techies and the biases behind their ideas carries all the way into the realm of the bizarre. At a music concert one evening, this dear old lady, proud of her daughter who wrote for an advertising supplement, didn’t ask what I did. She didn’t want bad news. She simply said “don’t tell me you’re a techie.” A friend jumped to my defence, pointing to the camera and explaining that I was a photographer. I played along, for revealing that you’re a techie generally tends to make life more expensive in these parts, and I was foraying into yet another new discipline. A few years have passed and I’ve clicked much. Today I no longer wield a camera but still wear the geek hat.

At dinner last night, the wikipedian from Taiwan made conversation. He had helped launch a minority language Wikipedia that the official system of language Wikipedias wouldn’t recognise and had successfully lobbied for its inclusion. He wanted to interview me for the wikipedians back home. As a local Wikipedia editor, how did I relate to the English language Wikipedia? But wait, me representing the local editors? With just a hundred-odd edits on my account when the local chapter had editors with 50,000+ edits? I called another (real) Wikipedian to ask if he was in the neighbourhood. He suggested I go ahead anyway since I was a valid rep.

Later still, the Taiwanese wikipedian asked that fatal question: “So, what do you do?” I responded with the one-liner I reserve for such occasions. “I’m a programmer, I write code.” He pointed at my shirt. “You work for Yahoo?” “No,” I said, “that’s just a conference t-shirt.” I then attempted a weak explanation of my rural development stint.

The truth is, in the eleven-plus years of my working life I’ve never worked at a software house, have never attended a computer class, and have no certifications. I wrote code through the ’90s, code and little else, telling everyone I was going to be a “software developer” when I grew up, and ultimately falling out of the academic system. But when it came to going to work, did I do the expected thing and join a software house? No, sir, I went into print publishing. What one does first sets the template, and this one sure did. I’ve put my foot into all manner of disciplines other than computer science, playing the saviour who produces the code, but bearing no certifications. I could afford it because I had put in my 10,000 hours already. After that much exposure, learning becomes automatic and incremental. I haven’t looked at a technology guide book in over a decade because I don’t need to. The book on my bedside today is on law. The one below it is on film studies.

An increasingly ragged hat

My expeditions into new disciplines have gotten deeper and longer over the years, but they’ve also taken me farther away from the primary identity I’ve defined for myself. The last major piece of code I wrote was in 2002. Everything since has been relatively minor scripting. My open source code contribution track record is astonishingly sparse. I’ve become proficient in just one new programming language in the ’00s, down from five in the ’90s. I regularly encounter bewildering new technical constructs these days. It’s bad enough to feel like retirement.

I’m slowly, but surely, being ejected from the one discipline I considered myself an insider at. What’s one to do?

I suppose this is the part where life gets really interesting.

A year in recap

Long time, no post. So much to say, but where’s the time to write with all this activity? Remind me to post on:

  • What it cost me to take a year off,
  • What I’ve been reading through these months,
  • What I did with the time and how I ended up doing each, and
  • What I’m up to now, back here in the land of the gainfully employed.
The view from my new office window.

Netbook theme for Ubuntu

Upgraded to Karmic last night. The refresh of the Human theme is quite nice, but the bright orange icons no longer work, so I made a quick remix. Download:

Both versions are designed for 1024×600 netbook screens. For best results, you should also install maximus and window-picker-applet, and set up a single panel at the top containing the applet.

Installation

Go to System → Preferences → Appearance and install from there, or better, extract the tarball to /usr/share/themes as root. The latter will get it to work for system applications too.

Unicode precomposition and decomposition

As a result of recent Mac troubles, I moved my iTunes library to a Linux file server and set up iTunes on my old TiBook to access the library over an AFP share using netatalk.

This worked unexpectedly well, until I noticed something very odd: I could no longer access any file whose name contained an accented character such as “é”. These files showed up in directory listings but were not readable. The filesystem complained that the file just did not exist. After a whole evening lost trying to find fault with everything from Mac OS X to netatalk, I found myself in unfamiliar Unicode territory:

It turns out there are two ways to represent certain accented characters such as “é” in Unicode, either using a single code point (U+00E9, “latin small letter e with acute”) or using a regular ASCII character “e” with a combining diacritical mark (U+0065, “latin small letter e” followed by U+0301, “combining acute accent”). The first form is known as “precomposed” (Unicode normalization form NFC) and is the convention for filenames on Linux, while the latter “decomposed” form (NFD) is the standard on Mac OS X.

The Mac approach is unusual but has the advantage of making accent-insensitive search easier. A string search for “cafe” will also match “café” because the last character is really two; “cafeteria” can match “caféteria” if one simply strips out diacritical marks. Doing this with precomposed strings is much harder. (Thanks to @deepakg for identifying this.)
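Python's standard unicodedata module shows both forms side by side. This is an illustrative sketch: `strip_accents` is a hypothetical helper, not something Mac OS X itself uses, and HFS+ actually stores a slight variant of NFD that this ignores.

```python
import unicodedata

precomposed = "caf\u00e9"   # é as one code point, U+00E9
decomposed = "cafe\u0301"   # e (U+0065) followed by combining acute (U+0301)

print(len(precomposed), len(decomposed))   # 4 5
print(precomposed == decomposed)           # False: different code points
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True

# A plain prefix search finds "cafe" in the decomposed form with no extra work.
print(decomposed.startswith("cafe"), precomposed.startswith("cafe"))  # True False


def strip_accents(text: str) -> str:
    """Decompose, then drop combining marks (Unicode category Mn)."""
    return "".join(
        ch for ch in unicodedata.normalize("NFD", text)
        if unicodedata.category(ch) != "Mn"
    )


print(strip_accents("caf\u00e9teria"))  # cafeteria
```

Note that `strip_accents` decomposes first, which is exactly the step the Mac's filenames already have done.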

Mac OS X enforces the decomposed form for filenames, but Linux doesn’t. On Linux, precomposed UTF-8 is expected but not enforced. The netatalk AFP server recognises this difference and transparently translates filenames between what it calls UTF8 and UTF8-MAC. This is where I ran into trouble. I had transferred my files using rsync and ended up with decomposed filenames on Linux. These showed up fine over AFP, but when Mac OS X attempted to access them, netatalk did the transparent translation to precomposed names and could no longer find the files. The solution? Rename all files on the Linux side:

convmv -r -f utf8 -t utf8 --nfc --notest ./*
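Without convmv, the same NFD-to-NFC rename can be sketched with Python's standard library. `rename_to_nfc` is a hypothetical helper, not part of convmv or netatalk. Like convmv without --notest, it defaults to a dry run. It's meant for the Linux side: a Mac's filesystem treats both forms as the same name, so there every file would be skipped.

```python
import os
import unicodedata


def rename_to_nfc(root: str, dry_run: bool = True) -> list[tuple[str, str]]:
    """Rename files and directories under root from decomposed (NFD) to
    precomposed (NFC) names. With dry_run=True, only report the changes."""
    changes = []
    # Walk bottom-up, so a directory is renamed only after everything inside
    # it has been handled under its old name.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames + dirnames:
            nfc = unicodedata.normalize("NFC", name)
            if nfc == name:
                continue  # already precomposed (or plain ASCII)
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, nfc)
            if os.path.exists(new):
                # Both forms exist side by side; leave that for a human.
                print(f"skipping {old!r}: {new!r} already exists")
                continue
            changes.append((old, new))
            if not dry_run:
                os.rename(old, new)
    return changes
```

Run it once with the default dry run to review the list, then again with `dry_run=False`.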

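And in future, when running rsync on the Mac to copy files to Linux, ask it to translate the filenames with this additional option. rsync reads the pair as local character set first, then remote, so swap the two when running it on the Linux side. The option needs rsync 3.0 or later, since older versions lack --iconv: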

rsync --iconv=UTF8-MAC,UTF8

Dead Mac

My Mac’s display died without warning one day last month. I was using it when blocks of randomly coloured pixels appeared on screen, obscuring the display. Rebooting didn’t fix it, nor did turning the power off and letting it cool for several hours. One of the internal fans had flaked out earlier and was in the habit of refusing to spin up every once in a while, so I suspect a burnt chip from overheating. I can no longer see enough of the display to boot up and log in, but the machine continues to be fully functional when accessed over the network.


Being occupied with other things, I put the machine aside for a few weeks and finally opened it last weekend to take a look inside. The right side fan was dead. It appears to have lost its magnetic charge as a result of my previous attempts to clean it and no longer spins freely when flicked with a finger. I couldn’t tell what was wrong with the graphics, however, so I decided to call Apple Support. The machine is a little over three years old and well past any sort of warranty period.


Apple’s phone support directed me to the Ample Imagine store at Forum Mall in Koramangala. They said they’d have a look and tell me if it was fixable, but would charge Rs 750 for the inspection. I agreed and left my Mac with them. They called back yesterday and said they’d have to replace the logic board at a cost of Rs 35,000. Given that this is nearly half the price of a new Mac, I decided to save my money and use the machine as a display-less network server. That would have been the end of the affair, had I not noticed the job sheet when I went in today to pick it up:

The engineer’s comments said he had tried resetting the PRAM and connected an external display, but since that didn’t fix it, he had decided it was a logic board problem and suggested I get a new one.

What? That’s it? A diagnosis costing Rs 827 (750 + taxes) without even opening the machine? For all I knew, some chip could have had its soldering melted and come loose because of the overheating. It could even be just a loose connector on the board. Who trains these guys?

Now, there’s something to be said about this particular model. My Mac is a first generation Intel and unlike all Apple laptops that came before and after it, this one is not user upgradeable. It can’t be opened without literally cracking open the case, a process which leaves visible scars in the front, below the trackpad. I went through this process two years ago when upgrading the hard disk and spent over an hour gently tugging and wriggling a screwdriver to pry it open. An Apple engineer not aware of this history should have called me to confirm he could do this because of the risk involved, but no one called. There’s no way anyone could have opened it and failed to record that in the job sheet. They quite certainly didn’t.

This incompetence is appalling. I feel like I’ve been scammed of my money. These engineers seem to be trained to make diagnoses for machines within warranty, but not for anything requiring a real examination. Dear Apple: if you want to be a serious contender in India, you had better get your act together.

And for what it’s worth, I’m now on my own trying to get this fixed. I suppose I could start ordering parts off eBay and try my luck with guessing exactly what is broken, but it would help to have (a) real expert diagnosis and (b) a way to avoid wrangling with Indian customs when importing parts.

Do you know anyone I should be talking to? Or, know anyone with a dead MacBook Pro of the same period (Intel; pre-Unibody) who’d be willing to palm it off to me for spare parts?

I’m not as badly off as I could have been because I made a serious habit a couple of years ago of backing up everything, including having backup machines (currently an ASUS Eee PC 1005HA running Ubuntu Jaunty), but the machine’s absence is clearly felt, and I don’t have the budget for a new Mac until next year.

Disabling the alarm on APC UPSes

From the wonderful Fly, You Fools! webcomic by Saad Akhtar. Read the full strip.

You know what I mean. What were they thinking? Here’s a helpful explanation by an APC employee:

I understand your concern with not wanting to be woken up at 2am to be alerted that power has gone out in your residence. I use the software at home to disable the audible tone as well, however, I think taking a look at it from a different approach may be ideal. Is the UPS your source of power for your alarm clock in the morning? What would occur if you were to have to wake up at a specific time during the week, and your alarm clock, which is not powered by your UPS, powers off due to a blackout, even if it is momentary? I think it would be ideal in this scenario that the UPS wakes you to notify you of a power failure. That would allow you to possibly find an alternate source of power for the alarm clock, or, if power is to be restored within a reasonable period of time, to reset your clock so that you wake up on time.

Right. That’s why. That horrible shriek is meant to wake you up. If, like all real people, you have an alarm clock that runs on batteries and prefer a full night’s sleep, it turns out that you can disable it. This works on most common APC UPS models with the USB cable. Windows users should install APC’s PowerChute software. It apparently has an option somewhere to turn it off. On Linux, the apcupsd package will do it for you (make sure to plug in the USB cable first):


What’s happening to our online communities?

Supriya Thanawala of the Hindustan Times wrote in asking if I had noticed how online community spaces have, over the years, grown to discourage pseudonymous identities. I responded noting several trend lines:

  1. Internet adoption is growing, making governments increasingly conscious that this is a new space they ought to be governing. That’s where the cyber cells and ISP IP logging come from.

  2. Any medium where an individual can be reached with little effort will be misused. Postal mail has junk marketing, telephones have telemarketers, email has spam, each cheaper than the previous. As the medium grows and becomes a worthwhile channel for junk messages, service providers come under increasing pressure to keep it usable for normal users. They do this either by requiring some real-life ID (such as by your ISP) or by limiting your use of the service (such as mailing list providers that limit the number of people you can directly add to your new list).

  3. The web is a public medium. Anybody can see anything posted there. The web is also very large, so resource discovery, and not access, becomes of primary importance. Blogging became popular because of this curious nature of the web. A blog was both private because nobody would find it until they got referred to it somehow, and public because you could always share the link. Online spaces felt like intimate communities in the early days because there were so few people online and you either knew who they were, or guessing that became an interesting game. As that count grew, partitioning spaces became important. Today’s Facebook is more or less private. You decide who your friends are and only they can see what you write. The rest of the web can’t.

  4. Early blog+social networking spaces like LiveJournal and Friendster have been grappling with anonymity and fake identities for a long time. Here’s something I wrote a few years ago. Some have attempted banning them outright, while others have tolerated them but ended up with mixed results (see this for a particularly entertaining example – those profiles originated in a very non-funny flamefest elsewhere, after which their makers decided to keep them going for a while). Facebook has taken the more pragmatic approach, allowing for the creation of “pages” distinct from profiles that users can interact with.

  5. Facebook arrives at a time when the web is increasingly seen as having little direct revenue value. Money is made via advertising, not from users paying up (in contrast, LiveJournal was profitable for several years because users paid for accounts with extra features; Flickr runs on the same model). The Pages feature on Facebook is largely seen as a marketing vehicle for a film or a product that users pay for off the web. This brings in marketing language, sanitised humour when there is any (notice that TV sitcoms are never as funny as the spontaneous writing of the Aaj Sholay community), and references to everything by real-world names, in a manner that respects trademarks and copyrights.

So where is all the anonymity and creativity going now? It exists as always; it’s just out seeking new corners for itself away from the public eye.

(I suspect some of this isn’t quite true anymore, but I haven’t been thinking about it. Your thoughts?)

Why so jobless?

Years ago, at my first job, I attended an annual day talk by the founder and chairman of the group of companies. The man had started from scratch and built a Rs 200 crore conglomerate with a range of business interests. This was supposed to be a pep talk about what visions he had for us, the newest company in the group, but he couldn’t help starting with a little about himself. His greatest pride in life? That he had never had a job working for someone else.

Well, so much for holding on to mine. Related reading (via @thej).

Analysing Wikipedia: caching data

I haven’t posted about Wikipedia in a while. Hans went to Ladakh right after I returned, so we’re only now getting around to analysing the data we collected in July.

Our biggest hassle with doing any kind of analysis is with how long it takes to retrieve data. A full text analysis of a few hundred revisions of a large page could easily take an hour to pull. If that analysis doesn’t produce satisfying results, attempting a variation requires pulling that data all over again, because we have no cache.

I use the mwclient library, which provides a thin wrapper representing MediaWiki queries as lazy (?) Python sequences. Since this sequence could be cached, I’ve been considering strategies (some of this assumes familiarity with mwclient):

  1. Implement a simple Python dictionary cache around the mwclient API, saving the query→result mapping as a pickled dump and consulting that before hitting the servers again. This is easy, but since the sequences are lazy, the data isn’t available for caching until the code tries to access it. The cache has to intervene then. All my analysis code must now be written for two APIs, mwclient’s and my cache’s.

  2. Alternatively, do the same thing but as a patch to mwclient’s code so there’s a single external API. This requires understanding how it works and maintaining patches against upstream changes, which takes time away from analysis.

  3. Do it outside. Set up Squid or another caching proxy to cache everything regardless of HTTP headers. Make queries through this. Easy to set up, but grossly inefficient. Proxy servers understand request→response mapping, not sequences. If I ask for a subset of an earlier sequence, it’ll treat it as a new request. Sequences require special treatment:

    • There could be newer edits on the site, making the sequence’s beginning and end markers stale.

    • A new query may ask for overlapping results (typically, a query from a fixed starting point to current time). The cache should be able to join sequences instead of duplicating data.

    • A query may ask for the same time range as an earlier query, but with additional properties (typically, the full text of each revision). These additional properties should be merged into the cache.

  4. Drop this approach altogether and get a static dump of the Wikipedia database. But a full text dump of the entire revision history of the English Wikipedia is 150 terabytes. The resource requirements will take us out of the realm of a hobbyist project.
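Strategy 1 is simple enough to sketch in Python. This is an illustration under assumptions, not code from this project: `PickleQueryCache` and its `fetch` callable are hypothetical names, and a key is whatever hashable description of a query you choose. It shows the awkward part: because the sequence is lazy, results can only be recorded as the caller consumes them, and a partially read sequence can't safely be cached.

```python
import os
import pickle
from typing import Callable, Hashable, Iterable, Iterator


class PickleQueryCache:
    """Query -> results cache persisted as a pickled dict (a sketch).

    `fetch` is any zero-argument callable returning a (possibly lazy)
    iterable, for example a function that runs an mwclient query. That hook
    is a hypothetical convention for this sketch, not part of mwclient.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        self.store: dict = {}
        if os.path.exists(path):
            with open(path, "rb") as f:
                self.store = pickle.load(f)

    def query(self, key: Hashable, fetch: Callable[[], Iterable]) -> Iterator:
        """Yield cached results for key, or fetch them and cache as they're read."""
        if key in self.store:
            yield from self.store[key]
            return
        # The sequence is lazy, so results only exist as the caller consumes
        # them: record each item on its way out.
        seen = []
        for item in fetch():
            seen.append(item)
            yield item
        # Only reached if the caller iterated to the end; partial reads aren't cached.
        self.store[key] = seen
        self.save()

    def save(self) -> None:
        with open(self.path, "wb") as f:
            pickle.dump(self.store, f)
```

Analysis code would then call `cache.query(key, fetch)`, with `fetch` wrapping the real mwclient query, instead of using mwclient directly, which is the two-API problem described in strategy 1.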

Given that data retrieval time has become a serious hobble, it seems worth tackling this head-on. A custom cache API could:

  1. Be sequence aware. Treat each MediaWiki article as being a sequence of unknown start and end, of which fragments are available in the cache. Join sequence fragments as data gaps are filled in, leading to one single sequence for the page’s entire revision history.

  2. Store additional properties on each revision. MediaWiki does not store diffs between revisions, but the cache could, since much full text analysis is based around the changes introduced by each revision. Properties could also be flags marking revisions as, for example, vandalism, or as the reversion that follows it.

  3. Based on the above, store alternate sequences and properties specific to them. For example, a revision sequence of an article that skips all vandal/reversion revisions and stores edit diffs without them. Without this, an editor whose sole contribution was to revert vandalism will come out appearing to have added a lot of new material.
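Such an API might look roughly like this in Python. It's a sketch under stated assumptions, not a settled design: `PageRevisionCache` and its methods are hypothetical, each stored run is assumed to be a complete, contiguous slice of one page's history, and revision IDs are assumed to increase over time. Two fragments are joined only when they share at least one revision, because without an overlap there's no telling whether edits are missing between them. Properties from later queries, such as full text, are merged into revisions already cached, and `text_diffs` illustrates point 3 by skipping flagged revisions.

```python
import difflib
from bisect import insort
from typing import Optional


class PageRevisionCache:
    """Sequence-aware cache for one article's revision history (a sketch).

    Assumes each stored run is complete and contiguous (no revisions of the
    page were skipped between its first and last entries) and that revision
    IDs increase over time.
    """

    def __init__(self) -> None:
        self.revisions: dict[int, dict] = {}        # revid -> merged properties
        self.fragments: list[tuple[int, int]] = []  # sorted, disjoint (first, last) runs

    def add_run(self, run: list[tuple[int, dict]]) -> None:
        """Store a contiguous run of (revid, properties), joining overlapping runs."""
        if not run:
            return
        for revid, props in run:
            # A later query may bring extra properties (e.g. full text): merge them.
            self.revisions.setdefault(revid, {}).update(props)
        first, last = run[0][0], run[-1][0]
        kept = []
        for lo, hi in self.fragments:
            if hi < first or lo > last:
                kept.append((lo, hi))  # no shared revision: can't tell what lies between
            else:
                first, last = min(first, lo), max(last, hi)  # overlap: join
        insort(kept, (first, last))
        self.fragments = kept

    def gaps(self) -> list[tuple[int, int]]:
        """(end of one run, start of the next) pairs still waiting to be filled in."""
        return [(a[1], b[0]) for a, b in zip(self.fragments, self.fragments[1:])]

    def sequence(self, first: int, last: int) -> Optional[list[int]]:
        """Revision IDs from first to last if a single cached run covers them."""
        for lo, hi in self.fragments:
            if lo <= first and last <= hi:
                return sorted(r for r in self.revisions if first <= r <= last)
        return None  # cache miss: this range has to be fetched

    def text_diffs(self, skip_flag: Optional[str] = None) -> dict[int, list[str]]:
        """Diff each revision's "text" against the previous revision kept.

        Assumes the cache holds the whole history as one run. Revisions whose
        properties set skip_flag (e.g. a hypothetical "vandalism" flag) are
        left out; for an exact revert, the reverting revision then diffs
        against the pre-vandalism text and comes out empty. The first
        revision kept is diffed against empty text.
        """
        diffs: dict[int, list[str]] = {}
        previous = ""
        for revid in sorted(self.revisions):
            props = self.revisions[revid]
            if skip_flag and props.get(skip_flag):
                continue
            text = props.get("text", "")
            diffs[revid] = list(difflib.unified_diff(
                previous.splitlines(), text.splitlines(), lineterm=""))
            previous = text
        return diffs
```

A plain dict stands in for storage here; the same structure could presumably live in a key/value store keyed by page and revision ID.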

A web service implementing this API will, over time, be able to respond to queries in near real time, making it possible to build a web interface where anyone can submit a query. The public web interface is one of our eventual goals for this project.

I’ll post updates as I work out the technical architecture for this API. I’m considering using one of the newfangled key/value pair databases but have no experience with them. Recommendations are welcome.

Upper Dharamsala on a rainy day

I found myself in Mcleodganj last May, in the company of TB Dinesh and Guillaume Marceau. Dinesh wanted to pick up some luggage a friend had left behind in nearby Dharamkot, so off we went up the hill.

Man, was it hard! The incline could have killed me. I was out of breath and my feet ached. My camera bag felt like a huge burden. I had to stop at every turn and rest for minutes. When we finally reached Dharamkot, I refused to leave the tea shop for the next couple of hours. We hung around ordering several rounds of tea and snacks. Dinesh and Guillaume then wanted to walk further, so I reluctantly tagged along. The body ached but the mind couldn’t refuse the challenge. We walked all the way uphill, past prayer flags in the woods, past a shrine to the earlier Panchen Lama, the current Dalai Lama’s late teacher, up to the top, and down again through the Tibetan Children’s Village, along the water pipeline, back to Mcleodganj.

(Photos: TB Dinesh and Guillaume Marceau; Sanjay’s, Dharamkot; Hike in the Hills; Tibetan Prayer Flags; Tibetan Children’s Village, Upper Dharamsala.)

I had cramps the next day. When I returned to Bangalore and checked my weight, I was down two kilos. In a day’s walk.

And so, a year later and halfway through this year’s resolution to improve health, I had to check again. Was it really so bad, or was I just so out of shape? Has all the cycling in Bangalore and walking in Ladakh’s thin air helped at all?

It has: the walk this time felt like a casual stroll through the woods.

Upper Dharamsala on a rainy day
From atop the hill overlooking the Tibetan Children’s Village (off to the left).

Nikon D70 + kit on sale

I’m putting my Nikon D70 camera and entire kit on sale. It’s served me well over the years and I’m now ready to move on. Here are the kit contents, what they cost me, and what they would cost you new or individually second hand from eBay (some prices are approximate guesses; all prices are in Indian Rupees).

| Item | Year purchased | Original price | Current price, new | Second hand |
|---|---|---|---|---|
| Nikon D70 body | 2004 | 47,500 | NA (11,500) | 11,500 |
| Nikon 50mm f/1.8 D | 2004 | 5,000 | 6,000 | 4,000 |
| Sigma 18-50 f/2.8 EX DC | 2007 | 21,000 | 23,000 | 15,000 |
| Nikon IR remote ML-L3 | 2005 | 1,000 | 630 | 500 |
| 52mm circular polariser | 2004 | 1,050 | 1,050 | 800 |
| 67mm circular polariser | 2005 | 1,485 | 1,400 | 1,100 |
| Sandisk 2GB CF card | 2004 | 4,000 | 800 | 500 |
| Sandisk 2GB Extreme III | 2007 | 2,300 | 1,300 | 800 |
| LowePro Photo Runner bag | 2005 | 2,000 | 3,000 | 1,500 |
| Total | | Rs 85,335 | Rs 48,680 | Rs 35,700 |
| My offer | | | | Rs 25,000 |

Extras: The D70’s rechargeable battery and charger, a spare higher capacity battery, a lithium cell holder with three CR2 cells, 52mm rubber hood for the 50mm lens, 67mm petal hood and carrying case for the Sigma lens, spare 128 MB CF card and card holder, a compact CF card reader, USB cable, UV filters on both lenses, rubber blower and nylon brush for cleaning the CCD, and a lens filter holder with space for six that fits into the LowePro bag.

If you were to buy all this second hand one piece at a time, it would cost you ~Rs 35,700. My offer price: Rs 25,000.

I took the D70 to Nikon’s service centre late last year to fix accumulated wear and tear. They replaced its power switch, CF card slot and rear-side outer body at a cost of Rs 5,220. The camera has travelled only twice since, so it is in much better shape than its age would suggest.

I’ve used this camera for practically every picture I’ve made in the last five years (barring July’s Manali and Ladakh trip). If you’ve liked my pictures in the past, this is the equipment that made them possible. Here’s a sample gallery.

Buyer must collect in person. I’m currently in Mcleodganj, Dharamsala, carrying everything except the polariser filters and will be in New Delhi later this week before returning to Bangalore. Questions? Leave me a comment.

Update: Sold!

QWERTY-be-gone

Much of the debate about modern mobile handsets centres on the text entry mechanism. If you’ve gotten used to a device with a QWERTY keypad, will you be able to go back to T9? How can anyone touch type on a device with an on-screen keyboard? Will haptic feedback make on-screen keyboards just as usable?

All these debates (except the one about T9) make one fundamental assumption: keypads can only use the QWERTY layout. This is where one must take exception. QWERTY is a 140-year-old standard with a seemingly random layout of letters that were arranged to avoid mechanical jamming in the technology of the 1870s. Generations of typists have grown up with memories of their baffling first encounter with the layout, something they had to learn because that’s the way it’s always been done.

To shrink that same layout down to under three inches and call it state of the art is just bizarre. Keyboards are from an era when the technology of the day demanded a two-dimensional layout. We’re no longer constrained by that technology. Look at your fingers, folks. Look at how amazingly dextrous each of them is, how capable of independent movement each is. Look at how you can hold a pen to paper and make coordinated muscle movements across fingers to write. Keyboards take no advantage of this ability. A keyboard is a flat, rectangular layout with a key for each symbol, where your only possible interaction with that key is to press it down.

Rather than shrinking that rectangular layout, why not change the possible interactions with each key? How about if you could both press down and up? What if you could record interaction with every joint in your finger, instead of just the tip?

Chorded keyboards and keyers have been around for decades, but have failed to gain mainstream acceptance because (a) there’s a learning curve, and unlike the learning curve of QWERTY, there’s no incentive to scale it, and (b) as a result, there aren’t enough users to establish a standard from among the competitors.
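To make the chording idea concrete: five keys, one under each finger, give 2^5 − 1 = 31 non-empty combinations. That is enough for 26 letters with five to spare. Here is a toy chord decoder in Python. It assumes one common design, in which a chord is committed only once every key in it has been released. Its chord table is invented for illustration: letters are assigned roughly by English letter frequency, with single-finger chords first. It does not reproduce any real keyer’s layout.

```python
from itertools import combinations

FINGERS = ("thumb", "index", "middle", "ring", "little")

# Hypothetical layout: letters roughly in English frequency order, handed out
# to single-finger chords first, then two-finger chords, and so on.
LETTERS = "etaoinshrdlcumwfgypbvkjxqz"


def build_layout():
    """Map each chord (a frozenset of fingers) to a letter; 31 chords, 26 used."""
    chords = [frozenset(c) for size in range(1, len(FINGERS) + 1)
              for c in combinations(FINGERS, size)]
    return dict(zip(chords, LETTERS))


class ChordDecoder:
    """Collects key presses and emits one letter when every key is released."""

    def __init__(self, layout):
        self.layout = layout
        self.held = set()    # keys currently down
        self.chord = set()   # every key pressed since this chord began

    def press(self, key):
        self.held.add(key)
        self.chord.add(key)

    def release(self, key):
        self.held.discard(key)
        if self.held:        # chord still in progress: nothing to emit yet
            return None
        letter = self.layout.get(frozenset(self.chord))
        self.chord.clear()
        return letter
```

Releasing the thumb while the middle finger is still down emits nothing. Releasing the middle finger then emits the letter for the thumb plus middle chord. Because the decoder collects every key pressed since the chord began, the order in which keys go down and come up does not matter.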

But for the first time in the history of digital text communication, more people now use a phone as their primary means of communication than use a regular computer. The time is ripe for QWERTY’s mobile successor to be born.

Being offline

I spent most of July offline, travelling, for the most part in Ladakh. It’s hard to miss the internet in a place like this:

Contemplating the Zanskar
At the confluence of the Indus and the Zanskar.

The experience was such a relief that I’m considering spending a few more months doing this: travelling and staying offline.

Seven and a half years of Evolution

To prepare our next analysis, I parsed the Evolution page’s entire revision history for individual words added and removed. The first available revision is from December 3, 2001, making that just about seven and a half years’ worth of revisions.

Here’s the raw data file, 4.8 MB bzipped, expanding to 76.4 MB. Content format: UTC Timestamp, Revision Id, User, Add/AddStems/Del/DelStems, List of words…

The data includes both words and their stems. The stems are calculated using the Porter stemmer, without semantic context (background reading). Letter case has been preserved since I have no means to distinguish between proper nouns and sentence-beginning capitalisation. To get the list of words, I start with the article’s raw text, strip it of HTML tags, tokenise it by alphanumeric characters to get a stream of words, and then diff that against the previous revision’s word stream (the same algorithm as diff -u on the command line). A displaced word will thereby show up as both added and deleted. The tokeniser isn’t perfect: the word “isn’t” will be broken up into “isn” and “t” since the apostrophe doesn’t count as alphanumeric. Suggestions on how to make a better one are appreciated.
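Here is a minimal, standard-library-only sketch of the tokenise-and-diff steps described above. It is not the code linked in this post, and it differs in three ways:

- Tags are stripped with a crude regular expression rather than a real HTML parser.
- `difflib.SequenceMatcher` stands in for the diff algorithm behind `diff -u`. The two use different matching strategies and can occasionally disagree on which words moved.
- Stemming is left out. NLTK’s `PorterStemmer` would supply it.

The tokeniser tries one answer to the apostrophe problem. An apostrophe, straight or curly, that sits between two alphanumeric runs is kept inside the word, so “isn’t” survives as a single token.

```python
import difflib
import re

TAG = re.compile(r"<[^>]+>")
# Runs of letters/digits (word characters minus underscore), optionally joined
# by an internal straight or curly apostrophe, so "isn't" stays one token.
WORD = re.compile(r"[^\W_]+(?:['\u2019][^\W_]+)*")


def tokenise(raw_text):
    """Strip HTML tags and split the remainder into a list of words."""
    return WORD.findall(TAG.sub(" ", raw_text))


def word_diff(old_text, new_text):
    """Return (added, deleted) word lists between two revisions' raw text."""
    old, new = tokenise(old_text), tokenise(new_text)
    added, deleted = [], []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            deleted.extend(old[i1:i2])
        if tag in ("insert", "replace"):
            added.extend(new[j1:j2])
    return added, deleted
```

Diffing “a b c” against “b c a” returns `(['a'], ['a'])`. The displaced word shows up as both added and deleted, as described above.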

Here’s the code if you’d like to try this yourself. You’ll need the other modules in the folder, the NLTK library, and the mwclient library.

Analysis to follow.

Vapour and vacuum

If you release a litre of water into the vacuum of outer space, what will happen to it?

It will vapourise instantly, just as a compressed aerosol does when released at Earth’s surface pressure, and in the process cool down far below freezing point. What happens to the molecules then?

Do they float away as free molecules, no longer ice? Does the crystalline structure of the ice hold them solid? Or if that is too late or not strong enough, does gravitational attraction pull them back together? Will Earth’s own gravitational pull be strong enough to bring them down?

Pictures from #socmob

I’ve posted some pictures from last month’s discussion on using social media for mobilisation, with Dina Mehta and Peter Griffin at CIS. Here’s the report and the earlier Twitter feed.

Nothing significant; just some faces. Help with attaching names to faces would be appreciated.

Updated CIS website

I spent the last two weeks cleaning up the website for the Centre for Internet and Society. Check it out and let me know what you think.

Book signing

“Do you read fiction?” I asked Manish.

“Huh?” he stammered. Only minutes before, I had asked if he could write Python code to generate the Fibonacci sequence, my standard test for recruits. He was trying to work that out and I was growing impatient.

“Um, yes…” he tried to answer, but I wasn’t listening. I said, “There’s a book reading at Crossword in about fifteen minutes. Let’s continue there.”

Amitav Ghosh was in town to promote his new book Sea of Poppies. I had been seeing his books on shelves for years, but hadn’t read any, being generally sceptical of Indian authors. Many years back, when each new book cost me months of savings and days of careful consideration, I had on occasion hazarded a technical book by an Indian author, and inevitably ended up bitter. For all their cover promises, the books were always fluff.

Amitav Ghosh is good, Zainab said. But Indian fiction in English? Admittedly, I hadn’t tried any. Couldn’t hurt to try, given I can afford to buy and not read a book these days.

And so that evening, I interrupted the interview and took the candidate to a book reading, asking him to think out the code and dictate it to me later. Ghosh read an excerpt from his book and discussed it with his host. I hadn’t been to a book reading before and didn’t know what to expect. When the discussions ceased and people queued up to get their books signed, I joined.

At my turn, I put two books down on the desk. Ghosh opened one and looked up expectantly, then said “Who’s it for?”

“Huh?”

Who’s it for? For myself? I was picking up a copy for myself. Who could it be for?

“For Kiran,” I said.

Wait, that sounded wrong. Someone was missing. Someone who should have come first. “…and Zainab,” I hastily added. “For Kiran and Zainab,” he wrote.

And that was how I brought home my first author-signed copy and ended up apologising for it.

Chandrahas Choudhury was in town this evening for his new book Arzee the Dwarf. Zainab said to say hi. She knew him? Well yes, through the Mumbai blogger circuit. I joined the queue and, when my turn came, offered a reminder of our brief meeting in Manipal last year. “Of course,” he said. “Where’s Zainab? I’m going to write this out to her too.”

“To Kiran and Zainab,” he wrote.

This post intentionally left blank

There was going to be a post here, but my browser ate it up and I’m now too mad to be writing it again.

Some things have incredibly steep learning curves, but we struggle over them anyway, because on the other side of the curve we get our *-fu master black belt. We go through life collecting and exhibiting our belts. Every once in a while we come across someone with a belt that makes us envious, that won’t get off our minds, and yet, when it comes to facing that curve ourselves, it no longer seems worth it. Why is that?

Rank

“Wait here,” said Srinivas, and disappeared from view before I could turn around.

Behind me, vehicles honked as they approached the narrow intersection. I pushed the bike to the edge of the road, parked, and swung the backpack over to my back. Where had he gone? The building behind me looked busy. I walked over and looked up the steps into the corridor. No sign of him.

The guard rattled his cane and said “What do you want?” Something about his tone put me off. I hate it when people question the authority on which one exists as they do. I was standing on a public road where I had every right to stand. What was his problem? And where was Srinivas?

“This is a ladies hostel,” he said. “Go away from here.” I looked up again and noticed for the first time that every one of the persons entering and exiting the building was female. This was somehow supposed to be my fault? Who did he think I was, a college romeo? The backpack! Did he… oh dear… really think I was a student?

“I am thirty years old,” I wanted to say, “and married.” Why should I care that this is a ladies hostel? But damn it, he didn’t deserve to know that. What business was it of his? I had had my share of being lorded over by petty officials back in my school days. I was going to have none of it now. I was not going to be sorry for who I was just because some two-bit minimum-wage guard had an inflated sense of his own importance.

Who did he think I was? My mother had been a founding principal of one of their schools, and had run it for ten years. I had grown up riding down this very road through their gates to pick her up every evening. I would park my bike in the staff parking area and walk into the principal’s office, unchecked. And now, I was the suspicious character? The gall of it!

I said nothing. How was I to compress all that into a single, coherent statement? One that said, in addition, that while I had nothing against him personally, he ought to know better than to insult someone with such impeccable credentials? That if he dared make a move, I was perfectly capable of pulling rank?

He continued glaring at me. I shrugged and walked back to the bike, pretending not to have noticed. Srinivas returned several minutes later and announced that there may be some houses in the next block. I wanted to tell him of what this place meant to me, nay, of what I meant to this place. The ego had to be soothed. But I said nothing, and we resumed our house search.

(Part of a writing practice series.)

Charting languages

Guillaume Marceau, who made a guest post here on how to make comparison charts, has an excellent demonstration of this technique over on his blog, charting performance against code verbosity in programming languages:

The speed, size and dependability of programming languages

If you drew the benchmark results on an XY chart you could name the four corners. The fast but verbose languages would cluster at the top left. Let’s call them system languages. The elegantly concise but sluggish languages would cluster at the bottom right. Let’s call them script languages. On the top right you would find the obsolete languages. That is, languages which have since been outclassed by newer languages, unless they offer some quirky attraction that is not captured by the data here. And finally, in the bottom left corner you would find probably nothing, since this is the space of the ideal language, the one which is at the same time fast and short and a joy to use.