Wednesday, December 16, 2015

Go Ahead - Miss the Deadline

It’s 8pm on a Friday. You have to submit a set of documents to an important client by midnight - on a project you are managing - and one of your team members has done a shoddy job. Great, you think, now I have to do it. Oh well.

Before you decide to skip dinner and the next episode of The Good Wife, stop. It’s tempting to fill in their gaps in order to meet the deadline, but don’t do it. It’s Not Your Job.

I’m not advocating the abdication of responsibility. What I’m saying is that as a manager, your responsibility is getting your team members to do their work. It is a common philosophy that since the manager is responsible for delivery, they have to get the delivery done when all else fails. And maybe this is sometimes true - but it’s the wrong way to work.

A manager’s job - your job - is to ensure that the work gets done. This means

  • Splitting up tasks across team members according to their skills and experience
  • Tracking their progress and the quality of their output
  • Making necessary corrections far in advance of the deadline - through reassignments, training, or outright removal of non-performing team members

I’ve seen organizations where the managerial culture is to get the work done no matter what, where the manager is the worker of last resort. Managers took pride in being the stalwarts who ensured that projects got delivered. The people on whom the organization could depend. And that’s what they were - every single time.

An organization with this culture will not scale. And managers who spend their time doing their team members’ work will never learn how to break out of that loop and manage their team into better performance.

This culture isn’t good for your employees either. The lazy ones will learn that they can get away with non-performance, and the ambitious ones will wonder why they are failing and become demotivated.

It’s 8pm on a Friday. Don’t finish the deliverables. Instead, prepare your report of why the deadline was missed and what steps you will take to ensure that it doesn’t happen again. Identify the mistakes which were made, for example

  • “We provided a time-and-effort estimate before understanding the spec”
  • “One of our team members transferred in from another team and had to be trained”
  • “We acted on the feedback from the Quality Team but failed to recognize that all quality issues were with this new member”
  • “We failed to reallocate this person’s work to the other team members”

Because what your customers really want, and what your bosses want - whether they know it or not - is predictability. It’s okay if one project is delayed as long as you can give a reasonable data-backed guarantee that your future projects will always be delivered on time.

Saturday, February 7, 2015

How To Think About Programming: A Data-Centric Point of View

A Data-Centric Point of View

I'm going to start by trying to convince you of the data-centric point of view, how it is useful and why it should be your primary point of view when building computer systems.

Focus on the data. This could be
  1. The source code which a compiler looks at
  2. A list of strings to be sorted by a library function
  3. A collection of invoices in an accounting software
  4. A video travelling over the internet
  5. A configuration file for a web server sitting on a filesystem
You get the idea - I mean data in the most general sense, as they exist in IT systems - in software, in hardware, through the network cables, all of that. I could go even further and talk about the data outside of IT systems, but I won't do that today.

Now all operations on data can be put into two types:
  1. Transportation
  2. Transformation
Transportation refers to any change of location. This could be copying bytes from one buffer in memory to another, or from memory to a register in the processor. It could be the transfer of a video file across a network. It could be from a database, through an application to an output device.

Transformation is harder to define, but easy enough to explain with examples. Think of sorting an array of numbers, or of tokenising a string. Think about adding TCP headers to an HTTP packet, and IP headers on top of that. Think about adding up the elements of a list of numbers.

At a low level, transformation is always expressible in terms of arithmetic operations. How do you find the sum of an array of numbers? Well, you maintain a running sum in one memory location, and an array index in another. Each iteration increments the index and updates the running sum.
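
Here is a minimal Python sketch of the summation just described. The function name running_sum is made up for illustration; the point is only that the whole transformation reduces to one running total, one index, and repeated arithmetic.

```python
def running_sum(values):
    """Sum a list the low-level way: one slot for the total, one for the index."""
    total = 0   # the running sum, held in one "memory location"
    index = 0   # the array index, held in another
    while index < len(values):
        total += values[index]  # transformation: arithmetic on the data
        index += 1              # move on to the next element
    return total


print(running_sum([3, 1, 4, 1, 5]))  # prints 14
```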

Aside: Levels of abstraction

An operation without side effects is essentially a no-op. In order to know whether or not there are side effects, one has to restrict to a certain "scope". Here's an illustrative example. Consider the following Python code:

    if a < b:
        pass

If your scope is low enough, there is plenty of transformation as well as transportation happening:
  1. The value of a is copied from memory into some register on the chip
  2. As is the value of b
  3. A subtraction is performed
  4. The sign of the result is observed
  5. An instruction pointer is moving throughout this process...
And so on...

But if you move to a higher scope, this operation does nothing, and it may as well not have happened!

Your questions have different answers depending on the "level" at which you look at them. To decide which level you should be considering, you have to decide what you want to be thinking about. The example above is valid if your focus is on what happens on the chip. If you don't care about that, the code is equivalent to a no-op.

Here's an example at a much higher level. Suppose you're trying to insert a row of data into a relational database, and that this particular row violates a primary-key condition. You send the data across the network, through a bunch of routers and firewalls, through network interfaces and finally into the DB engine, which rejects the transaction and returns the error code.

Lots of stuff happened! But from the point of view of the data in your RDB - nothing happened, because nothing changed. The initial state is the same as the terminal state. You may as well not have sent that data at all because it made no difference.
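
The rejected insert can be sketched with Python's built-in sqlite3 module standing in for the networked database. The invoices table and its values are invented for this illustration. The rows before and after the failed insert are identical, which is what "nothing happened" means at this level.

```python
import sqlite3

# An in-memory database stands in for the remote RDB (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount INTEGER)")
conn.execute("INSERT INTO invoices VALUES (1, 500)")
conn.commit()


def snapshot(connection):
    """Return the full table contents, i.e. the state of the data."""
    return connection.execute(
        "SELECT id, amount FROM invoices ORDER BY id"
    ).fetchall()


before = snapshot(conn)
try:
    # This row reuses primary key 1, so the engine rejects it.
    conn.execute("INSERT INTO invoices VALUES (1, 999)")
    conn.commit()
    rejected = False
except sqlite3.IntegrityError:
    conn.rollback()
    rejected = True
after = snapshot(conn)

print(rejected, before == after)  # prints: True True
```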

Ok, enough about levels, you get it! From whatever point of view you want to reason, you can identify how your data moves and how it changes.

Don't think about your programs

Let me begin with a term: Architecture. As a beginning programmer, I was fascinated by this. Architecture. It sounded so mysterious, so exciting, and so out of reach. How did people reason at that level? I was smart too, what was I doing wrong?

Here's my attempt at explaining how to do this: think about your data. Think about, and visualise, what it looks like, where it comes from, where it goes, how it changes. How it splits up and recombines. If your program reads a set of numbers, puts them into a binary tree and serialises it for transport, visualise these operations. If you accept input from a web form and need to validate it before storing it in a database with high availability and replication, think about how your data needs to move in order to achieve this.
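
As a rough illustration of the first example, here is a Python sketch that follows a set of numbers through each stage: a list, then a binary search tree, then a serialised string, and back again. The nested-dict tree and the choice of JSON as the wire format are assumptions made for this sketch, not something the example prescribes.

```python
import json


def insert(tree, value):
    """Insert a value into a binary search tree stored as nested dicts."""
    if tree is None:
        return {"value": value, "left": None, "right": None}
    side = "left" if value < tree["value"] else "right"
    tree[side] = insert(tree[side], value)
    return tree


def build_tree(numbers):
    """Transformation: a flat list of numbers becomes a tree."""
    tree = None
    for number in numbers:
        tree = insert(tree, number)
    return tree


def in_order(tree):
    """Walk the tree left to right, giving the values in sorted order."""
    if tree is None:
        return []
    return in_order(tree["left"]) + [tree["value"]] + in_order(tree["right"])


numbers = [5, 2, 8]
tree = build_tree(numbers)   # list -> tree
wire = json.dumps(tree)      # tree -> string, ready for transport
restored = json.loads(wire)  # string -> tree, on the receiving side
print(in_order(restored))    # prints [2, 5, 8]
```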

Don't think about the programs which make this happen. Think about what is happening. Think about the data.

Once you have a picture of the data flow (transportation & transformation!), then you can think about how the computation has to happen.

Ok, now think about your programs

All transportation and transformation is enabled by computation. Again, let me explain this using examples at different levels. In each of the following examples, first ask yourself where the data is, how it is changing, and where it is going. Then ask yourself where the computation is happening.

Example 1

printf takes a format string and a list of data. It combines these two to produce another string, which is then sent to standard output.
  • Think about this as a "service" being offered. Much like a valet service at a restaurant or laundry service at a hotel; you submit a request and something gets done.
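
A toy Python version of that service makes the two steps visible. This is only a sketch: it uses Python's % formatting rather than C's real printf, and it returns the formatted string purely so that the result can be inspected.

```python
import sys


def printf(fmt, *args):
    """A toy printf: combine the format string with the data, then send it out."""
    text = fmt % args        # transformation: format string + data -> new string
    sys.stdout.write(text)   # transportation: the new string goes to standard output
    return text              # returned only for inspection in this sketch


printf("%s owes %d rupees\n", "Asha", 42)
```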

Example 2

From within my program I invoke a function called foo(). This function accesses a caching service to get the name of the latest version of a file on my HDFS. It then asks HDFS to read that file, after which it can extract the last two lines and parse out the URL and the integer 691. foo() then calls another function bar(). bar() packages the integer 691 inside a JSON object, which it sends via HTTP using the networking stack on that machine.
  • Think about this as a service calling a bunch of other services.
  • Function calls are service requests! A function call takes parameters, and may return a result or have a side-effect. Or both.
  • API calls are service requests!
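
To make the "services calling services" picture concrete, here is a Python sketch of Example 2. Everything behind foo() and bar() is a hypothetical stand-in: the dictionaries CACHE and FILES play the caching service and HDFS, http_post only records what would have been sent, and the URL and file contents are invented.

```python
import json

# Hypothetical stand-ins for the caching service, HDFS and the network.
CACHE = {"latest": "report-v3.txt"}
FILES = {"report-v3.txt": "header\nhttp://example.invalid/submit\n691\n"}
SENT = []  # records outgoing requests instead of doing real HTTP


def cache_lookup(key):
    """Service request 1: ask the cache for the latest file name."""
    return CACHE[key]


def read_file(name):
    """Service request 2: ask the file store for the file's contents."""
    return FILES[name]


def http_post(url, body):
    """Service request 3: hand the payload to the 'network'."""
    SENT.append((url, body))


def bar(url, number):
    # Transformation: wrap the integer in a JSON object, then transport it.
    http_post(url, json.dumps({"value": number}))


def foo():
    name = cache_lookup("latest")
    lines = read_file(name).splitlines()
    url, number = lines[-2], int(lines[-1])  # parse the last two lines
    bar(url, number)


foo()
print(SENT)
```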

Putting it Together

At each level of abstraction, and don't mix levels unless you want to end up confused,
  • Think about your data - their creation, modification, transportation and eventual destruction.
  • Then think about the services (or "operations") acting on this data. These could be function calls within a program, or calls to an external library, or even across computers over a network.
  • Find out which of these services are already available for reuse
  • Build the remaining services i.e. maybe write some code. To do this, level-down and repeat this process.
  • Then design the framework which contains all your logic i.e. all your service requests.
(This would be a good time to say that if you're designing a system, you should require that every operation performed on your data have an associated authorisation... this could even default to "allow everything", but in your mental model it should be there).
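
One way to keep that authorisation step in the model is to route every operation on the data through a single check. The Python sketch below is a made-up illustration (the DataStore class, its methods and the read_only policy are all assumptions); the default policy is "allow everything", but the hook is always there.

```python
def allow_everything(user, operation, key):
    """Default policy: every operation is authorised."""
    return True


def read_only(user, operation, key):
    """A stricter example policy: only reads are authorised."""
    return operation == "read"


class DataStore:
    """A tiny key-value store where every operation passes an authorisation check."""

    def __init__(self, authorise=allow_everything):
        self._rows = {}
        self._authorise = authorise

    def _check(self, user, operation, key):
        if not self._authorise(user, operation, key):
            raise PermissionError(f"{user} may not {operation} {key}")

    def write(self, user, key, value):
        self._check(user, "write", key)
        self._rows[key] = value

    def read(self, user, key):
        self._check(user, "read", key)
        return self._rows[key]


store = DataStore()                      # default: allow everything
store.write("alice", "invoice-1", 100)
print(store.read("bob", "invoice-1"))    # prints 100
```

Swapping in read_only (or any other policy function) changes who may do what without touching the operations themselves.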

Sunday, September 14, 2014

Don't read Design Patterns

Consider the following code:

    int x,y,z;
    str a,b,c;

    x=0;
    a="A0";
    print(x, a);

    y=x+1;
    b="A1";
    print(y, b);

    z=y+1;
    c="A2";
    print(z,c);

and compare it to:

    for (int idx=0; idx < 3; ++idx)
        print(idx, "A" + str(idx))

Both do exactly the same thing, but anyone who has been writing code for more than a few months knows that the second version is vastly better. And if you ask us why it is better, we will tell you that
  1. It is easier to read
  2. It is easier to debug
  3. It is easier to modify
All of these are correct and fall under the umbrella of maintainability.

We are trained very early to write code which is easy to maintain. This means that other people should be able to understand it, it means that we should be able to understand it when we look at it a year later, it means that the clever bits should be easy to reuse. All worthy goals, all of them eventually translate into time and money, and all have to do with software development and management.

Many developers continue to explore this aspect of programming, and study class design, and interface design, and design patterns. People will engage in debates on underscores vs. camelcase, or on composition vs. inheritance.

It is not the contention of this post that these discussions are without merit. They are the process-aspect of software development, and processes are important. Standards are important. It is important not to make mistakes which could have been avoided by standing on the shoulders of those who came before.

But sometimes we forget that we write programs to do things. When I was a younger programmer, I was just so excited that there were so many ways to express my ideas, so many ways to design my functions, and my classes, and my class templates, and oh-look-isn't-this-the-command-pattern.

This post contains my view of how a programmer should develop, heavily biased by my own experience and my own mistakes.

Stage 1
Programs are mysterious and esoteric. Error messages from development tools are frightening and make no sense. But working on the command line feels cool. You hope that people will notice the code on your screen and be in awe of you.

Stage 2
The compiler / interpreter works. You understand that the program has an entry point, does some stuff, and then exits. You understand control structures and you know that your functions and variables should have descriptive names. And you're learning that debugging is a huge pain.

Stage 3
You can use libraries and modules written by other people. You have become comfortable with incomplete or incorrect documentation, and can get your programs to do what you want them to. Debugging is no longer a pain and you see it more as a challenge. But you still spend too much time doing it. You view your programs as a collection of classes and functions, and understand the relationships between them. You read Design Patterns and start looking at your own code to find patterns (patterns!) and update your resume accordingly.

Here's some advice: Don't read Design Patterns. Not as a young programmer anyway, not unless your job demands it. Don't read about class design in general, the meaning of protected inheritance, or about when virtual functions should be private.

Instead, focus on making your programs run better. No bugs, tight execution, compact memory footprint etc. Yes, RAM is cheap nowadays, I'm not trying to encourage cost savings here. But I think it is more useful for a programmer to consider the run-time aspect of their program rather than its static nature.

Stage 4
You have developed a data-centric viewpoint, i.e. you know what the data looks like, where it comes from, how it is modified, where it ends up. You know which data is short-lived and which is going to be around for a while. You can shift your viewpoint of a program back and forth between "procedures passing data to and from one another" and "data being acted upon through its lifecycle".

Let me talk about these two viewpoints for a minute. Viewing a program as being made up of entities passing messages to one another is very useful. We tend to talk of those entities as being "objects", and these objects usually do things i.e. they are actors.

But objects can also be just data (although some people frown upon objects with only state and no behavior), and one can view their programs as being collections of objects in different stages of development. In this viewpoint, the actors are de-emphasized and the focus is on the objects being manipulated. For example, one may talk about how a string field in an object is converted to uppercase - with no mention of the TextConvertor which is derived from a StringManipulator and was created by a DataModifierFactory.
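
A small Python sketch of that second viewpoint: the record below is plain data, and the program is described as the stages the data passes through. The record contents and the uppercase_field helper are invented for illustration.

```python
# The object is just data: state, no behaviour.
record = {"name": "ada lovelace", "city": "london"}


def uppercase_field(rec, field):
    """Return a new record in which one string field has been uppercased."""
    updated = dict(rec)                # copy, so the earlier stage is preserved
    updated[field] = rec[field].upper()
    return updated


# The lifecycle of the data, one stage after another.
stages = [record]
stages.append(uppercase_field(stages[-1], "name"))

for stage in stages:
    print(stage)
```

Nothing here says which class did the work; the focus is on what the name field looked like before and after.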

You may also start looking at data management libraries and services, like caching services, service buses, and message queues. You wonder if you are dealing with Big Data, and put it on your resume anyway. And you can often debug problems in your head because you don't have to look at the code to know what it's doing.

Stage 5
Now that you view your programs primarily as runtime entities, you can come back to the static viewpoint. Now when you think about class design and modularization, you will have the runtime model in mind - first and last. And now you will create beautiful designs, engineered for performance as well as for manageability.

Thursday, September 4, 2014

Online security and Uber

Credit card theft

It seems like credit card data gets stolen all the time. In 2011 Sony made the news with details of over 12,000 credit cards being stolen (some reports claimed 2.2 million, I don't know what the final numbers were). Then from March of this year, the Target breach - in which 40 million numbers were stolen. There were breaches in between of course, which one could look up if one were so inclined; I'm not taking the trouble since I'm pretty sure everyone reading this knows that it happens, and happens regularly. I just typed "credit card theft" into Google Search and found this story from yesterday: "Nearly all US Home Depot stores hit".

How can credit card data be stolen? Here are five ways. Here's a news report on YouTube. Most of us have seen videos like this one before. And when someone steals card data, they can sell it on sites like (not hyperlinked, I'm not sure that going there is a good idea).

So how come this doesn't happen to us sitting in India?

Credit card use in India

In India, having a person's card information is not enough to use it. Even having the physical card itself won't work - because we always have to go through an additional authorisation step.

When we use a physical card in a physical store, we are required to enter a 4-digit PIN number. I remember getting my replacement "chip"-style cards a few months ago, and every merchant I visit has the new machine which reads the chip and asks for my PIN. This method prevents your card from being usable unless the thief also managed to steal your PIN number - and ATM and debit cards work the same way. I don't know what happens if the thief just tries all the numbers from 0000 to 9999, maybe your bank's fraud detection software kicks in.
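
I don't know how any particular bank handles repeated wrong PINs, so the Python sketch below is purely an assumption: a card that locks after three consecutive failures. Under that made-up policy, someone walking through 0000 to 9999 gets only three guesses out of 10,000 possibilities before the card stops responding.

```python
class Card:
    """A card protected by a 4-digit PIN and a lockout policy (assumed, not any bank's actual rule)."""

    MAX_FAILURES = 3  # hypothetical limit on consecutive wrong PINs

    def __init__(self, pin):
        self._pin = pin
        self.failures = 0

    @property
    def locked(self):
        return self.failures >= self.MAX_FAILURES

    def try_pin(self, guess):
        if self.locked:
            return False          # a locked card rejects everything
        if guess == self._pin:
            self.failures = 0     # a correct PIN resets the counter
            return True
        self.failures += 1
        return False


def brute_force(card):
    """Try 0000, 0001, ... in order; return the PIN if found before the lockout."""
    for n in range(10000):
        if card.locked:
            return None
        guess = f"{n:04d}"
        if card.try_pin(guess):
            return guess
    return None


print(brute_force(Card("4821")))  # prints None: locked after three wrong guesses
```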

Online transactions work similarly. When we provide card payment information to a site, we are required to authorise the transaction on a different site, showing MasterCard SecureCode or Verified by Visa. This other site is hosted by your bank, and this is important because this way the merchant cannot get your security passcode. Of course it is entirely possible that your bank could get broken into, but it is easier to regulate and monitor a few hundred banks rather than a few hundred thousand online businesses.

Aside: When we transact using Net Banking, we often have passcodes for individual transactions - I know that at least AXIS and Kotak do this - after you initiate the transaction they send a code to your email address or to your phone via SMS, and you have to provide this code to complete the transaction. I wish we had this for card-based transactions as well.

In summary, stealing credit card information in India is basically useless.


Well, unless the merchant is outside the country. Our regulators have no sway over outside merchants, and while they could certainly block us from transacting with them, that would be hugely inconvenient for us. So when we transact with foreign merchants, a credit card number and expiration date are enough. And probably the name. No other authentication, no passcodes to be entered on a trusted site, nothing.

Naturally this is not great. It is the online equivalent of walking through a shady part of town - sometimes you have to go, but you don't want to be doing it regularly. I used to buy music on cdbaby, now I don't even do that. What I've heard is that if you are an American consumer, your loss is capped at 50 USD. I don't know if that is true, but I do know that if you are an Indian consumer you would just be S.C.R.E.W.E.D.

Hello, Uber

Uber is a fantastic service. I followed the news about how they were scaring taxi unions around the world, and getting banned one city at a time, and I remember thinking "Go Girl" (I don't know why I made Uber a girl). I thought the taxi unions' safety claims were horseshit, and while I knew that they had to do what they had to do to protect themselves, I also felt that they should be crushed mercilessly (which I don't think has happened yet? I can wait). I also generally believe that the State should regulate lightly and let businesses thrive, especially when large numbers of consumers could benefit.

So Uber came to India (yay!), and I guess they didn't want to subject their users to the extra step of credit card authorisation. Because when people reach their destination they don't want to type things into their phone. Because it's inconvenient.

Yes, it is inconvenient. Just like locking and unlocking your car is inconvenient. Or the door to your house. Or having to show ID when you access your locker in the bank.

We deal with, and even welcome these inconveniences because they protect us. This is about security, and not some abstract "national security", but concrete, personal, financial security. Every one of us should want that level of security.

Well, Uber decided that rather than subjecting their users to standard financial security practices, they were going to bypass them, by... routing the payment through a non-Indian processor. Genius! What an idea. Someone got a bonus for that one.


When I first read that this is what they did, I didn't know whether to laugh or to cry. But that's not all - then I read, and heard, many smart people saying that this was a good thing because it saved time and therefore contributed to GDP growth. And that when the regulators told them to stop, it was regulatory paternalism.

This is not paternalism. This is not cultural policing, this is not the Big Bad Government trying to be your father and telling you what is right and what is wrong. And the GDP of the country is not going to be touched by the time saved when people get out of their taxis. This is none of that, and dressing up a crappy argument with big words and numbers is like putting lipstick on a soowar.

The better approach

As responsible consumers, we want our financial transactions to be protected. But as lazy consumers, we still want the convenience of speed. Technology exists so that we can have it both ways - we shouldn't have to choose.

Rather than trying to be so smart and jugaad-ing the loopholes, Uber should have worked on a real solution. E-commerce is big in India, and I'm sure the large online retailers also want to save their customers the hassle of entering credit card information into their sites and apps every single time they buy something. A process needs to be invented, with the involvement of businesses, regulators and security experts.

California is not going to do this for us - it's our use-case, and we need to solve it.

Saturday, May 3, 2014

Programming with Callbacks

I wrote this as an answer on StackOverflow, and have decided to make a copy here as well. The question was "What is a callback function?"

Opaque Definition
A callback function is a function you provide to another piece of code, allowing it to be called by that code.

Contrived example

Why would you want to do this? Let's say there is a service you need to invoke. If the service returns immediately, you just:
  1. Call it
  2. Wait for the result
  3. Continue once the result comes in
For example, suppose the service were the factorial function. When you want the value of 5!, you would invoke factorial(5), and the following steps would occur:
  1. Your current execution location is saved (on the stack, but that's not important)
  2. Execution is handed over to factorial
  3. When factorial completes, it puts the result somewhere you can get to it
  4. Execution comes back to where it was in [1]
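
In Python, the blocking version of this call might look like the sketch below. The caller does nothing else until factorial returns.

```python
def factorial(n):
    """Compute n! with a simple loop."""
    result = 1
    for k in range(2, n + 1):
        result *= k
    return result


value = factorial(5)  # execution is handed to factorial; the caller waits here
print(value)          # prints 120, and the caller carries on from where it left off
```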
Now suppose factorial took a really long time, because you're giving it huge numbers and it needs to run on some supercomputing cluster somewhere. Let's say you expect it to take 5 minutes to return your result. You could:
  1. Keep your design and run your program at night when you're asleep, so that you're not staring at the screen half the time
  2. Design your program to do other things while factorial is doing its thing
If you choose the second option, then callbacks might work for you.

End-to-end design

In order to exploit a callback pattern, what you want is to be able to call factorial in the following way:
factorial(really_big_number, what_to_do_with_the_result)
The second parameter, what_to_do_with_the_result, is a function you send along to factorial, in the hope that factorial will call it on its result before returning.
Yes, this means that factorial needs to have been written to support callbacks.
Now suppose that you want to be able to pass a parameter to your callback. Now you can't, because you're not going to be calling it, factorial is. So factorial needs to be written to allow you to pass your parameters in, and it will just hand them over to your callback when it invokes it. It might look like this:
factorial (number, callback, params)
    result = number!   // i can make up operators in my pseudocode
    callback (result, params)
Now that factorial allows this pattern, your callback might look like this:
logIt (number, logger)
and your call to factorial would be
factorial(42, logIt, logger)
What if you want to return something from logIt? Well, you can't, because factorial isn't paying attention to it.
Well, why can't factorial just return what your callback returns?
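
Here is a runnable Python sketch of the pseudocode above. It leans on math.factorial for the arithmetic, renames logIt to log_it in Python style, and uses a plain list as the "logger"; all three are choices made for this illustration.

```python
import math


def factorial(number, callback, params):
    """Compute number! and hand the result, plus the caller's params, to the callback."""
    result = math.factorial(number)
    callback(result, params)


def log_it(number, logger):
    """The callback: record the result in whatever logger was passed along."""
    logger.append(f"result = {number}")


logger = []                    # a list standing in for a real logger
factorial(5, log_it, logger)
print(logger)                  # prints ['result = 120']
```

Note that factorial itself returns nothing; whatever the callback wants to keep, it has to store somewhere, here in the logger.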

Making it non-blocking

Since execution is meant to be handed over to the callback when factorial is finished, it really shouldn't return anything to its caller. And ideally, it would somehow launch its work in another thread / process / machine and return immediately so that you can continue, maybe something like this:
factorial(param_1, param_2, ...)
    new factorial_worker_task(param_1, param_2, ...);
This is now an "asynchronous call", meaning that when you call it, it returns immediately but hasn't really done its job yet. So you do need mechanisms to check on it, and to obtain its result when it's finished, and your program has gotten more complex in the process.
And by the way, using this pattern the factorial_worker_task can launch your callback asynchronously and return immediately.
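
One possible shape for this in Python uses a thread as the worker task. This is a sketch under assumptions: the name factorial_async is invented, and returning the thread object is just one of several ways to let the caller check on the work.

```python
import math
import threading


def factorial_async(number, callback):
    """Start the computation on another thread and return immediately."""

    def worker():
        # Runs in the background; the result goes to the callback, not the caller.
        callback(math.factorial(number))

    task = threading.Thread(target=worker)
    task.start()
    return task  # a handle the caller can use to wait for completion


results = []
task = factorial_async(10, results.append)
# ... the caller is free to do other things here ...
task.join()      # wait for the worker before looking at the results
print(results)   # prints [3628800]
```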

So what do you do?

The answer is to stay within the callback pattern. Whenever you want to write
a = f()
g(a)
and f is to be called asynchronously, you will instead write
f(g)
where g is passed as a callback.
This fundamentally changes the flow-topology of your program, and takes some getting used to.
Your programming language could help you a lot by giving you a way to create functions on-the-fly. In the code immediately above, the function g might be as small as print (2*a+1). If your language requires that you define this as a separate function, with an entirely unnecessary name and signature, then your life is going to get unpleasant if you use this pattern a lot.
If, on the other hand, your language allows you to create lambdas, then you are in much better shape. You will then end up writing something like
f( func(a) { print(2*a+1); })
which is so much nicer.
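
For comparison, here is how the same thing might look in Python, which does have lambdas. The body of f is a stand-in (it just produces the number 20); in real code it would be the slow, asynchronous work.

```python
def f(callback):
    """Stand-in for a slow service: produce a value and pass it to the callback."""
    a = 20        # pretend this took a long time to compute
    callback(a)


# The callback is defined on the fly, with no separate name or signature.
f(lambda a: print(2 * a + 1))  # prints 41
```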

How to pass the callback

How would you pass the callback function to factorial? Well, you could do it in a number of ways.
  1. If the called function is running in the same process, you could pass a function pointer
  2. Or maybe you want to maintain a dictionary of fn name --> fn ptr in your program, in which case you could pass the name
  3. Maybe your language allows you to define the function in-place, possibly as a lambda! Internally it is creating some kind of object and passing a pointer, but you don't have to worry about that.
  4. Perhaps the function you are calling is running on an entirely separate machine, and you are calling it using a network protocol like HTTP. You could expose your callback as an HTTP-callable function, and pass its URL.
You get the idea.
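
The second option, passing a name rather than the function itself, could be sketched in Python like this. The CALLBACKS dictionary, the square service and store_result are all invented for the example.

```python
# A dictionary of fn name --> fn, maintained by the program.
CALLBACKS = {}
received = []


def store_result(value):
    """The callback: keep whatever the service hands back."""
    received.append(value)


CALLBACKS["store_result"] = store_result


def square(number, callback_name):
    """A service that is given only the callback's name and looks it up when done."""
    result = number * number
    CALLBACKS[callback_name](result)


square(7, "store_result")
print(received)  # prints [49]
```

A name is just a string, so it can cross boundaries where a function pointer cannot; the HTTP case in the list above is the same idea with a URL as the name.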

The recent rise of callbacks

In this web era we have entered, the services we invoke are often over the network. We often do not have any control over those services i.e. we didn't write them, we don't maintain them, we can't ensure they're up or how they're performing.
But we can't expect our programs to block while we're waiting for these services to respond. Being aware of this, the service providers often design APIs using the callback pattern.
JavaScript supports callbacks very nicely e.g. with lambdas and closures. And there is a lot of activity in the JavaScript world, both on the browser as well as on the server. There are even JavaScript platforms being developed for mobile.
As we move forward, more and more of us will be writing asynchronous code, for which this understanding will be essential.

Thursday, May 1, 2014

Writing a small application using Parse and Backbone - Part 3

The Story So Far

In the last post, I set up a Backbone view and a model, and connected the two. I also connected the view to a route so that we could see it render with the data. This time I want to figure out:
  1. How to query the Facebook Graph API
  2. How to persist data to the Parse backend

The Graph API

The first thing you will need to hit the Graph API is an access token. This token has an associated set of permissions which will determine which of your queries succeed. When you play with the Graph API Explorer, you will see a button saying "Get Access Token", which brings up a window asking you which permissions you are going to ask the Facebook user for.

When you do this through Parse, you will ask for permissions in the call to Parse.FacebookUtils.logIn. If the user grants you the permission you want, the access token will be in Parse.User.current()._serverData.authData.facebook.access_token.

To use this token to make a query, you call the FB.api function. Which works fine... except when the FB object doesn't exist yet. Or the Parse.User hasn't signed in. Or the access token has expired. All of which are possible, especially if the initialization flow is still in progress.

To handle this I wrote a little helper:

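 facebookQuery = function(queryString, callback) {
     var doer = function () {
       if ((!!window.FB) && (!!Parse.applicationId) && (!!Parse.User.current())) {
         window.FB.api(queryString, { 'access_token': Parse.User.current()._serverData.authData.facebook.access_token }, function(response) {
           // token expired! this logic needs to be a)understood, and b)put everywhere
           if(response.hasOwnProperty('error') && response.error.code === 190) {
             // (body missing from this listing: log out and log back in)
           }
           else {
             callback(response); // presumably hands the response to the caller; original line missing
           }
         });
       }
       else {
         // wait 0.1s
         setTimeout(doer, 100);
       }
     };
     doer(); // kick off the first attempt (assumed; missing from this listing)
 };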

The 190 error code is what Facebook returns if the access token has expired. What I do is I log out and log back in, and this seems to work.

Persisting to Parse

They have actually made this pretty easy - the Parse.User object has a save method you can call. Here is some code which shows a Graph query for the logged-in user's profile information - you can see that I am asking for only the name, gender, and cover photo. Then I call the set method to assign to the fields, followed by save to send it to their servers.

 facebookQuery('/me?fields=name,cover,gender', function(response) {
       // information to populate the model
       var info = {
         fbId :,
         realname :,
         gender : response.gender,
         privacy_non_fb_see_only_name : Parse.User.current().get("privacy_non_fb_see_only_name"),
         photoURL : response.cover && response.cover.source
       };
       // overwrite existing values, if any, and send to Parse
       _.each(info, function(val, key) {
         Parse.User.current().set(key, val);
       });
       Parse.User.current().save();
 });

Next Time

In the next post I'm going to build the core functionality of this application, which is the creation of an invite. The code is on GitHub if you want to follow along!

Update (May 10 2014)

I'm not going to get to the next post for a while, some other tasks have come up at higher priority! I'll get to it when I get to it.