Computer Devata

You go to an office and ask them for a report. You can’t get the report yet; you must wait. What could the reasons be?

Not-so-long-ago

  1. The person who creates the report has gone for tea.
  2. The file from which data needs to be copied is missing. Someone is trying to locate it.
  3. The person who needs to sign the report missed his bus and is still on the way to the office.

And many more like these. But what’s the single reason that I get today when my work can’t be done by the concerned officials?

  1. The server is down.

That’s all. I couldn’t collect a medical report from Manipal Hospital because their server was down and they couldn’t print the report. My Bangalore Airtel numbers are not getting provisioned because their server is down. Once I stood at the Deccan Airways window for 3 hours for a ticket because their server was down.

Life is moving online very fast. However, we are not doing enough to make the online world reliable. The common practice is to just move all the operations and the data online and assume that it will always work.

But the reality is that it doesn’t always work! Wake up guys, build reliability and speed into your systems. Don’t assume they come for free. On the contrary, they cost a lot of money.

  1. #1 by hariom on May 13th, 2009 - 11:05 am

    Progress is the exchange of one problem for another. The reliability problem you have been talking about also stems from cost considerations, which play a vital role in developing economies.
    If high availability requires static over-provisioning of resources, leading to sub-optimal resource consumption, certain markets do not want it. So it's a trade-off.
    Failure is OK as long as it's not the final one and we continue to have alternatives to fall back upon. But these alternatives come at a cost.

  2. #2 by Manas on May 13th, 2009 - 11:09 am

    The point is this: either make your infrastructure fail-safe (i.e. make sure that you’ll never have an issue like a server being down) or make your processes fail-safe (so there is always a way to do the work if a server is down).

    Not doing either is bad.

  3. #3 by hariom on May 14th, 2009 - 2:02 pm

    I agree, but I am re-stressing the point about cost-conscious markets. I mean, in the US or in developed economies, static over-provisioning of resources (hardware, network and process too) is a basic necessity. If you roll out a service (h/w or s/w), those markets ask about high availability and reliability before they even ask about features, and only then think of considering you.

    The places that made you wait wanted solutions (desired rather than engineered) that were primarily low cost, which leads to oversubscription. Just as an example, a mobile operator here oversubscribes its over-the-air capacity around 20-50 times. They expect that all the subscribers will not call at the same time. This oversubscription is always there, but here the low costs are driving the oversubscription factor to crazy limits.
    The DSL backhaul infrastructure is oversubscribed around 5 times. We make the router. To make one link more reliable, besides making better s/w and h/w, we provide N:1 redundancy, where N can be 1-5. We know that customers in the US will always configure average redundancy and intentionally waste the capacity (both the whole line card and the link).
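
    To make the oversubscription bet concrete, here is a rough sketch of how the risk grows with the ratio. It is not how any operator actually dimensions a network: the independent-caller model, the 2% activity probability and the function name overload_probability are all assumptions made up for illustration.

```python
import math


def overload_probability(subscribers: int, ratio: int, p_active: float) -> float:
    """Chance that more subscribers are active at once than capacity allows.

    Assumptions (illustrative only): every subscriber is active
    independently with probability p_active, and capacity is simply
    subscribers // ratio simultaneous calls.
    """
    if not 0.0 < p_active < 1.0:
        raise ValueError("p_active must be strictly between 0 and 1")
    if ratio < 1:
        raise ValueError("ratio must be at least 1")
    capacity = subscribers // ratio
    log_p = math.log(p_active)
    log_q = math.log(1.0 - p_active)
    tail = 0.0
    # Sum the binomial probabilities of every outcome above capacity.
    for k in range(capacity + 1, subscribers + 1):
        # Work in log space so large populations do not overflow.
        log_pmf = (
            math.lgamma(subscribers + 1)
            - math.lgamma(k + 1)
            - math.lgamma(subscribers - k + 1)
            + k * log_p
            + (subscribers - k) * log_q
        )
        tail += math.exp(log_pmf)
    return min(tail, 1.0)


if __name__ == "__main__":
    # Hypothetical population of 1000 subscribers, each active 2% of the time.
    for ratio in (5, 20, 50):
        print(ratio, overload_probability(1000, ratio, 0.02))
```

    Under these assumed numbers the 5x ratio is almost never exceeded, while at 50x the capacity only matches the average load, so overload becomes routine.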

    Take the example of datacenters, the ones required for any centralized information processing like banking, railways or aviation, which made you wait. HA and reliability increase the cost significantly, all the way from the enterprise-side resources (the end point) to the network connectivity to the server farms. When they say the server is down, it could be cost cutting anywhere in this chain. Take the case of Google or Facebook: one query on the Internet is posted to around 50 servers in parallel, typically leading to up to 50 different locations. That's why on Google you rarely see that the server is down. Yes, in order to accommodate this, the s/w, h/w and system design also have to support it, and it's not purely a function of building more capacity, but the cost (as seen by enterprises) is impacted more by additional network elements and less by higher-quality s/w running inside them.
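
    A back-of-the-envelope sketch of why that many servers makes "server is down" rare. It assumes, purely for illustration, that any one server can answer the query and that servers fail independently; neither is claimed above, and the 90% per-server availability and the function name chance_all_down are made up.

```python
def chance_all_down(availability: float, replicas: int) -> float:
    """Probability that every replica is down at the same moment.

    Assumes independent failures and that a single healthy replica
    is enough to serve the request (illustrative assumptions).
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("need at least one replica")
    return (1.0 - availability) ** replicas


if __name__ == "__main__":
    # Hypothetical servers that are each up only 90% of the time.
    for n in (1, 2, 5, 50):
        print(n, chance_all_down(0.9, n))
```

    Real failures are correlated (shared power, network, software bugs), so the true numbers are worse than this, but the direction holds: each extra server costs money and buys a smaller chance of being down.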

    So the IT solutions and infrastructures that made you suffer are partly the result of bad design. But in the majority of cases where the impact is as big as you are talking about (waiting for hours), it's about making cheap choices in deployment.
