>Long Tail of Content

>The World Wide Web is vast. And overwhelming. It has so much content: factual information, exciting ideas, deep thoughts and so many tools. Tools for using the content itself, for doing some fun stuff, for doing some serious stuff, and for doing nothing too.

In isolation, a piece of content is useful or useless. Interesting or boring. Relevant or irrelevant. Factual or baseless.

In isolation, a piece of content is either of interest to you or not. And if it is of interest to you, you should be able to reach it. Grab it and consume it.

Every day I go to the World Wide Web looking for information that is of interest to me. I look for new information that has just been created and may be of interest to me. I do this by surfing popular sites like CNET and InfoWorld, or by subscribing to the RSS feeds of popular blogs. I also go looking for information that has existed for some time but that I have only now become interested in. I do this by going to Google and keying in words that I feel represent the objects of my interest.

More than half the content that I reach is not relevant to me at the moment I reach it. And I don’t think I ever reach more than 1 percent of the content that is relevant to me. That 1 percent is an arbitrary number; it signifies how small the relevant content that I reach is compared with the relevant content that is actually available.

When I search Google, it tells me that the number of results for my search runs to 5 or 6 digits. I hardly ever go beyond page 2 and never beyond page 5. I look at only those pages that are popular.

I follow mostly the popular websites and subscribe to the RSS of mostly popular people.

Going with what is popular has a price. I mostly don’t reach what is relevant but not popular. Popularity is a measure of just popularity and not relevance. Popularity merely increases the *chances* of relevance but does not guarantee it. That is because the popularity of a piece of content, in general, depends on the source of the content, how many people agree with the content, how many people respect the source of the content, the popularity of the supporters of the content, the presentation of the content and the relevance of the content in the context of the overall subject. All this is before counting deliberate attempts to increase content popularity through search engine optimization, link exchange, paid promotion etc. So, as you see, relevance of the content is just one of the many factors in determining its popularity.

But I reach the content through a popularity index.

And there lies the long tail of content on the World Wide Web.



>Tagging Blogs; What For??

>Technorati has launched a service for tagging blogs. And David Weinberger has covered it very well here. I agree with him on a couple of counts but mostly, I find his post overenthusiastic.

Some stuff that he has mentioned:

Technorati, a site that indexes 4.5 million weblogs, is now enabling us to sort blog posts by tag. This is way way cool.

This is exciting to me not only because it’s useful but because it marks a needed advance in how we get value from tags.

Just slap a tag on something and now its value becomes social, not individual.

First, I agree with David that this service is way cool. However, I have my doubts about the immediate usefulness of this service. And I wonder if even the Technorati guys have any definite idea about how these blog tags will be used and what for.

He is correct in stating that categories are not tags. Before we talk any further on this, I feel this statement should be reversed and made “tags are not categories” because we are talking about tags here and not categories.

So, tags are not categories. They are more like the contexts in which the tagged information is relevant. And to be more precise, they are the contexts in which the tagger (the person/machine that tagged it) considered the tagged information to be relevant.

Tagging information has two primary uses (as we have discovered so far):

1. Getting to a piece of information. So, when you tag a piece of information, you are making a statement that the given information is relevant in these contexts. Once tagged, people looking for information that is relevant in a given context can use the context specific tags to get to it. They can be looking for it in real-time (e.g. through RSS) or they may be doing a search later on.

2. Locating people who share your interests. So, if two people are using the same tags to tag the information that they are producing/consuming, the chances are that they have common interest here.

But I believe that blogs cannot derive the same kind of value from tags as photographs and bookmarks. Reason? Because blogs have enough context within them that can be picked up by search engines.

So, let’s say you have a photograph of one of the tsunami hit areas. What will you tag it? ‘tsunami, devastation’. Let’s say you bookmark a URL of an article on the devastation caused by the tsunami. What will you tag it? ‘tsunami, devastation’. Now, let’s say you wrote a blog on the same topic. Do you need to put a tag on it?

As is evident, search engines can’t parse a photograph and tell you what it is about. Photographs represent information that is opaque to machines. Search engines can’t parse a bookmark (well, they can but they don’t) and tell you what it is about. Bookmarks are just pointers and you can’t make a machine know what a bookmark is all about. However, blogs are different. Blogs are information. Search engines can and do look inside the blogs and find out what they are all about. Here, tags are not much more than keywords. I can go to PubSub or Technorati and watch specific keywords. And I am sure that if blog authors are going to tag their blogs, those tags will come out of the keywords only.

Hence, I feel that the usefulness of tags is not clear as of now and they are just what David has called them: way way cool. I believe that folksonomy today (just like everything else in the tech world) is heading towards the Peak of Inflated Expectations in the Technology Hype Cycle.




Hey, I am tagging my blog as it’s cool and it may turn out to have some use eventually ;)


>Does your application do the right thing in the right way?

>You and I have applications that do the right things. But how do we know that those right things are being done in the right way?

Am I not speaking English? Let me be verbose then.

So, you have tested your application and you think it works. Which is a fair assumption to make considering that all your test cases have passed. Except that the assumption is fatally wrong. Did your testing also ensure that the right thing was done in the right way?

Let’s say you have a function ‘int multiply( int a, int b)’ which multiplies a and b and returns the result. If your test case does

if( multiply(2, 2) == 4) { pass(); }

Can you be sure whether it is doing a multiplication or an addition inside? It is doing the right thing by returning 4 but is it reaching the conclusion 4 in the right way? I agree that it’s an overly simple and completely dumb example that I have chosen but it is the best I could find to illustrate the problem. Anyway, you can always modify your test case to say

if( multiply(2, 2) == 4 && multiply(3, 3) == 9 && multiply(4, 4) == 16) { pass(); }

Now, you have significantly reduced your chances of misjudging the correctness of the multiply() function. So, let me give you another example which is neither so simple nor so dumb. In fact, you will see that in the example I am going to cite now, all the inputs will always give the correct output without it being produced in the right way!!

Before the example, I need to build some background. The background is necessary to introduce the application which will do the right thing in the wrong way. I have recently developed an application called d-compiler. It’s a distributed build system based on peer-to-peer technology. This application is interesting, useful and different (from dmake, distcc and Electric Cloud). I’ll not get into the details of this app and just focus on what is relevant here: doing the right thing in the wrong way.
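To make the multiply() point concrete, here is a minimal sketch of a table-driven test in C. The struct mul_case, the function run_multiply_cases() and the particular inputs are my own illustration, not part of any real test suite. The idea is only that asymmetric operands, zero and a negative number leave far less room for an addition (or anything else) to slip through than the lone 2 x 2 check does.

```c
#include <assert.h>
#include <stddef.h>

/* Function under test, as described in the post. */
int multiply(int a, int b) { return a * b; }

/* One row of the test table: two inputs and the expected product.
   (Hypothetical helper type, introduced only for this sketch.) */
struct mul_case { int a, b, expected; };

/* Returns the number of failing cases; 0 means every case passed. */
int run_multiply_cases(void)
{
    static const struct mul_case cases[] = {
        {  2, 2,   4 },  /* also true for a + b, so not enough on its own */
        {  3, 3,   9 },  /* rules out plain addition */
        {  4, 5,  20 },  /* asymmetric operands */
        {  0, 7,   0 },  /* zero */
        {  1, 9,   9 },  /* identity */
        { -3, 4, -12 },  /* negative operand */
    };
    int failures = 0;
    for (size_t i = 0; i < sizeof cases / sizeof cases[0]; i++) {
        if (multiply(cases[i].a, cases[i].b) != cases[i].expected)
            failures++;
    }
    return failures;
}

/* Example of how a test runner might use the table. */
void check_multiply(void)
{
    assert(run_multiply_cases() == 0);
}
```

Even so, this only reduces the chance of misjudging multiply(); it still checks outputs, not the way they were computed.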

d-compiler keeps a list of machines that are running the d-compiler daemon, ready to execute remote compilation jobs. So, when I do make, make will invoke d-compiler instead of gcc and d-compiler will send the compilation job to another machine. Now this is important: if no peer machine is available OR if the peer machine just hangs, OR if there is a protocol error during communication with the peer machine, d-compiler running locally will perform the compilation.

All was fine till recently when I discovered that the build had become a little slow. On further investigation I found that the d-compiler on my machine was not using all the peer machines for compilation. Reason: my latest feature addition had introduced a bug which would lead to a protocol error between d-compiler machines. But I would never detect it in automated testing because the test case was to give the local d-compiler a file to compile and get a compiled file back. Which will always happen because if the local d-compiler cannot get it compiled by a peer, it will compile the file itself.

d-compiler was correct and incorrect at the same time. It was black-boxically correct as in it would take a .c file as input and give back a .o file. But it was white-boxically incorrect as it was not able to get these files compiled on the machines that were very much available.
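The fallback rule above can be sketched as a small decision function. This is only an illustration under my own assumptions: the enums peer_status and run_location and the function dispatch() are hypothetical names, and the real d-compiler code is not shown here.

```c
#include <assert.h>

/* Hypothetical outcome of trying to hand a job to a peer machine. */
enum peer_status {
    PEER_OK,
    PEER_NONE_AVAILABLE,
    PEER_HUNG,
    PEER_PROTOCOL_ERROR
};

/* Where the compilation actually ends up running. */
enum run_location { RAN_ON_PEER, RAN_LOCALLY };

/* The job runs remotely only if the peer attempt succeeded;
   every failure mode silently falls back to local compilation. */
enum run_location dispatch(enum peer_status status)
{
    switch (status) {
    case PEER_OK:
        return RAN_ON_PEER;
    case PEER_NONE_AVAILABLE:
    case PEER_HUNG:
    case PEER_PROTOCOL_ERROR:
    default:
        return RAN_LOCALLY;
    }
}

/* Example checks of the rule. */
void check_dispatch(void)
{
    assert(dispatch(PEER_OK) == RAN_ON_PEER);
    assert(dispatch(PEER_PROTOCOL_ERROR) == RAN_LOCALLY);
}
```

Notice that all three failure cases produce exactly the same visible result as a machine with no peers at all: the file gets compiled.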

So, whether my test case says:

if( correctly_compiles( file.c)) { pass(); }

Or it says:

if( correctly_compiles( file1.c) && correctly_compiles( file2.c) && correctly_compiles( file3.c)) { pass(); }

My testing is a failure because I’d never discover the BUG that keeps the functional behaviour intact but makes my application completely useless.
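One way to picture a test that would have caught this is to let the application record where each job ran and have the test assert on that record as well as on the output. The following is a minimal sketch with a simulated compile step. Everything in it (struct dc_stats, compile_file(), the peer_protocol_ok flag) is hypothetical and stands in for instrumentation that d-compiler would have to expose; it is not how d-compiler actually works.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical counters the build tool would expose to its tests. */
struct dc_stats {
    int peer_jobs;        /* jobs compiled on a peer machine */
    int local_fallbacks;  /* jobs that fell back to local compilation */
};

/* Simulated compile: it always yields an object file, but records
   where the job ran. peer_protocol_ok stands in for "the peer
   protocol works"; false models the bug described above. */
bool compile_file(const char *path, bool peer_protocol_ok,
                  struct dc_stats *stats)
{
    (void)path;  /* the simulation does not read the file */
    if (peer_protocol_ok)
        stats->peer_jobs++;
    else
        stats->local_fallbacks++;
    return true;
}

/* Black-box check: only asks whether object files came back. */
bool blackbox_ok(bool peer_protocol_ok)
{
    struct dc_stats stats = { 0, 0 };
    return compile_file("file1.c", peer_protocol_ok, &stats)
        && compile_file("file2.c", peer_protocol_ok, &stats);
}

/* Grey-box check: also requires that the peers did the work. */
bool greybox_ok(bool peer_protocol_ok)
{
    struct dc_stats stats = { 0, 0 };
    bool produced = compile_file("file1.c", peer_protocol_ok, &stats)
                 && compile_file("file2.c", peer_protocol_ok, &stats);
    return produced && stats.peer_jobs == 2 && stats.local_fallbacks == 0;
}

/* With the protocol broken, the black-box check still passes
   while the grey-box check fails. */
void check_fallback_visibility(void)
{
    assert(blackbox_ok(false));
    assert(!greybox_ok(false));
}
```

The sketch assumes peers are available in the test environment; without that assumption a fallback is legitimate and the grey-box check would have to allow it.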

This incident has really started troubling me. I’ll never have enough confidence in this application as I’ll never be testing it extensively. And there *are* some real reasons for which I’ll never test it as extensively as is required to get *that* confidence.

To begin with, developers are not testers. I just don’t have the attitude to believe that there are bugs in my code till I observe otherwise. Which means, I have natural inclination for not testing the code that I have written. I know it’s wrong. It may even be sinful. But that’s how things are and I am fine with it as I consider it just human nature.

So, I don’t have a taste for testing. I don’t like this mechanical thing of creating a scenario, performing some action and then checking the log files to make sure that things indeed went the way I intended them to.

Secondly, I have written this application as a hobby. I have limited time to work on it. I developed it because it was fun to do so. I extend it because I find it useful and would like to take it forward. Compared with the early days, the ratio between time-to-develop and time-to-test has become really small. Initially, I used to spend most of my time developing d-compiler and very little time testing it and hence, the ratio was high. It was so because there were so many things to be developed and hardly anything to be tested. However, now it is different. Today, d-compiler *does* what it is built for. Any new feature/functionality/bug-fix requires a very small code change. But it requires a plethora of functionality to be tested under a horrendous number of scenarios. I don’t have the time to do it. And even if I had time, I would not do it. Because it is boring to do so much white box testing. And I have many more things to do that are much more fun. The end result is either I extend d-compiler and give away the change without much confidence (not acceptable) OR I don’t extend it at all (not good).

The only way I see out of this situation is automation. However, automating white/grey box testing is a dream today. And doing it for a distributed system is a distant one. I thought about it, googled it, asked about it, discussed it but all in vain. People in the past *have* shared this sentiment and have attempted to build tools for automated testing of distributed systems. But none of them seems to be good/proven. I didn’t give up, yet I couldn’t make any progress either. The trouble is not developing a tool that can do automated grey box testing. The trouble is that even the model that can be used for automated grey box testing is not clear. Can there be a generic model like JUnit that can be employed in the grey box testing of all sorts of systems OR does one have to build custom tools specific to applications? Should the Testing System be an independent one closely observing the System Under Test OR should the Testing System be laid on top of and coded with the System Under Test? How does a testing system like this evolve with the evolution of the System Under Test? There are just too many fundamental questions to be answered here and many more to be asked.

So, here is the problem. You and I have applications that do the right things. But how do we know that those right things are being done in the right way? And how do you create a testing environment that scales with your application?

I am thinking about it.


>Public wiki for product documentation?

>Jon Udell throws out an idea (via Tim):

“The problem is that vendors, for the most part, do a lousy job of encouraging and organizing those discussions. Here’s an experiment I’d like to see someone try: Start a Wikipedia page for your product. Populate it with basic factual information, point users there, then step back and let the garden grow.”

To say the least, the idea is very appealing. In fact, when you read it, it sounds too obvious to be a big idea (or an idea worth talking about). Whether it really works or not is something that we can’t predict as of today. And I am sure someone out there is going to try it out as the idea is too obvious to be ignored.

However, I feel that a wiki can’t become a replacement for forums. A wiki written by product users will be an unorganized and distributed effort which will have the answer to my specific problem only if someone cared to proactively put it there.

Let’s for a moment visualize the real life scenario. The product comes out, people start using it. There are P1, P2, …Pn problems with it. I can be facing any of these as soon as I start using the product. There is a strong possibility that not all the problems and their solutions will get documented on the wiki promptly. In that case, I’ll have to resort to a forum where I can directly raise a question, bring people’s attention to it and get an answer. It is also possible that I am facing a problem that no one has faced so far. In that case, a forum is a very valuable tool as I can launch a discussion about various strategies for solving this problem.

In other words, a wiki is a dynamic method of capturing static information. Forums are a method of discussing and creating new information. So, a wiki can/will substitute for the forum archives but not for the forums themselves.


>The Effect Of Blogs

>Someone made a comment about my recent post on Ecosystem of Ideas:

“Wow! You really think so? For me blogs have meant that finally I can get the right kind of info (mostly fun stuff) from the right source without being hampered by the strings of traditional info delivery systems like newspapers, TV, books etc.”

What blogs _are_, what blogs _mean_ and what their _effect_ is are three different aspects of this phenomenon that we call blogging.

Blogs are online journals. Journals of your ideas, feelings, thoughts, activities and whatever else you choose to put up there.

Blogs mean… Well, they mean very very different things to different people but let’s say they mean modern info delivery systems, a tremendous marketing force, the democratizing force of the Internet, and so on and so forth.

Effect is something very subtle. It’s more like an undercurrent that you can’t see without drilling deep down. It shows up very gradually, bit by bit. It’s subtle to the point of invisibility.

So, creating a new platform for supporting an ‘Ecosystem of Ideas’ is an effect of the blogging phenomenon.

I would really want to talk more about the why and how of this effect. But that will be an essay and not a blog. And it would take a couple of days and not a couple of hours. Because just as effects are subtle, the causes are even more subtle. Catching the causes responsible for an effect is extremely difficult. And catching the correct ones is even more difficult. It takes time, and patience. Maybe some day…


>Ecosystem of Ideas

>Let’s say you have an idea. And why not? I have so many of them myself. So, you’ll also definitely have some. In fact, you might have many ideas. But let’s talk about one of them. Any one.

How do you express it? Do you categorize the elements that form this idea? Or, if it’s a complex one, do you create a hierarchy out of it?

Do you think ideas are so flat that you can put their elements in flat categories? Do you think you can organize your ideas in a hierarchy?

Before we try to find an answer for these questions, we need to look at the anatomy of an idea.

Is an idea an island, i.e. does it hold by itself? Can an idea be so complete by itself that expressing it does not require expressing, mentioning or referencing any other idea? No. When you look at an idea closely, you’ll see that it uses so many ideas as foundation, so many as pillars and so many as roof. It is strongly linked to some ideas and weakly linked to some others. A close look at the expression of an idea reveals that it is an expression of a primary idea that has been built on top of and with the support of multiple secondary ideas. And when you look closely at a secondary idea, the secondary becomes primary.

In other words, the ideas are linked to each other in a web like fashion. They link to each other and are linked from each other. Our mind traverses from one idea to the other through these links.

In fact, more than forming a web, ideas form an ecosystem. Why do I say ecosystem? Because ideas need each other’s support for their expression (which is as good as their existence). In fact, they are so dependent on each other that more often than not, an idea A cannot even be conceived without conceiving ideas B, C and D. Ideas are like living beings. An idea takes birth amid and out of other ideas, and it evolves with some contemporary, some old and some new ideas. And eventually it is superseded by newer ideas.

And I have really started liking the blogosphere. It provides an extraordinary platform for the ecosystem of ideas. The blogosphere is not about expression. It is not even about publishing. It is a giant leap in the direction of getting ideas together so that they evolve and reach adulthood much faster.


>It’s 1.0: The Turnaround Time

>There is always a time in the life cycle of a piece of software when it is under furious development. Everybody is adding code left, right and center. The mantra is ‘Got Code, Will Submit’. The mere existence of code is a qualification for submission. Does it work? Well, kind of. I just tried doing foo in as rosy a scenario as possible, and it happened. Ha, that’s how the 1.0 is developed.

This time is extremely short. And there are reasons for that. Number one is that the time available for releasing 1.0 itself is short. And reason number two is that once the product goes to a customer, this time never comes again.

Let’s call it pre-1.0 time.

And don’t argue that you have worked on a project where this time lasted till 3.0 OR was as short-lived as 0.5. 1.0 is symbolic here.

Then there is a time when the mantra changes from ‘Got Code, Will Submit’ to ‘Got requirements, will write specs, will get them reviewed, will write a design document, will get it reviewed, will write code, will comment the code, will get it reviewed, will update the implementation document, will get it reviewed, will write unit tests, will get it reviewed, will execute the unit test, will fix the problems, will ask permissions to submit, will provide clarifications why something was done in some way, will hopefully, eventually submit code’.

Again, your mantra may be a little different but don’t argue, as this is also symbolic.

Let’s call it post-1.0 time.

Reason for such a drastic change? The focus has changed!

To begin with, you want your software to be developed. Once it is developed, you want it to work. Once it starts working, you want it to keep working, and you want to keep a proof for later that from your side you made every possible attempt that it continues to work, and you want to ensure that anyone from the development team can leave and you can hire the programmer next door to replace the loss, so on and on…

See the difference?

What I find the most interesting is that this turnaround happens in an hour or a day. And even if it’s a week, it’s short.

