You have probably read the news about Google accusing Microsoft of stealing their search results. Microsoft's reaction covered the whole range from not denying it to accusing Google of click fraud. Undoubtedly the whole Internet population eagerly awaits my verdict on this case.
We should first clarify what exactly the accusations are. Google claims that certain features in Microsoft products track what users click on Google. The features in question are IE's Suggested Sites and the Bing Bar, which is installed separately from IE and Windows but comes preinstalled on many new computers and is offered as an option when installing many Microsoft products. Google does not claim that Microsoft copies the search results directly from the Google service. In fact, their experiment confirms that Microsoft does not. The only claim is that Microsoft is looking at what users of these Microsoft products click after a search on Google.
What goes into making a good search engine? It turns out one needs a good ranking algorithm and a lot of data for this algorithm to consume. The data is even more important than the algorithm. The data links words to URLs. For example, a word in a link is connected to the URL of that link by the fact that someone has decided that this word describes the URL. Search engines have a lot of ways to link words to URLs. In its first post Microsoft says that Bing uses over a thousand ways (called signals) to do this. Different signals have different weights and the algorithm uses them in different ways. One very important signal happens to be which result users click after a search. Most search engines, including Google, track that, usually via JavaScript on the results page. This signal is really important because a human has determined the relevance of the result in the semantic context of a web search. This is arguably as good a signal as you can get.
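To make the idea of weighted signals a bit more concrete, here is a minimal sketch in Python. It is not how Bing or Google actually rank anything. The click log, the anchor-text counts, the weights and the function names are all invented for illustration. It only shows how a click signal could be aggregated and combined with another signal to order URLs for a word.

```python
from collections import Counter, defaultdict

# Hypothetical click log: (search query, URL the user clicked afterwards).
CLICK_LOG = [
    ("cheap flights", "https://example.com/flights"),
    ("cheap flights", "https://example.com/flights"),
    ("cheap flights", "https://example.org/travel"),
]

# Hypothetical anchor-text signal: how many links use this word to point at this URL.
ANCHOR_COUNTS = {
    ("flights", "https://example.com/flights"): 1,
    ("flights", "https://example.org/travel"): 2,
}

# Invented weights; a real engine would tune many more signals than these two.
WEIGHTS = {"click": 3.0, "anchor": 1.0}


def click_signal(log):
    """Count, for every word in a query, how often each URL was clicked."""
    counts = defaultdict(Counter)
    for query, url in log:
        for word in query.lower().split():
            counts[word][url] += 1
    return counts


def rank(word, log, anchors, weights):
    """Order the URLs known for a word by the weighted sum of both signals."""
    clicks = click_signal(log)[word]
    urls = set(clicks) | {url for (w, url) in anchors if w == word}
    scores = {
        url: weights["click"] * clicks[url]
        + weights["anchor"] * anchors.get((word, url), 0)
        for url in urls
    }
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    # With click data the ranking follows what users actually chose.
    print(rank("flights", CLICK_LOG, ANCHOR_COUNTS, WEIGHTS))
    # Without click data only the anchor-text signal is left.
    print(rank("flights", [], ANCHOR_COUNTS, WEIGHTS))
```

In this toy setup the two calls put different URLs first, which is the whole point: the same algorithm gives a different answer once it is fed click data.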
Why is Microsoft stealing this signal? As it turns out, in order to generate a lot of this particular data you need a lot of users making a lot of searches. In order to get a lot of users to make a lot of searches you need a good search engine. In order to build a good search engine you need a lot of data. It is a catch-22. Google are effectively monopolizing this data. Even if someone has a better algorithm, Google will still be better because there is no way for anyone else to get the data to feed the algorithm. Some of the more left-leaning readers may argue that an antitrust body should oblige Google to share the data. I do not buy the antitrust bullshit and I believe that Google worked hard to get this data and are fully within their rights to exploit it as they see fit.
I should point out right now that my personal belief is that Microsoft did plug into this signal and that what Google claim is true for the Bing Bar but not for Suggested Sites. Having two features report the same thing does not make sense and I believe the IE team is independent enough from the Bing team. What is more, Microsoft will surely want to keep IE clean. What Microsoft is (probably) doing is totally legal too. If it were not, Google would be suing right now. I also think that the Bing team's idea is actually ingenious. They cannot gather enough user-generated data from their own service but they have found a way to gather it when it is sent to Google, thus breaking the circle.
Google's position is that Microsoft is stealing their results but this is not true. What Microsoft is stealing is the user-generated signals. Google do not own their users or the data they generate. What is more, these users are not only Google's users but also the Bing Bar's users. You may say that the clicks Microsoft harvests happen on Google's website. So what? The information comes from the user. It is the user who makes the semantic link between the word and the search result. Google's claim of theft is like me claiming that Google are stealing from me because they are reading the semantic data I have put into the links on my website. It is even worse, because I actually created the data on my website myself and they did not create the data Microsoft collects.
On the technical side, harvesting the signal makes more sense than querying the Google service directly. Displaying Google's results would improve the search quality for the stolen words but it would not improve Bing's technology, because the semantics that go into building the rankings are lost. However, if Bing tracks the user clicks it preserves the semantics because it knows the origin of the data, and this makes the data much more useful.
The situation is really strange. Google cannot sue Microsoft and cannot prevent Microsoft from harvesting more user clicks by technical means. The only thing they can do is go for a P.R. attack, and so they did. They announced their theory the day before a search-related event sponsored by Bing and tilted the whole discussion towards this problem. Now the real question is how well Microsoft will be able to control the damage. Usually Microsoft is pretty bad at P.R. and Google are kings, so maybe Microsoft will regret doing what they did (if they did it). I personally liked the initial reaction a lot. Ars Technica summarized it as saying "So what?" And this is exactly what their position should be. They are not doing anything wrong and should not be ashamed of it.