nutch

I found this new search engine project called Nutch via DaveNet:

Nutch is a nascent effort to implement an open-source web search engine. (It) provides a transparent alternative to commercial web search engines. Only open source search results can be fully trusted to be without bias. (Or at least their bias is public.)

IMHO, this is like finding an RSA encryption algorithm for information retrieval. Many encryption systems rest on the premise that their inner algorithms are kept secret, whereas RSA and other publicly described methods are strong because they stay tough to crack even when the algorithm is fully known.

Extending this logic to search engines, it is easy to see that the countless "search engine ranking" companies exploit every known attribute of the secret ranking algorithms in their favour. Thus, whoever knows how Altavista / Teoma / Google / Yahoo ranks pages can modify their website to maximize rankings. This is why no search engine provides full details of its algorithm.

A totally public algorithm would have to hold up even when everyone knows exactly how it works, which would mean that there's no way you can cheat the engine - the only way to get better rankings is to provide better content - search engine utopia. I'm really looking forward to seeing what they come up with.