23 April 2011

The Google Video scare shows the danger of trusting one online archivist too much.

I've often written on this blog about good services being shut down and, when I heard Google's announcement concerning the imminent closing of Google Video, I felt sure I had another sad tale to write about. However, I'm happy to say that, for the moment, cooler heads at Google have prevailed and the video service it ran before acquiring YouTube will be kept open indefinitely as the content is migrated to YouTube. I have to give credit to Google for taking its users' content more seriously than many other Internet companies do. For instance, I can still access my notebooks on Google Notebook even though the service hasn't accepted new users for a long time. It's good to know the company hasn't totally lost touch with its roots, but it's still disturbing that the decision to close was reached in the first place.

Whether it is comfortable with the role or not, Google has evolved into one of the foremost archives of the Internet. It hosts millions of blogs on Blogger, many of which are long "dead." It stores the wisdom of the ages on Google Books. It has scores of old newspapers available for searching and viewing at the Google News Archive Search. Of course, it is also the major online video archive since it owns both YouTube and Google Video. It is disturbing to think that some bureaucrat or accountant could decide a service is no longer worth keeping and, with the stroke of a pen or the firing of an email, destroy content created by thousands or even millions of people. It's not like this kind of thing hasn't happened before -- look at how Yahoo! gleefully junked GeoCities and its 360 blogging service. The trust that so many of us place in big Internet companies to safeguard our content is probably misplaced. Yet we also have a huge need for archives online, all the more so since the amount of digital content being created daily is mindbogglingly enormous. I wonder how many people who posted their work to Google Video when it was still accepting uploads are now dead. Had Google not reversed its decision, much of the content those people created would probably have been deleted for good because they were no longer in a position to protect their own work.

It's clear we can't trust the beneficence of the Internet giants to keep our digital history alive. We need as many archives as we can get. So don't grow too dependent on the Big G or any one archiving entity. Keep local copies of all your own work. Consider uploading your stuff to multiple hosts. And above all else, support serious archiving projects like the Internet Archive and Project Gutenberg. As a consumer, it's easy to grow accustomed to using one archive for all your viewing needs. That's OK -- we all have preferences. However, we have to accept that our favored archive may not be around tomorrow, so it only makes sense to prepare ourselves for that possibility and do what we can to support the alternatives. In the long run, it's best for many different projects to shoulder the archiving load. That way, the loss of one partner in the struggle -- such as Live Search Books -- will not do as much damage.

01 March 2011

Google shouldn't copy Blekko.

Ranking the best content for a given search query has always been a difficult task. I have no quarrel with those who note that search engine optimization techniques have allowed inferior content to overshadow the good stuff to a certain extent. It's definitely not easy to run a search engine -- part of the job is staying one step ahead of all those people who would like to manipulate search results for their own ends, and they are legion. However, I don't consider ignoring wide portions of the Web to be part of the job...if anything, it's an abandonment of a search engine's fundamental duty. A search engine that no longer indexes the whole accessible Web is partially blind. It doesn't really know what is out there, so it can't possibly be trusted to direct its users to the best content.

Thus, when the search engine Blekko opted to ban a slew of sites accused of being spam by its users, I was frankly appalled, and my consternation only grew as I read through the list of banned sites. Freewebs (now rebranded as Webs) was one of the victims...it is a free web space provider, for goodness' sake! Just as they did on GeoCities back in the day, people use Freewebs/Webs today to gain experience building and maintaining web sites for free. Kids, Internet novices, and cheapskates, listen up: Blekko doesn't think you deserve a chance to be seen. Somehow, an online dictionary and a petition site made the list, too. That many of the banned sites do host rather poor content is undeniably true -- there is a reason so many Blekko users branded content on these sites "spam." However, many of these same sites host good and useful content as well. Rather than seeking to rank individual pages on their own merits, Blekko decided to throw the baby out with the bathwater. Should this idea catch on, it will place a target on the back of every site that dares to allow its users to contribute content...every article archive, every free web host, and every blogging host is at risk because these sites by design cannot guarantee across-the-board consistency in their content. Blekko at the moment is a rather insignificant player in the search world, but I know a dangerous idea when I see it, especially a dangerous idea that can be linked to a noble one like fighting spam and worthless content. Search engines at their best encourage free expression because they allow every writer a spot in the index...perhaps any particular individual's voice is hard to hear amidst the din of the crowd, but heard it can be if only that right, magical set of keywords is entered into a search engine. That's why I love writing on the Web: no matter how obscure a blogger I may be, I'm still just a few words in a search box away from being read. At least until Blekko takes over, that is.

Google has recently responded to the demands of its users for better search results with a significant algorithmic change. When Google talks about reducing "rankings for low-quality sites," it's difficult not to see the influence of Blekko at work. For now, though, Google seems to be trying to do things the right way -- it isn't banning low-quality sites but rather just trying to rank them more appropriately. However, even this mission isn't quite right...Google should be able to find the good content hosted on ANY site. Branding a particular site "low quality" may be convenient, but if the high-quality content hosted on a low-quality site appears below the low-quality content hosted by a high-quality site, search engine results will still be bad. Certainly some content does indeed deserve to be sent to the Void -- sites that intentionally host malware, for instance -- but "low-quality" sites (ultimately a rather subjective valuation) may still be useful and certainly do not deserve invisibility. Hopefully Google will not forget that its users count on it to keep track of the entire Web, even those neighborhoods some consider to be on the wrong side of the tracks.