Tuesday, June 27, 2006

Research folksonomies as library card catalogue extensions

The Penn Tags project is going to add social bookmarking and folksonomies to university library card catalogues (via Corante.com).

As a rather intensive user of university library card catalogues, I think this development is a little premature. Since it is a university, why not open source it? Hold a competition and see which budding programmer or software designer comes up with the most useful tool.

It's the individual paragraphs of books that should be tagged, not whole books. But intellectual property laws still stand in the way of the direct citation, linking, and access that you need in order to actually index the information in books.

The current scheme of licensing journals doesn't help either. At U.C. Berkeley, only users with student or staff IDs can get access to online journals, although a lot of other people, such as journalists or specialists in certain areas, make use of the U.C. Berkeley libraries. You can only assume that a small fraction of the public is going to have access to expensive academic journals.

The right scale for such a project is the whole web, not an individual university library. Google Book Search is the right place to begin. (See universal library.)

Also, tags are just keywords that summarize and identify content. Tag creation should be driven by search. This is something I realize every day in my job as I scan articles for vocabulary and, more importantly, terminology to define. Unique and often low-frequency terms (concrete nouns) are the best subject/topic identifiers, the natural tags for newspaper articles, which are a lot shorter than books.

Vocabulary profilers will give you a color-coded, frequency-ranked vocabulary list for any article, which helps identify the right tags. Just look at the low-frequency red words.
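
To make the idea concrete, here is a rough sketch of what the core of a vocabulary profiler might do. It is not the code of any actual profiler. The tiny COMMON_WORDS set is a hypothetical stand-in for the large frequency-banded word lists a real tool would load, and the function names are my own. Words that fall outside the common list play the role of the "red words" and become candidate tags.

```python
import re
from collections import Counter

# Hypothetical stand-in for a reference list of high-frequency English words.
# A real profiler would load thousands of words grouped into frequency bands.
COMMON_WORDS = {
    "the", "a", "an", "of", "in", "to", "and", "is", "are", "was", "for",
    "on", "at", "by", "with", "that", "this", "it", "new", "dog", "show",
    "won", "best",
}

def profile(text, common=COMMON_WORDS):
    """Count words and split them into on-list (common) and off-list (rare)."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    on_list = {w: n for w, n in counts.items() if w in common}
    off_list = {w: n for w, n in counts.items() if w not in common}
    return on_list, off_list

def candidate_tags(text, limit=5, common=COMMON_WORDS):
    """Return off-list words, the ones repeated most in the article first."""
    _, off_list = profile(text, common)
    ranked = sorted(off_list.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:limit]]

article = ("The pekingese won best in show. "
           "The pekingese is a dog of the Westminster show.")
print(candidate_tags(article))  # ['pekingese', 'westminster']
```

Ranking the rare words by how often they repeat inside the article is one simple choice; a word that is rare in the language but recurs in the text is usually what the article is about.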

One nice difference I see between Penn Tags and tags at places like Flickr is the underscore used to bridge words: Penn Tags uses "shih_tzu", not the "shih tzu" I always see at Flickr. Languages written without spaces between words (Thai and Chinese, for example) are at an advantage over English here, because anyone parsing them is forced to use some objective statistical measure of co-occurrence to define what exactly a word is. In English the convenient space just encourages people to be lazy in their parsing.
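
As an illustration of what a statistical measure of co-occurrence could look like, here is a small sketch that scores adjacent word pairs with pointwise mutual information (PMI) and joins a two-word tag with an underscore when the pair scores high enough. PMI is my choice of measure, not something Penn Tags or Flickr is known to use, and the sample sentences, function names, and threshold are all made up for the example.

```python
import math
import re
from collections import Counter

def bigram_pmi(texts):
    """Score each adjacent word pair by pointwise mutual information (log base 2)."""
    unigrams = Counter()
    bigrams = Counter()
    for text in texts:
        words = re.findall(r"[a-z]+", text.lower())
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    total_words = sum(unigrams.values())
    total_pairs = sum(bigrams.values())
    scores = {}
    for (w1, w2), n in bigrams.items():
        p_pair = n / total_pairs
        p1 = unigrams[w1] / total_words
        p2 = unigrams[w2] / total_words
        # High PMI: the pair occurs together far more often than chance predicts.
        scores[(w1, w2)] = math.log2(p_pair / (p1 * p2))
    return scores

def join_collocation(tag, scores, threshold):
    """Turn a two-word tag into one underscore-joined tag if its PMI is high."""
    words = tag.lower().split()
    if len(words) == 2 and scores.get(tuple(words), float("-inf")) >= threshold:
        return "_".join(words)
    return tag

# Made-up sample corpus, far too small for real use.
corpus = [
    "my shih tzu likes the park",
    "the shih tzu is a small dog",
    "a small dog in the park",
    "the dog likes the small park",
]
scores = bigram_pmi(corpus)
print(join_collocation("shih tzu", scores, threshold=3.0))  # shih_tzu
print(join_collocation("the park", scores, threshold=3.0))  # the park
```

"shih" and "tzu" never appear apart in the sample, so the pair scores high and gets bridged, while "the park" is just two common words that happen to sit together.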

Book reviews are another thing you want to link in to the card catalog. Usually you can find a few high-quality ones generally available online. You could almost define a good book review as one that gives you the right tags for the book. Book reviews for books of the past are a useful construct.