Thursday, June 29, 2006

The semantic web can only be defined from the bottom up

Can web surfers collectively define the Semantic Web from the top down by social bookmarking, tagging, folksonomies, and outliners?

Let them become better writers and express their ideas more clearly first. Once they can communicate what they actually mean, that is the time to create a Semantic Web. Create it by going backwards: extracting the semantics out of what they wrote and then indexing it. Ultimately the semantic web will be defined from the bottom up, linguistically.
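Purely as an illustration of that bottom-up pipeline (none of this is anyone's actual system), here is a toy Python sketch: it pulls crude "semantic" units out of raw text, with capitalized phrases standing in for real extracted meaning, and builds an inverted index over them. The function names and the sample posts are made up for the example.

import re
from collections import defaultdict

def extract_units(text):
    """Very rough stand-in for semantic extraction: capitalized phrases."""
    return re.findall(r"(?:[A-Z][a-z]+\s?)+", text)

def build_index(docs):
    """Map each extracted unit to the set of documents it appears in."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for unit in extract_units(text):
            index[unit.strip().lower()].add(doc_id)
    return index

docs = {
    "post1": "Tim Berners Lee proposed the Semantic Web.",
    "post2": "Researchers study the Semantic Web and machine-readable meaning.",
}
index = build_index(docs)
print(sorted(index["semantic web"]))   # ['post1', 'post2']

Real extraction would need an actual NLP pipeline, which is exactly the point of the rest of this post.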

In all the online discussion of Web 3.0, nowhere is the key word "linguistics" to be found [Phil Wainwright (ZDNet), International Herald Tribune, A List Apart].

This is really strange, since the two founders of Google were graduate students at Stanford University's computer science department, which specializes in computational linguistics (a.k.a. natural language processing), and since information retrieval (search engine design) promises to reduce largely to computational linguistics in the future.

The MUC (Message Understanding Conference) series of the 1990s had different academic teams competing to create software programs that extract information and summarize the meaning of different kinds of articles, for instance reports of terrorist acts (see, for example, the MUC-6 conference).
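To give a flavor of that kind of template filling (this is a toy, not an actual MUC entry), a few hand-written patterns can fill a fixed template from a news-style sentence; the slot names, patterns, and example report below are all invented for illustration, and real systems used far richer grammars, gazetteers, and statistical models.

import re

PATTERNS = {
    "perpetrator": r"claimed by\s+(?:the\s+)?([A-Z][\w\s]*?)(?:,|\.|$)",
    "target":      r"(?:bombed|attacked|destroyed)\s+(?:the\s+)?([\w\s]+?)\s+in\b",
    "location":    r"\bin\s+([A-Z][\w\s]*?)(?:,|\.|$)",
}

def fill_template(report):
    """Return a dict mapping each template slot to the first match, or None."""
    template = {}
    for slot, pattern in PATTERNS.items():
        match = re.search(pattern, report)
        template[slot] = match.group(1).strip() if match else None
    return template

report = ("Guerrillas attacked the power station in San Salvador, "
          "an act later claimed by the FMLN.")
print(fill_template(report))
# {'perpetrator': 'FMLN', 'target': 'power station', 'location': 'San Salvador'}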

Fernando Pereira, an important Stanford alumnus and professor of computer science at the University of Pennsylvania, is doing research on information extraction.

Stanford professor Christopher Manning is co-author of the authoritative Foundations of Statistical Natural Language Processing (1999).

There's a long way to go before even the syntax of natural languages can be parsed correctly, much less their semantics or pragmatics decoded, but the HPSG project at Stanford has an impressive list of syntactic constructions that it can handle (though the program is written in the rather infrequently used, but fascinating, Common Lisp).
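For a sense of what "parsing the syntax" even means, here is a minimal sketch (in Python rather than Common Lisp) of CYK recognition over a tiny context-free grammar of my own invention; it only says whether a sentence has a parse at all, whereas an HPSG grammar builds rich typed feature structures and covers an enormously larger range of constructions.

from itertools import product

# Binary rules in Chomsky normal form: (left child, right child) -> parent
RULES = {
    ("NP", "VP"): "S",
    ("Det", "N"): "NP",
    ("V", "NP"): "VP",
}
# Toy lexicon: every word in the examples must appear here
LEXICON = {"the": "Det", "a": "Det", "dog": "N", "cat": "N",
           "chased": "V", "saw": "V"}

def parses(words):
    n = len(words)
    # chart[i][j] holds the categories that span words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1].add(LEXICON[w])
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for left, right in product(chart[i][k], chart[k][j]):
                    if (left, right) in RULES:
                        chart[i][j].add(RULES[(left, right)])
    return "S" in chart[0][n]

print(parses("the dog chased a cat".split()))   # True
print(parses("chased the dog a cat".split()))   # False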