2009-07-21

Many things in life can best be represented as trees. Indeed, life itself is a tree of various life forms. We have the animal and plant kingdoms, and these main categories are further sub-divided into phyla, classes and genera. The animal kingdom is divided into invertebrates and vertebrates. The latter include amphibians, reptiles, birds and mammals. Human knowledge is also depicted as a tree.

A lot of other things are trees. Hence, philosophers, mathematicians and computer scientists have put quite an effort into modelling things as trees. Within library & information science we have used the term classification systems for centuries, I suppose, and more recently we have started to talk about ontologies and knowledge organization. There is a lot of theory and a plethora of standards in this area. Systems like topic maps and, more recently, the Simple Knowledge Organization System (SKOS) are available for those who need a standard for knowledge organization.

On a very down-to-earth level, I tend to think of controlled vocabularies as filters and navigators. The latter are trees and may contain many terms arranged to an arbitrary depth. The filters are lists of controlled terms which may apply wherever in the tree we are. Think of Yahoo Directory. You can browse into Tuscany and get a very long list of resources. Then there is a menu at the top where you can choose between "Business and Shopping", "Entertainment and Arts", "Recreation and Sports", "Travel and Transportation" and so forth. These are broader categorizations that apply to Firenze as well as Siena or New York City or any place you may want to visit. We in the Web business call them filters or facets.
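To make the distinction concrete, here is a minimal sketch in Python of a navigator (a tree of terms) combined with a filter (a flat facet). Everything in it is an assumption made for illustration: the names Node, Resource and browse, the root term "Directory" and the resource titles are invented, and this is not how Yahoo Directory is actually built.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """One term in the navigator tree (hypothetical structure)."""
    name: str
    children: List["Node"] = field(default_factory=list)

    def add(self, name: str) -> "Node":
        """Attach a narrower term below this one and return it."""
        child = Node(name)
        self.children.append(child)
        return child


@dataclass
class Resource:
    title: str
    place: str  # navigator term: where in the tree the resource sits
    facet: str  # filter term: a flat category, independent of the tree


def subtree_names(node: Node) -> Set[str]:
    """Collect the names of a node and all of its descendants."""
    names = {node.name}
    for child in node.children:
        names |= subtree_names(child)
    return names


def browse(node: Node, resources: List[Resource],
           facet: Optional[str] = None) -> List[Resource]:
    """Resources anywhere under `node`, optionally narrowed by one facet."""
    places = subtree_names(node)
    return [r for r in resources
            if r.place in places and (facet is None or r.facet == facet)]


# A tiny navigator; the terms come from the text, the shape is assumed.
root = Node("Directory")
italy = root.add("Italy")
tuscany = italy.add("Tuscany")
tuscany.add("Firenze")
tuscany.add("Siena")
root.add("New York City")

# Invented resources, for illustration only.
resources = [
    Resource("Museum guide", "Firenze", "Entertainment and Arts"),
    Resource("Train timetable", "Siena", "Travel and Transportation"),
    Resource("Regional bus routes", "Tuscany", "Travel and Transportation"),
    Resource("Subway map", "New York City", "Travel and Transportation"),
]

everything = browse(tuscany, resources)
travel_only = browse(tuscany, resources, "Travel and Transportation")
print([r.title for r in everything])
print([r.title for r in travel_only])
```

The navigator decides which part of the tree we are looking at, while the facet is the same flat list whether we stand in Tuscany or in New York City.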

Recognizing and recalling

From a user perspective, you have to take into account that there are two cognitive processes involved when using services which, like Yahoo Directory, employ navigators and filters. The first process is more fundamental, and is by nurture and possibly nature a part of the human mind. It is to say: Ah, Tuscany, that is a part of Italy, isn't it? It's like distinguishing between a wasp and a bumble bee. On an evolutionary time-scale we've benefitted from being able to say that this is edible and this is not, or that this one has a sting and is likely to use it. This is to recognize something when you see it.

The opposite is a much more difficult cognitive process: I want to know something about Tuscany, so I start by clicking on Italy. You have to connect Italy with Tuscany, not only recognize the word Tuscany when you see it. This makes it more difficult to find Tuscany given Italy than Italy given Tuscany (in Yahoo it's in the breadcrumb path). Remembering might not be trivial: you have to know in advance things like "frogs are amphibians" and "diabetes is an endocrinological disorder". On the other hand, successful searching in Google is even more demanding, in spite of Google Suggest and the spell checking tools that have improved the search comfort for many users.
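The asymmetry also shows up in the data structure itself. In the sketch below, which is only an illustration, each term has exactly one parent, so the breadcrumb path from Tuscany up to the root is fully determined, while going down from Italy means choosing among several children. The PARENT table, the terms Europe, France, Lombardy and Sicily, and the function names are all assumptions added for the example.

```python
# Hypothetical parent table: every term points at its single broader term.
PARENT = {
    "Italy": "Europe",
    "France": "Europe",
    "Tuscany": "Italy",
    "Lombardy": "Italy",
    "Sicily": "Italy",
    "Firenze": "Tuscany",
    "Siena": "Tuscany",
}


def breadcrumb(term):
    """Walk upwards from a term to the root; there is never a choice to make."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return list(reversed(path))


def children(term):
    """The narrower terms a user has to choose between when going down."""
    return sorted(child for child, parent in PARENT.items() if parent == term)


print(breadcrumb("Tuscany"))  # ['Europe', 'Italy', 'Tuscany']
print(children("Italy"))      # ['Lombardy', 'Sicily', 'Tuscany']
```

Going up, the system can do all the work and the user only has to recognize the result. Going down, the user has to recall which branch leads to Tuscany at every level.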

Ordered Hierarchies of Content Objects

There are structures that are inside a resource, and that require navigation as well. As a first approximation we can assume that all digital resources, such as electronic books, are trees as well. In the theory of text this model is called OHCO (ordered hierarchy of content objects). That view has been questioned, and it is for instance claimed that any ordinary text consists of many, overlapping hierarchies. I just love overlapping hierarchies. They are great fun.

Anyway, if you search the scientific literature on the theory of text, you'll find everything you want to know about post-structuralist deconstruction of the OHCO model. Theoretically, I can understand the problems. But for me they are purely academic.

If you instead search Google Books for some real content, say "nightingale lark balcony", you'll find Romeo and Juliet in her bedroom. That is Scene V in Act III of Romeo and Juliet, on page 189, 160 and a lot of other page numbers, depending on the edition. In the world according to me, you may have a lot of hierarchies and they may overlap, and OHCO may be false. But if you want to be able to address content in a more clever way than the one chosen by Google Books, then you cannot do without a content model that looks like, well, very much like a tree.
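As a rough illustration of what such tree-based addressing could look like, here is a small Python sketch. The ContentObject class, the resolve and find functions and the (act, scene) address tuples are assumptions made for the example, not an existing standard or API, and the play is only a skeleton with just enough structure to reach Act III, Scene V.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple


@dataclass
class ContentObject:
    """A node in an ordered hierarchy of content objects (hypothetical model)."""
    kind: str
    label: str
    text: str = ""
    children: List["ContentObject"] = field(default_factory=list)


def resolve(root: ContentObject, address: Sequence[int]) -> ContentObject:
    """Follow 1-based child positions from the root, e.g. (3, 5) for Act III, Scene V."""
    node = root
    for position in address:
        node = node.children[position - 1]
    return node


def find(root: ContentObject, words: Sequence[str],
         path: Tuple[int, ...] = ()) -> List[Tuple[int, ...]]:
    """Return the tree addresses of all nodes whose own text contains every word."""
    hits = []
    text = root.text.lower()
    if text and all(word.lower() in text for word in words):
        hits.append(path)
    for index, child in enumerate(root.children, start=1):
        hits.extend(find(child, words, path + (index,)))
    return hits


# Skeleton of the play: Acts I and II are left empty, Act III gets five scenes.
play = ContentObject("play", "Romeo and Juliet")
for act_label in ["I", "II", "III"]:
    play.children.append(ContentObject("act", "Act " + act_label))
act_three = play.children[2]
for scene_label in ["I", "II", "III", "IV", "V"]:
    act_three.children.append(ContentObject("scene", "Scene " + scene_label))

# Two lines from the scene are enough for the example.
resolve(play, (3, 5)).text = (
    "Wilt thou be gone? it is not yet near day: "
    "It was the nightingale, and not the lark"
)

print(find(play, ["nightingale", "lark"]))  # [(3, 5)]
print(resolve(play, (3, 5)).label)          # Scene V
```

The address (3, 5) stays the same in every edition, whereas the page number is a property of one particular printing and would have to be stored per edition.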

