The Library model vs. the Web model for organising knowledge

In defining the user interface for my media server project at Synapse, I'm faced with a fundamental dichotomy: what I will call the Library model of organising knowledge vs. the Web model. To illustrate, a quote from Tim Berners-Lee et al. in Scientific American, May 17, 2001:
Traditional knowledge-representation systems typically have been centralized, requiring everyone to share exactly the same definition of common concepts such as "parent" or "vehicle." But central control is stifling, and increasing the size and scope of such a system rapidly becomes unmanageable.

Moreover, these systems usually carefully limit the questions that can be asked so that the computer can answer reliably -- or answer at all. The problem is reminiscent of Gödel's theorem from mathematics: any system that is complex enough to be useful also encompasses unanswerable questions, much like sophisticated versions of the basic paradox "This sentence is false." To avoid such problems, traditional knowledge-representation systems generally each had their own narrow and idiosyncratic set of rules for making inferences about their data. For example, a genealogy system, acting on a database of family trees, might include the rule "a wife of an uncle is an aunt." Even if the data could be transferred from one system to another, the rules, existing in a completely different form, usually could not.

Semantic Web researchers, in contrast, accept that paradoxes and unanswerable questions are a price that must be paid to achieve versatility. We make the language for the rules as expressive as needed to allow the Web to reason as widely as desired. This philosophy is similar to that of the conventional Web: early in the Web's development, detractors pointed out that it could never be a well-organized library; without a central database and tree structure, one would never be sure of finding everything. They were right. But the expressive power of the system made vast amounts of information available, and search engines (which would have seemed quite impractical a decade ago) now produce remarkably complete indices of a lot of the material out there. The challenge of the Semantic Web, therefore, is to provide a language that expresses both data and rules for reasoning about the data and that allows rules from any existing knowledge-representation system to be exported onto the Web.

In the media server, the Library/Web dichotomy applies to how the user interface is organised.

Library: Centralised. High barrier to entry. An image cannot be put in the media server unless it’s first richly annotated. No annotations, no entry. Whatever is in the media server is guaranteed to be very well described.

Web: Distributed. Low barrier to entry. First put it in the media server, then annotate it when convenient. Since the quality of annotations is suspect, a search engine that ranks on this basis is essential.
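To make the contrast concrete, here is a minimal sketch of the two admission policies and of a crude ranking signal for the Web model. Everything in it is an assumption for illustration: the `MediaItem` type, the `REQUIRED_FIELDS` set, the function names and the sample data are hypothetical and not part of the actual media server.

```python
from dataclasses import dataclass, field


@dataclass
class MediaItem:
    name: str
    annotations: dict[str, str] = field(default_factory=dict)


# Hypothetical minimum set of fields the Library model insists on.
REQUIRED_FIELDS = {"title", "client", "keywords"}


def library_admit(item: MediaItem) -> bool:
    """Library model: reject anything missing a required annotation."""
    return all(item.annotations.get(f, "").strip() for f in REQUIRED_FIELDS)


def web_admit(item: MediaItem) -> bool:
    """Web model: everything gets in; annotation can follow later."""
    return True


def annotation_score(item: MediaItem) -> int:
    """Crude ranking signal: the number of non-empty annotation fields."""
    return sum(1 for value in item.annotations.values() if value.strip())


def rank(items: list[MediaItem]) -> list[MediaItem]:
    """Best-annotated items first; ties keep their original order."""
    return sorted(items, key=annotation_score, reverse=True)


# Illustrative data only.
bare = MediaItem("beach_001.jpg")
rich = MediaItem(
    "logo.png",
    {"title": "Logo", "client": "Acme", "keywords": "brand"},
)
ranked = rank([bare, rich])
```

Under the Library policy the unannotated photograph never enters the archive. Under the Web policy it enters but sinks in the ranking until someone annotates it. A real ranking would presumably weigh more than a field count.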

Further, there are two psychological factors that affect everyone who contributes to the media server: Discipline and Enthusiasm.

Discipline determines how seriously I take my annotation work. The Library model enforces discipline. If I don’t annotate it, it doesn’t get in. The Web model considers annotation optional, so my existing discipline determines whether I will annotate or not.

Enthusiasm determines the comfort/interest level in submitting to the media server. For example, I take photographs all the time, of varying levels of quality. I annotate them for my own reference. I’ve never considered submitting them to any kind of stock photograph archive or photography contest because, depending on my mood, I either don’t think them good enough, or I don’t see the point in making such a submission. But this lack of interest doesn’t mean my photographs aren’t good enough.

In the Library model, enthusiasm is important for an item to be submitted. In the Web model, enthusiasm is irrelevant. I annotate it for my own reference, but the very act of annotation makes it available to anyone who comes looking for it.

So which of these models should the media server use? I prefer the Web model for these reasons:

1. The Library model makes annotations the barrier to entry, but annotations don’t speak for the quality of the content itself. The barrier here is one of convenience rather than fairness. It assumes that if you think the image is high quality, you will annotate it well.

2. Management lessons from over the ages have taught us that it is far easier to enforce discipline than to create enthusiasm. The Library model requires high enthusiasm to be effective; the Web model doesn’t. This makes more sense when you consider enthusiasm as the opposite of inhibition.

3. The Web model gives users the freedom to organise as they like. Most people already have some rudimentary organisation in a folder hierarchy on their machines. This is lost in a centralised Library model but preserved in the Web model. The user’s organisation may sometimes define relationships between items that cannot be reproduced within the limited parameters of the Library model. For example, if I went to the beach last week and took a bunch of photographs, I will presumably put them all in the same folder. I may have been fascinated by something I saw and taken multiple photographs of it. These, I will put in a separate sub-folder. To another person now looking at my collection, the presence of the sub-folder clearly indicates that its contents are somehow related to each other in a way that doesn't apply to the photographs in the parent folder. The nature of this relationship is not explicitly specified but can be understood by examining the photographs (which a computer cannot do). In the Library model with its restricted system of organisation, this relationship may be entirely lost.
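As a rough sketch of how an indexer might keep this implicit relationship, the folder a file sits in can be recorded as a grouping key, without any attempt to say what the relationship means. The function names and sample paths below are hypothetical, and the approach is only one possible reading of the idea, not the media server's actual design.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_folder(paths: list[str]) -> dict[str, list[str]]:
    """Map each folder to the files that sit directly inside it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for p in paths:
        path = PurePosixPath(p)
        groups[str(path.parent)].append(path.name)
    return dict(groups)


def related_to(path: str, paths: list[str]) -> list[str]:
    """Files sharing the same immediate folder as `path`, excluding itself."""
    target = PurePosixPath(path)
    return [
        p
        for p in paths
        if PurePosixPath(p).parent == target.parent and PurePosixPath(p) != target
    ]


# Illustrative data only: a beach folder with one sub-folder.
paths = [
    "beach/a.jpg",
    "beach/b.jpg",
    "beach/crab/c1.jpg",
    "beach/crab/c2.jpg",
]
groups = group_by_folder(paths)
siblings = related_to("beach/crab/c1.jpg", paths)
```

The index learns only that the sub-folder's photographs belong together. Why they belong together is still left to the person browsing, which matches the point above that the relationship is implied rather than specified.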

    birdonthewire — Oct 3, 2003 4:21:48 AM — #

    Interesting. But what if you want to control the kind of stuff that gets put/maintained in the media server? It could be automated checks or some kind of manual feedback system (e.g. time since last usage, peer review ratings, etc.). In this case, wouldn't a purely Web model face significant challenges from an implementation perspective?

    :-) Let me give an armchair shot at it. Actually, I would think you possibly need a combination of the Library and Web models. What I am thinking is a bunch of high-level rules that define a Library model, with maybe a "user workspace" as the leaf, and in that workspace, the Web model could apply.

      Kiran Jonnalagadda — Oct 3, 2003 7:58:48 AM — #

      Precisely what I'm doing. Everyone has to use the central server, but they get their own workspace where they are free to organise as they like. A search index trawls the user areas and builds a central catalog.

      Since this is a closed group of users, there is no risk of abuse, only of human error.

        birdonthewire — Oct 3, 2003 8:35:00 AM — #

        Ah! Neat. Btw, have you taken a look at MS SharePoint? Something along the same lines I think, not necessarily only wrt media.

        Hmm...I suppose considering yours is a pure media server/search, you could do more fancy stuff with the search/storage techniques. Hmm...goodie...Ah, now I see the point of your previous post and that semantic web idea... Interesting...

    Anonymous — Oct 3, 2003 10:50:26 AM — #

    Good choice
    You did right by picking the web model. If for nothing else, I've seen the web model take off much faster. And for most content repositories, it's crucial to reach critical mass in order to be useful at all. You might want to consider using recommender/referral systems to make it easier to retrieve related content. Rashmi Sinha's got a very interesting paper on getting people to use recommender systems on her website: http://www.rashmisinha.com/articles/musicDIS.pdf .

    Kingsley

      Kiran Jonnalagadda — Oct 3, 2003 11:34:21 AM — #

      Re: Good choice
      Thanks. Rashmi Sinha's paper looks interesting. However, I'm not sure how I can apply it to my project.

      The user profile for my project (it will be released as open source) is a small ad/creative agency (<50 people) with a high attrition rate that wants to maintain a private archive of its own work (Synapse has a very high attrition rate by design; people are expected to be in industry for 1-2 years before returning).

      Key concerns here:

      1. A new person at the company will usually turn to an old hand for reference on the company's previous work with the same client, with a specific industry, or with a similar creative brief. Since the target company has a high attrition rate, such a human may not exist. The media server should take over this role.

      2. When a person leaves, their personal folder gets abandoned, but it can't be removed because its contents are important. At this time, another person may take over maintenance of the folder, which defeats the personal-folder-for-everyone system, or may graft its hierarchy into their own folder, which breaks all links. I prefer the second because such "sweep up" operations ensure less litter over time; broken links can be mended by leaving redirectors at the older location, or by using the search index to find the item again; and small, closed environments are unlikely to have a large number of internal links.

      On the Web, the second approach would be a disaster. The Web requires either long term commitment from members, or a system of URLs that is independent of how the user organises his files.

    frozenaftermath — Oct 5, 2003 10:44:33 PM — #

    As someone who is at the end of all three aspects of the problem: design, implementation and usage, I can tell you the clincher is a fair mix of both approaches.

    I do not know about its current status, but when TNT was launched they had an image management system that was searchable via metadata stored as keywords in a DB. That way the end user was always shielded from where and how the images were served from, since they were just IDs that would be called on the page. The only headache here is on the side of the guy who builds the system, with the overwrite issues regarding existing inventory (rename or overwrite an already existing file). And it becomes even more useful as the inventory list scales up. Grepping a 2G directory with over 2000 entries and no metadata is torture compared to the same thing indexed in a DB.

    The sucker of course is what level of lockdown you decide on as far as the entry point to the inventory goes. The mistake that most people who design content architectures make is that they assume a very strict level of possible human interpretations of context (like you have explained with the beach pix), and from experience I now know it is much easier to keep it reasonably strict than to assume it would happen voluntarily. The system that we use here now has something similar that is used to generate item-to-item relationships on the controller level, which are finally seen on the presentation level.

    The problem that we have run into is that the system is extremely flexible (one of our users even used the meta info "$username is great" on certain entries, and it would still work perfectly if we used it on other related entries), and even between two very similar people categorisation differs vastly, to the extent that we end up enforcing strict entry guidelines quite similar to what the Library model would have required. Since it is not a very prominent feature for us, we can afford to somewhat ignore it and not lose too much sleep over it, but if it were a very important one, we would have ended up making a mess of the metadata on almost three years' worth of stuff (over 1,00,000 entries and almost 20 gigs in a DB).

    I'd say go for the web model with strict rules as to what is indexed into the final inventory. Something like a public_html user directory system that gets created elsewhere than the actual home directory of the user. The spider can index from the directory and if you use trust levels to clear data that needs to be cleared you can also get a limited level of automation on the clearing process for users who know what they are doing.
