Saturday, October 4, 2003
Metacrap: Why the Semantic Web will not happen
In 2001, Cory Doctorow wrote an excellent rebuttal to the Semantic Web dream, explaining why human nature (people lie, people are lazy, people are stupid) ensures that the reliable metadata the Semantic Web depends on will never be created.
Cory ends his argument by pointing out that while explicit metadata will never be trustworthy, implicit metadata, such as what Google uses for PageRank, already is. It's no wonder, then, that Google ignores explicitly defined metadata in HTML pages (except for things like language settings that affect how a human sees the page).
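To make "implicit metadata" concrete, here is a minimal sketch of the textbook power-iteration form of PageRank as Brin and Page described it publicly. It is not Google's actual production ranking, which is unpublished. It assumes the commonly cited damping factor of 0.85 and a made-up four-page web. Note that the function never reads anything a page says about itself: a page's score comes entirely from who links to it.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages using only the link graph (implicit metadata).

    links maps each page to the list of pages it links to. Nothing a
    page declares about itself (keywords, descriptions) is consulted.
    damping=0.85 is an assumed, conventionally cited value.
    """
    # Collect every page, including ones that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a baseline share from the "random jump" term.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank evenly along its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outgoing links): spread its rank evenly.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical web: c is linked from a and b; nobody links to b or d.
web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": []}
scores = pagerank(web)
```

Pages b and d, which nobody links to, end up tied for the lowest score no matter what keywords they might declare, while c, linked from both a and b, ranks highest.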
ravi — Oct 4, 2003 2:58:24 PM
Thanks for the link!
frozenaftermath — Oct 4, 2003 6:03:24 PM
Mark Pilgrim recently commented: "The Semantic Web is an unattainable pipe dream, or is too fluidly defined to ever come about, or something" and said something similar here.
I guess it is catching on.
Kiran Jonnalagadda — Oct 5, 2003 2:45:16 PM
mindlace — Oct 8, 2003 3:19:34 PM
While everything he says is true, there are many agents that are highly motivated to get metadata right. One of his examples of how it "won't work" is an example of precisely how it does work: those who do their metadata correctly on eBay get paid more than those who don't. There are plenty of good metadata housekeepers out there: Amazon keeps all its books properly marked up, lots of individuals are quite fussy about the metadata on their tracks, etc.
Data that is insufficiently marked up will be leveraged less than properly marked-up data; this will create gradual selective pressure towards a metadata-rich web. It doesn't matter if 95 agents haven't encoded their information properly, as long as 5 have.
Finally, semantic encoding can dramatically increase your ability to do the sort of implicit metadata analysis that Doctorow claims does have value, as the example of link-encoding shows.
I realize I've switched back and forth between the terms "metadata" and "semantic encoding". I understand the Semantic Web to be much more about semantic encoding (using DTDs and the like that are appropriate to the information you're creating or using) than about "metadata" such as additional tags and what have you.
This is a subtle difference, since arguably much semantic encoding is "metadata" from a human's perspective, but I think it makes the story much clearer. Metadata may still suck, but there will be markup, and from this tag soup sufficiently smart agents can figure something out. If you have a small population of good taggers floating in a sea of lousy taggers, you should be able to use the whole set of documents to make inferences even about the lousily tagged ones.
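One way an agent might exploit a few good taggers is simple label propagation: treat links as evidence that two documents share a topic, and let each untrusted document adopt the majority tag among its already-tagged neighbours. This is a swapped-in illustrative technique, not something the comment specifies. The function name `propagate_tags`, the documents, and the tags are all hypothetical, and the lousy taggers' own tags are simply ignored here rather than down-weighted.

```python
from collections import Counter


def propagate_tags(links, trusted_tags, rounds=10):
    """Guess a tag for every document from a small set of trusted taggers.

    links: document -> documents it links to (treated as undirected here).
    trusted_tags: tags from the careful taggers; these are never changed.
    Untrusted documents' own tags are assumed unreliable and not used.
    """
    # Build an undirected neighbour map so inferences flow both ways.
    neighbours = {}
    for doc, targets in links.items():
        neighbours.setdefault(doc, set())
        for t in targets:
            neighbours.setdefault(t, set())
            neighbours[doc].add(t)
            neighbours[t].add(doc)

    tags = dict(trusted_tags)
    for _ in range(rounds):
        updated = dict(tags)
        for doc in sorted(neighbours):
            if doc in trusted_tags:
                continue  # keep the careful taggers' labels fixed
            # Sorted so ties break deterministically (an arbitrary choice).
            votes = Counter(tags[n] for n in sorted(neighbours[doc]) if n in tags)
            if votes:
                # Adopt the most common tag among already-tagged neighbours.
                updated[doc] = votes.most_common(1)[0][0]
        if updated == tags:
            break  # nothing changed this round, so the tags have settled
        tags = updated
    return tags


# Two careful taggers; every blog is tagged badly or not at all.
trusted = {"careful1": "jazz", "careful2": "physics"}
web = {
    "careful1": ["blog_a", "blog_b"],
    "careful2": ["blog_c"],
    "blog_a": ["blog_d"],
    "blog_c": ["blog_e"],
}
inferred = propagate_tags(web, trusted)
```

Here blog_d never links to a careful tagger directly, yet it picks up "jazz" via blog_a in the second round, which is the kind of inference about poorly tagged documents the comment describes.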