Monday, May 16, 2005

tagging

I love flickr and delicious - both sites let you organize 'stuff' (photos and links, respectively) using tags (arbitrary space-separated words that you can assign to things).

Clay Shirky recently posted an excellent essay entitled Ontology is Overrated in which he talks about a bunch of cool stuff and then goes on to say why tags are so damn useful.

But here's an excerpt from early in the document (where he's talking about categorization) that caught my eye:

    Ontological classification works well in some places, of course. You need a card catalog if you are managing a physical library. You need a hierarchy to manage a file system. So what you want to know, when thinking about how to organize anything, is whether that kind of classification is a good strategy. (emphasis mine).


I'd like to question that assumption, actually... Do you really need a hierarchy to manage a file system? I've spent the last few days going through my OS textbook and doing a bunch of reading/searching about file system design, and I think that assumption is ripe to be questioned.

I know that some people have already tried building 'tag-like' notions into a filesystem. In fact, the de facto filesystem in OS X, HFS+, now (as of Tiger) has support for adding arbitrary key/value attributes to files. I haven't downloaded Tiger but, from what I remember from reading reviews, this feature is currently used in only a few places, like ACLs and maybe some Spotlight metadata.

Getting back to the point: why does a file system need hierarchy in order to be manageable? One survey/research paper I read (will post link when I find it again) essentially says "there's too much software out there that assumes that the file system is hierarchical so I'm not going to even talk about building something that doesn't have any hierarchy." That may actually be the correct, practical viewpoint to take. But real innovation comes from questioning the 'practical viewpoints' of our day, right?

If a URL is an inode and the title of an html page is a filename, then your filesystem and flickr are not too different. That said, URLs are not as opaque as inodes. If I see a URL with mozilla.org in it, that gives me some clue about the contents even though the exact semantics I associate with it may be varied.
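To make the analogy a bit more concrete, here is a minimal sketch of a flat, hierarchy-free store in Python. Everything in it is hypothetical and made up for illustration (the TagStore class and its add and find methods are not from any real filesystem or from flickr/delicious): items get an opaque integer id that plays the role of an inode, the name is just a label, and lookup is done purely by intersecting tags.

```python
from collections import defaultdict
from itertools import count


class TagStore:
    """Hypothetical flat store: no directories, only ids, names and tags."""

    def __init__(self):
        self._next_id = count(1)          # opaque ids, standing in for inodes
        self._names = {}                  # id -> human-readable name
        self._tags = defaultdict(set)     # tag -> set of ids carrying that tag

    def add(self, name, tags):
        """Register an item; `tags` is a space-separated string."""
        item_id = next(self._next_id)
        self._names[item_id] = name
        for tag in tags.split():
            self._tags[tag].add(item_id)
        return item_id

    def find(self, *tags):
        """Return the sorted names of items that carry every given tag."""
        if not tags:
            return sorted(self._names.values())
        # .get() avoids creating empty entries for unknown tags
        ids = set.intersection(*(self._tags.get(t, set()) for t in tags))
        return sorted(self._names[i] for i in ids)


store = TagStore()
store.add("resume.doc", "work 2005")
store.add("budget.xls", "work finance 2005")
store.add("beach.jpg", "vacation 2005")
print(store.find("work", "2005"))
```

Note that nothing here stops two items from having the same name, which is exactly the kind of thing a path in a hierarchy normally rules out; whether that matters in practice is part of the question.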

If you know of some work in this area, please enlighten me. In the meantime, I'll be sure to use my not-so-copious amounts of free time to try to read more on the subject.

update: Thanks, Huat, for the pointer. As always, I feel like an idiot for being so clueless about what's out there. :)
update: Interesting; with WinFS, msft is trying to do for the PC something similar to what the semantic web is trying to do for the internet: give well-defined structure and semantics to data. It's hard to get that to work on the web because of how many diverse applications there are, and how loosely structured the data (HTML) fundamentally is. On the PC, though, msft-written software probably makes up a majority of the software you run (not me; I have a mac like any self-respecting yuppie). More importantly, much of the content you create on your PC is created using msft applications. Two questions come up in my mind: (1) how easy is it to work with content not created using msft applications; (2) how useful/intuitive is strong typing (vs. tagging) to the end user.
more: I came across this discussion of Longhorn which includes a mini rant at the end about hierarchical file systems.

1 comment:

Huat Chye Lim said...

Hemant: WinFS (the new Windows filesystem) is built on top of, I believe, a relational database, and supports organizing files hierarchically as well as in ways more suited to a database, like property tags. Overview at http://msdn.microsoft.com/data/winfs/default.aspx?pull=/library/en-us/dnintlong/html/longhornch04.asp.