Thursday, April 07, 2005

distributed systems

The work I do is focused on distributed systems. More specifically, on keeping very large, extremely complex distributed systems healthy. This includes collecting metrics, knowing which values of each metric imply a (potential) problem, and figuring out how to respond. In part, one tries to eliminate as many outages as possible. Once you've done your bit there, you try to reduce the time to recovery as much as possible, both by building systems that can heal themselves and by giving humans as much useful information as possible, so that when they get woken up at 2am they waste as little time as possible. And you have to do all of this without actually screwing up the system that you're trying to manage.

I'm pretty passionate about this area, even though I often complain about being burnt out at work.

I used to maintain a blog on the company intranet pointing to the papers I was reading, along with some thoughts about how those ideas might affect our work. We solve some pretty huge problems and run an extremely complex distributed system; the blog was internal because I found it difficult to write something meaningful without mentioning internal projects, tools, concepts, and ideas.

I recently read one person's attempt at drafting some guidelines on corporate blogging, and now I'm tempted to spin off a blog devoted only to distributed systems, obviously with content fit for public consumption.

I'm still mulling over the idea; I'm not sure how meaningful my content can be without revealing company jewels, especially since I'm not some famous industry guru or academic.

1 comment:

Anonymous said...

Hey, cool that you are contemplating starting a blog on distributed systems. It would be of great interest to quite a few of us out here. I personally have been trying to find one that would give me information on distributed systems. I'm passionate about the sheer power and scalability of such systems, though I know nothing more than that. It would be great if you would start this blog.