Sunday, June 12, 2005

lightpost

posting will be light for the next few weeks.
In a few hours I fly to Delhi.
Tomorrow morning I fly to Leh, Ladakh for a 5 day trek in the Indian Himalayas.
After that I head back to Delhi and on to Pune for about 10 days during which time I will relinquish my bachelorhood.
I may be online and checking mail from Pune but probably not posting. Will be back to the grind sometime in July.

prediction

I wrote parts of amazon's remote-service-invocation framework a few years ago, after which a few of us worked on collecting detailed metrics from the framework so as to gauge the health of the system and try to predict business-impacting problems before they actually, well, impacted business.

I'm currently reading Blink by Malcolm Gladwell and it's a fascinating book. No time for a full review right now. But here's a quote that caught my eye (he's talking about an algorithm for deciding the severity of heart-attack-like symptoms in ER patients, and has just listed several high-heart-attack-risk lifestyle factors):

    ... It certainly seems like he ought to be admitted to the coronary care unit right away. But the algorithm says he shouldn't be.
    <snip>
    What Goldman's algorithm indicates, though, is that the role of those other factors is so small in determining what is happening to the man right now that an accurate diagnosis can be made without them. In fact <snip> that information is more than useless. It's harmful. It confuses the issues. What screws up doctors when they are trying to predict heart attacks is that they take too much information into account.


(The book, by the way, attempts to explain intuition and how it is that we can get such strong (and often correct) intuitions without being able to understand exactly why. It also attempts to analyse the cases in which our intuition is terribly wrong. See also this entry by Trevor for more about intuition.)

This is cool because our hunch over the past few years has been that it will only take a few metrics to actually predict a given failure scenario, but deciding which ones to pick is the hard thing. So the kinds of systems we are trying to build end up being quite similar to what (I just found out) humans are doing. We're constantly taking hundreds or thousands of input variables (subtle changes in a person's face or 'body language', things seen in the periphery of our vision, etc.) and doing some realtime statistical analysis on them. Except our consciousness is never burdened with any of that. Our subconscious builds and refines these elaborate statistical models over time. Then, it can bubble up signals (in the form of intuition) to our conscious mind with very limited information because it has already made models about which variables are important enough to matter.

How does this apply to metrics and monitoring? It's infeasible and foolhardy to track the state of every possible instrumentable variable in your system in realtime and use that to drive failure detection and root cause analysis. But
  1. you may be able to design a system that can collect lots of metrics and analyse them in an 'offline' manner without impacting your system.
  2. the output of (1), a list of 'important' metrics, is fed into an alarming/monitoring system
  3. whenever an alarm is diagnosed (or confirmed) the result of that is fed back into (1) to correct or reinforce the prediction.
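A rough sketch of that three-step loop, in Python. Everything here is made up for illustration: the metric names, the sample numbers, and the scoring rule (difference between the "bad" and "good" means, scaled by overall spread) are assumptions, not what our framework actually did. It also assumes the history contains at least one bad and one good sample.

```python
from statistics import mean, pstdev

def rank_metrics(samples, labels, top_n=2):
    """Step (1), offline: score each metric by how far apart its mean is
    in 'bad' vs. 'good' periods, scaled by the metric's overall spread."""
    scores = {}
    for name in samples[0]:
        values = [s[name] for s in samples]
        bad = [v for v, is_bad in zip(values, labels) if is_bad]
        good = [v for v, is_bad in zip(values, labels) if not is_bad]
        spread = pstdev(values) or 1.0  # constant metrics would divide by zero
        scores[name] = abs(mean(bad) - mean(good)) / spread
    # Step (2): only the top few metrics are handed to the alarming system.
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

def record_outcome(history, outcomes, sample, was_bad):
    """Step (3): feed a diagnosed (or confirmed) alarm back into the
    offline data and re-rank."""
    history.append(sample)
    outcomes.append(was_bad)
    return rank_metrics(history, outcomes)

# Hypothetical history: cpu_pct swings a lot but does not track badness.
history = [
    {"latency_ms": 20, "queue_depth": 3, "cpu_pct": 40},
    {"latency_ms": 22, "queue_depth": 4, "cpu_pct": 85},
    {"latency_ms": 95, "queue_depth": 40, "cpu_pct": 45},
    {"latency_ms": 90, "queue_depth": 38, "cpu_pct": 80},
]
outcomes = [False, False, True, True]
important = rank_metrics(history, outcomes)
print(important)
```

Note that nothing in the history says why a period was bad; only that it was.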


If failure detection is like the pit in your stomach or lump in your throat, and root cause analysis is like the logical reasoning that we sometimes go through when making decisions, then maybe we have to accept that failure detection is a much faster process than root cause analysis. Our group has always looked at those as two different processes, but never acknowledged that they may require different amounts of information.

On one level, that looks hopeless; "what good is it to know that something is wrong if you don't know what it is?" But we do that all the time. A lot of us learn to trust our instincts (don't walk down that alley) even if we can't tell exactly what's wrong (it's well lit, there are people around, but it just feels shady).

How could that help in managing distributed systems? The only example I can think of right now is: if a host 'feels like it's unhealthy' it could just take itself out of a load balancer without knowing what was wrong.
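To make the 'feels unhealthy' idea concrete, here is a minimal sketch. The `LoadBalancer` class is a hypothetical in-memory stand-in (not any real load balancer's API), and the metric names, limits, and 0.5 threshold are all assumptions.

```python
class LoadBalancer:
    """Hypothetical stand-in for a real load balancer: just a set of host names."""

    def __init__(self):
        self.pool = set()

    def register(self, host):
        self.pool.add(host)

    def deregister(self, host):
        self.pool.discard(host)


def unease(metrics, limits):
    """Fraction of the watched metrics that are beyond their usual limit."""
    over = sum(1 for name, limit in limits.items() if metrics.get(name, 0) > limit)
    return over / len(limits)


def check_in(lb, host, metrics, limits, threshold=0.5):
    """The host pulls itself out of rotation when enough metrics look off.
    It never tries to work out what is actually wrong."""
    if unease(metrics, limits) >= threshold:
        lb.deregister(host)
        return False
    return True


limits = {"latency_ms": 50, "queue_depth": 20}
lb = LoadBalancer()
lb.register("host-a")
lb.register("host-b")
check_in(lb, "host-a", {"latency_ms": 95, "queue_depth": 40}, limits)
check_in(lb, "host-b", {"latency_ms": 21, "queue_depth": 3}, limits)
print(sorted(lb.pool))
```

Root cause analysis can then happen later, on a host that is no longer serving traffic.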

What it does tell me is that it may be worth completely separating the processes of detection and root-cause analysis, so that the feedback in (3) above is not "the root cause of this disturbance was xyz", nor is the list of 'important' metrics in (2) a list for each possible root cause (i.e. you don't output something that says that metrics A and B are important for predicting a disk crash, but metrics D and F predict a web server failure and metrics C and E predict that your application is deadlocked). That per-cause approach is how antivirus software is modeled: it builds up fingerprints of different viruses and tries to match the fingerprint, doing both detection and root-cause analysis in a single step. (OK, maybe modern antivirus software does more than that, but stick with me for a moment.)

Instead, maybe the right but counterintuitive (no pun intended) thing to do here is to only store whether or not "Bad" things happened, and store the set of metrics which are good predictors of "Bad"ness. You'd probably need more than a binary notion of Badness. This doesn't get us closer to solving problems, but maybe it can help reduce downtime in the first place, because we've got a very good early-warning system.

It'll be interesting to see if any more insights come out of watching a large system running (the group I've been working with in Bangalore is getting closer to releasing an internal, scaled-down version of what will eventually be a large self-healing distributed system). Since I've been in development mode for the past few months (vs. supporting a live system), I feel a little unqualified to rant too much about this stuff. :)

Saturday, June 11, 2005

d.s.

Someone left comments (twice!) asking me to follow up on my earlier idea of starting a distributed systems blog. Thanks! But I realized that, although I do a lot of work in the area, I don't know enough to do justice to a blog - I still have many miles to walk.

So every time I run across something interesting, I'll be sure to post on it. But until then, I'll be sticking to reading and learning and growing.

Friday, June 10, 2005

FC reading.

catching up again on some FC reading.
About Despair, Inc.
    The point is that most people should work to make money. They shouldn't expect a company to make them happy. A company can be friendly and good, but it can't really make you happy. At the same time, it shouldn't insult you. It shouldn't say, 'We're a family and have values,' and then act like Enron."
    <snip>
    Jamie Malanowski is the features editor at Playboy. He's happy in his work.

I love the last line about the author of the article! :)

An awesome post about being a bouncer:
    The bouncer ethos, in point of fact, stands in diametric opposition to that of any other position in the service industry. Simply put, if you, as a bouncer, stand there and take crap from the customers, you won't be employed for very long, because everyone on the staff will consider you a pussy, and they won't want you around. Therefore, when people -- as they invariably will -- act like assholes, I'm getting paid to fulfill the one, singular fantasy harbored by everyone who has ever served a drink or waited on a table:

    I can do it back.


On cubicles sucking:
    The solution, Tompkin says, is to customize space to various types of work. Give those who need uninterrupted time a quiet place to work and those who need to collaborate a more social space. That may mean a glass-walled office for heads-down work, and a variety of gathering places for group work. "As the workforce becomes more mobile," Tompkin says, "the office will be the main tool companies use to build a shared culture."

I totally agree. I wish we'd have a healthy mix between offices and cubes. Or even moderately sized rooms with cubes in them, instead of a huge room full of cubes. That way I can only potentially get distracted by 4-6 other people instead of 70.

Sounds like GE Durham has taken a page out of Gore's (makers of Gore-Tex) book (the reference to Gore is from Malcolm Gladwell's The Tipping Point). Lots of great stuff here:
    GE/Durham has more than 170 employees but just one boss: the plant manager. Everyone in the place reports to her. Which means that on a day-to-day basis, the people who work here have no boss. They essentially run themselves.
    <snip>
    So how can something so complicated, so demanding, so fraught with risk, be trusted to people who answer only to themselves? Trust is a funny thing. It is the mystery -- and the genius -- of what goes on at GE/Durham.
    <snip>
    "The interview, now that was one heck of an experience," he says. "It lasted eight hours. I talked to five different people. I participated in three group activities with other job candidates. I even had to do a presentation: I had 15 minutes to prepare a 5-minute presentation."
    <snip>
    At GE/Durham, candidates are rated in 11 areas. "Only one of those involves technical competence or experience," says Keith McKee, 27, a tech-3 on Team Raven. "You have to be above the bar in all 11 of the areas: helping skills, team skills, communication skills, diversity, flexibility, coaching ability, work ethic, and so forth. Even if just one thing out of the 11 knocks you down, you don't come to work here."


Some of the stuff here reminds me of what we learnt in the storytelling training I attended recently.

Thursday, June 09, 2005

cliched conversations

As a foreigner in the US, I was always struck by the very predictable 'conversations' that people had:
- so how was your labor day?
- what're you doing this weekend?
- how was the weekend?
- got plans for july 4th?
- going anywhere for memorial day?

I put the word conversations in quotes because a lot of times it seemed that people weren't even interested in the answers; the questions were just asked out of a strange notion of politeness.

I'm sure it's a universal phenomenon and not at all restricted to the US. But I can't come up with similar examples in Bangalore.

Except for one... for the past three weeks, people have constantly been asking me how preparations are going for the wedding and if I'm all set. Worse still, the same person will ask me the same question two days in a row, as if my 'preparations' change on a minute-to-minute basis.

Just like the conversations above, a lot of times it feels like people are just asking me to be polite. Especially given that I've been telling them that my mom is doing all the preparations. (An upside of having the wedding in India is that my parents and family are doing a lot of the preparations).

Actually I'm kind of nervous that I haven't had to do much. You know that dream where you show up in school and everyone's laughing and you look down to realize you forgot to wear your pants? I have that same feeling. Like I'm going to show up in Pune and realize that I forgot to do something really basic. :)

Anyways, I'm leaving for Delhi this Monday, after which I'll spend a week trekking in gorgeous Ladakh. From there I fly straight to Pune to tie the proverbial knot.

Tuesday, June 07, 2005

phishing

Here's an interesting way to beat phishers and their scams:
    If you get phishing e-mail, go the web sites and enter false data. Make up everything -- name, sign-on name, password, credit card numbers, everything. Instead of one million messages yielding 100 good replies, now the phisher will have one million messages yielding 100,000 replies of which 100 are good, but WHICH 100?

    This technique kills phishing two ways. It certainly increases the phishing labor requirement by about 10,000X. But even more importantly, if banks and e-commerce sites limit the number of failed sign-on attempts from a single IP address to, say, 10 per day, theft as an outcome of phishing becomes close to impossible.
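Plugging the quoted numbers into a quick back-of-the-envelope calculation (the figures are the ones from the excerpt; the assumption that the phisher checks every reply from a single IP address is mine):

```python
replies = 100_000          # total replies after people flood the site with fake data
good = 100                 # replies that contain real credentials
failed_limit_per_day = 10  # failed sign-on attempts allowed per IP per day

# Credentials the phisher has to try, on average, per working one.
checks_per_good = replies / good  # 1000.0

# Every fake reply burns one failed attempt, so trying them all from
# one IP address takes this many days.
days_single_ip = (replies - good) / failed_limit_per_day  # 9990.0

print(checks_per_good, days_single_ip)
```

At 10 failures a day, that is over 27 years of sign-on attempts from one address.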

we're not that special.

post by Ming talks about some experiments on monkeys by economists.

We're just another species in the evolutionary race that Nature is hosting.

There's nothing special about humans. There's nothing inferior about the other one hundred million species on the planet.

Monday, June 06, 2005

danger: darwin harmful

Darwin got two honorable mentions here. The damn idiots. Now I'm ashamed at not having read all the books on that list.

Thursday, June 02, 2005

lost moments

I was driving back from work just now, at about 2:30am. I saw this amazing sight on the road. There were 4 people riding bicycles in a rectangular formation in the middle of the road. Across their heads was draped a huge something made of plywood. I couldn't get close enough to see, but it might have been a billboard. I'm guessing the distance between the front and rear cyclists was about 12-15 feet, and the distance between two adjacent cyclists was about 8-10 feet.

Only in India. I wish I'd had a camera with me!

Tuesday, May 31, 2005

hectic weekend

This weekend was nuts.

Sindya signed me up for a 3 day class taught by the Teachers Training Foundation in Bangalore - it's an orientation meant for volunteers that are going to take part in the Reading For Real program next year.

Sindya and I have signed up for a few months next year; we'll be storytelling some books to young children who don't have a strong grasp of English, and encouraging them to read.

The training was awesome (though I bunked Friday because of work). I met a bunch of cool people. We learnt about pre/during/post reading activities aimed at holding the children's interest in the story and making sure they grasped it. We got a demo from an amazing storyteller - she had a room full of adults completely mesmerized by her rendition of a story meant for 6 year olds!


On Saturday night, I signed up to do an all-night bicycle ride. We met up at M.G. Road at 10pm. The organizers piled us into a van and took us to a house south of Bangalore, off Kanakpura road, where they had a bunch of cycles. The cycles weren't that great at all - no gears, not the perfect size (too small for me), and a little rickety. They didn't provide helmets either. I was a little disappointed, given that we had paid Rs. 700 for the outing. Anyways, we started with an 8-9 km ride on the highway (it was about 11:45 by the time we started so not too much traffic) after which we took a detour into some small village roads. The roads were super muddy from the heavy rain that we've been having, and generally bumpy because they're inner roads. Needless to say, my backside was in severe pain by the end of the ride. We finished the approx. 40km ride at about 5:15am. I had to walk the last 1-2 km of the ride because (a) my butt was sore, (b) the last 9km was uphill, (c) my back tire was pretty flat so it was super hard to ride.

Anyways, I reached home at 7am, took a shower and passed out. At 9:30am I had to pull myself out of bed and spend the entire day (10-5) in the TTF class again.

I ended up getting two movies from vikas - Les triplettes de Belleville (which I'd seen before) and The House of Flying Daggers, which I wanted to see. I finally passed out at 10pm.

It's somehow Tuesday already and I'm down with a terrible cough and cold. I'm guessing it's partly to do with a hectic weekend.

Monday, May 23, 2005

Infrastructure

For the second time in a little over a month, our office flooded. Well, not really our whole office. Let me explain.

It doesn't rain that much in Bangalore. It must've rained about 30-40 minutes today. But it rained pretty darn hard. Actually, it even hailed a little. We were standing inside and just joking about how much it would suck if it started to leak (since the roof leaked about a month ago) when someone was like "oh crap! let's go check the server room!"

So we rush over to the rooms where the various pieces of huge electrical equipment are and, sure enough, one of the balconies has overflowed and water has now gone into the main electrical room (where there are electrical wires running on the ground). Some people rushed to start bailing out the room with buckets but then came to their senses. We started bailing the balcony so that no more water would overflow into the electrical room. We couldn't just shut it off because all the servers would have an unclean shutdown. So the sysadmins started doing a server-room shutdown.

In the meantime, another of the balconies was starting to flood, and threatening to flood the office. In addition, the awnings in that balcony were sagging from the weight of water that had collected on them. So we alternated between pushing water off the awnings (to save it from collapsing from its weight) and bailing water out of the balcony (to prevent the office from flooding). Fun times.

It's stopped raining now but I'm guessing it will rain some more tonight. I decided to leave my car in the basement (although I don't really trust that building too much) and take a taxi home in case the roads were bad. A few fallen trees and fender-benders, but nothing major - at least on my route home. I can't imagine what some of the busier streets (Airport Road, Hosur Road) are like. The power is out at home - my building has generator backup (like many apartment complexes here) so I'm still blogging away.

Last time this happened, we had a holiday the next day (actually it ended up being a half day because the servers were brought back up). Wonder if I get to stay in tomorrow.

Sunday, May 22, 2005

Enterprise Software

This is worth a read. Excerpt:
    “Enterprise Software” is a polite way of saying shitty legacy systems and overly complex requirements.

I'm sure that resonates with many. The article is hilarious.

Publishing

A while ago, Neil Gershenfeld gave a talk at amazon about technologies coming down the pipeline. He talked about being able to download blueprints to a home 'printer' and have it 'print out' a working bicycle.

Why is that revolutionary? Well first, there's the obvious cutting of shipment costs. More importantly, by totally changing the cost structure of goods delivery, it removes a lot of traditional Economies of Scale, thus lowering the barrier-to-entry for a new 'producer' (designer might be a better word) in the market. The producer is now free to experiment and customize since the cost of a 'failed' product is only the time lost in the effort.

Thinking about things like that makes one rethink where exactly a company like amazon fits into the market. Though we're not quite there yet (no instantly downloadable bicycles), lulu.com is a company that allows people to upload books in digital format. You can then sell your book (as a physical book, not a download) on sites like amazon. Pretty darn cool, if you ask me. Now that publishers are removed from the equation, the barrier-to-entry for an author is significantly lowered. The up-front costs to them are minimal, and the cost of selling zero copies of your book is just the time you spent writing the book; not the cost of hundreds of unsold printed-and-bound physical books.

Most people aren't ready to read books in digital format (I know I hate reading long articles on my laptop), which has also limited how Long the Tail is for books (the first limiting factor being the barrier-to-entry for authors described above). Lulu has just lengthened the tail for books by changing their cost-structure. Additionally, by allowing self-published books to be sold on amazon, lulu.com is addressing the issue that customers face as they wade through the Long Tail: "how to sort through the junk and find the Good Stuff?" Through its personalization and recommendation features, Amazon has made a name for itself in helping people find and discover items. Lulu is smart to leverage that.

(tip from tpwire)
update: a little more reading revealed that amazon has acquired booksurge, a similar company.

Government in business.

India has a socialist legacy. Post independence, we walked the 'middle path' (moderate socialism, or something) for many years. We had privately owned businesses but they were heavily regulated. The government enjoyed a monopoly in many core industries.

Today, there are still many examples of government owned businesses: 'state emporiums' (handicrafts, etc. characteristic of each state), aircraft manufacturers, coffee shops... the list is long and varied.

It always surprises visiting Americans to hear of this.

Well I just read that the United States Army makes first-person shooter games. To quote from the article:

    “The technology is what we use for actual training,” says Major Chris Chambers, who directed the E3 presentation for the Army. “We brought it to E3 because it’s also really cool.”
    < snip >
    But Chambers says he is at the expo for the same reason as the other exhibitors: to showcase the game.
    “We intend to be a major player in this industry for a long time,” he says. (emphasis mine)

Now that is messed up.

Tuesday, May 17, 2005

Why gmail?

Maybe this is common knowledge, but it just occurred to me why gmail is such a brilliant idea. Google's pagerank estimates the importance of pages based on links between pages. If I think a page is important, I'll probably link to it. But what if I don't have a website to link from?

When I'm browsing the web, I'm constantly emailing people links that I think they'd find interesting. Sometimes if I'm not feeling lazy, I'll even send some blurb that explains why they should click on the link.

Until recently, Google was completely missing out on this simple and highly reliable way of gauging the importance of pages. Now that google sees my email, they can still apply 'pagerank', but some of the pages in question happen to be emails that people are sending to each other.
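Here's a toy version of that idea: a textbook PageRank power iteration in which emails are just more nodes in the link graph. To be clear, this is my speculation made concrete, not a description of what Google does; the node names, the damping factor of 0.85, and the iteration count are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over a dict of node -> outgoing links."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(nodes) for n in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling node (no outgoing links): spread its rank evenly.
                share = damping * rank[node] / len(nodes)
                for n in nodes:
                    new_rank[n] += share
        rank = new_rank
    return rank


# One web page links to article_a; three emails each mention article_b.
web_links = {"blog": ["article_a"]}
email_links = {
    "email_1": ["article_b"],
    "email_2": ["article_b"],
    "email_3": ["article_b"],
}
ranks = pagerank({**web_links, **email_links})
print(ranks["article_b"] > ranks["article_a"])
```

Without the emails, article_b would have no incoming links at all; with them, it outranks the article that only the web knows about.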

Frikkin brilliant. And I thought it was only about the ad revenue.

Monday, May 16, 2005

tagging

I love flickr and delicious - both sites that let you organize 'stuff' (photos, links respectively) using tags (arbitrary space separated words that you can assign to things).

Clay Shirky recently posted an excellent essay entitled Ontology is Overrated in which he talks about a bunch of cool stuff and then goes on to say why tags are so damn useful.

But here's an excerpt from early in the document (where he's talking about categorization) that caught my eye:

    Ontological classification works well in some places, of course. You need a card catalog if you are managing a physical library. You need a hierarchy to manage a file system. So what you want to know, when thinking about how to organize anything, is whether that kind of classification is a good strategy. (emphasis mine).


I'd like to question that assumption actually... Do you really need a hierarchy to manage a file system? I've spent the last few days going through my OS textbook and doing a bunch of reading/searching about file system design. I think that's an assumption that's ripe to be questioned.

I know that some people have already tried building 'tag-like' notions into a filesystem. In fact, the de facto filesystem in OSX, HFS+, now (as of Tiger) has support for adding arbitrary key/value attributes to files. I haven't downloaded Tiger but, from what I remember from reading reviews, this feature is currently used in only a few places like for ACLs and maybe some Spotlight metadata.

Getting back to the point; why does a file system need hierarchy in order to be manageable? One survey/research paper I read (will post link when I find it again) essentially says "there's too much software out there that assumes that the file system is hierarchical so I'm not going to even talk about building something that doesn't have any hierarchy." That may actually be the correct, practical viewpoint to take. But real innovation comes from questioning the 'practical viewpoints' of our day, right?

If a URL is an inode and the title of an html page is a filename, then your filesystem and flickr are not too different. That said, URLs are not as opaque as inodes. If I see a URL with mozilla.org in it, that gives me some clue about the contents even though the exact semantics I associate with it may be varied.
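For what it's worth, here is a minimal sketch of what a hierarchy-free store might look like: files get an opaque id (playing the role of the inode), and the only way to find them is by tag. The `TagStore` class and the file names are hypothetical, and a real filesystem would obviously need much more (persistence, permissions, renames).

```python
from collections import defaultdict


class TagStore:
    """Flat file store: no directories, just opaque ids and tags."""

    def __init__(self):
        self._names = {}                 # opaque id -> display name
        self._by_tag = defaultdict(set)  # tag -> ids carrying that tag
        self._next_id = 1

    def add(self, name, tags):
        """Store a file under space-separated tags, flickr/delicious style."""
        file_id = self._next_id
        self._next_id += 1
        self._names[file_id] = name
        for tag in tags.split():
            self._by_tag[tag].add(file_id)
        return file_id

    def find(self, *tags):
        """Names of files that carry every one of the given tags."""
        if not tags:
            return sorted(self._names.values())
        ids = set.intersection(*(self._by_tag.get(t, set()) for t in tags))
        return sorted(self._names[i] for i in ids)


store = TagStore()
store.add("trek.jpg", "ladakh photos 2005")
store.add("budget.xls", "2005 finance")
print(store.find("2005"))
print(store.find("2005", "photos"))
```

A directory path like 2005/photos/ forces one ordering of those tags; here the file simply carries both.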

If you know of some work in this area, please enlighten me. In the mean time, I'll be sure to use my not-so-copious amounts of free time to try to read more on the subject.

update: Thanks, Huat, for the pointer. As always, I feel like an idiot for being so clueless about what's out there. :)
update: Interesting; with WinFS, msft is trying to do with the PC, something similar to what the semantic web is trying to do to the internet - give well defined structure and semantics to data. It's hard to get it to work on the web because of how many diverse applications there are, and how loosely structured the data (HTML) fundamentally is. On the PC, though, msft-written software probably makes up a majority of the software you run (not me; I have a mac like any self-respecting yuppie). More importantly, much of the content you create on your PC is created using msft applications. Two questions come up in my mind: (1) how easy is it to work with content not created using msft applications; (2) how useful/intuitive is strong typing (vs. tagging) to the end user.
more: I came across this discussion of Longhorn which includes a mini rant at the end about hierarchical file systems.

Sunday, May 15, 2005

run

The Bangalore Marathon took place today and I ran a 7K 'celebration run'. My time was too pathetic to even mention. My aim was to not stop running but I set my sights too low; I'm totally bummed that I didn't push myself harder. I finished the run with a lot of energy left, which means I probably could've run a lot faster.

Anyways, it was a fun morning; we ran with bright yellow Dream a Dream shirts and then went to Koshy's with some friends. Lunch took an hour and a half to get served and wasn't even that good. Note to self: no more Koshy's. After beer and a big meal though, I'm ready to pass out.

Thursday, May 12, 2005

pod-frikkin-casting

A friend of mine has been giving me headaches by constantly talking about podcasting and how cool it is. So last night I finally decided to see what the buzz is about and downloaded a podcast client for the mac.

I tried the free 'Lite' version first. The full version has a 30 day trial, but I haven't tried it yet. The Lite version allows you to specify feeds, and download them to a local folder. For any audio files downloaded, you can automatically drop them into iTunes and delete them from the download folder.
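Under the hood, a podcast feed is just an RSS feed whose items carry an `enclosure` element pointing at the audio file. A minimal sketch of the 'find the audio files' step, using only Python's standard library; the sample feed and its URLs are made up, and the actual downloading and iTunes hand-off are left out.

```python
import xml.etree.ElementTree as ET


def audio_enclosures(rss_text):
    """Return the URLs of all audio enclosures found in an RSS document."""
    root = ET.fromstring(rss_text)
    urls = []
    for enclosure in root.iter("enclosure"):
        if enclosure.get("type", "").startswith("audio/"):
            urls.append(enclosure.get("url"))
    return urls


# Hypothetical feed with one audio item and one plain text item.
sample_feed = """
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>Morning headlines</title>
      <enclosure url="http://example.com/news-0512.mp3"
                 length="1234567" type="audio/mpeg"/>
    </item>
    <item>
      <title>Text-only update</title>
    </item>
  </channel>
</rss>
"""
print(audio_enclosures(sample_feed))
```

A client would run something like this on each subscribed feed, then download whatever URLs it hasn't seen before.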

This morning, I have two news feeds (one from KOMO news in Seattle and one from Northwest Public Radio), and a jazz feed (don't have the link with me right now) on my iPod.

It was pretty cool to listen to random news on the way to work (instead of the Hindi-Pop and talk shows that are usually on the radio here). Will post more experiences shortly.

Given that the aforementioned friend still hasn't actually downloaded a podcast client, maybe I can talk his head off about podcasting instead.

Monday, May 09, 2005

how not to get pigeonholed, and other good stuff...

... from fastcompany:

from Escape Your Pigeonhole
    Vargas highlighted the research skills she gained, rather than the law, and landed a market-research job
    <snip >
    "I typically make a commitment to a project for 18 to 24 months and offer upfront to groom one of my staff to replace me," Olding says.
    <snip >
    get pigeonholed as a person who leads new ventures

from Change or Die
    "A relatively small percentage of the population consumes the vast majority of the health-care budget for diseases that are very well known and by and large behavioral."
    <snip >
    Unless you work on it, brain fitness often begins declining at around age 30 for men, a bit later for women. "People mistake being active for continuous learning," Merzenich says. "The machinery is only activated by learning. People think they're leading an interesting life when they haven't learned anything in 20 or 30 years. My suggestion is learn Spanish or the oboe."
    <snip >
    What happens if you don't work at mental rejuvenation? Merzenich says that people who live to 85 have a 50-50 chance of being senile. While the issue for heart patients is "change or die," the issue for everyone is "change or lose your mind." Mastering the ability to change isn't just a crucial strategy for business. It's a necessity for health. And it's possibly the one thing that's most worth learning.

The article talks about how large changes are sometimes so much easier than small, incremental changes. Reminds me of the economic reforms India undertook in the early 90s - they happened only because we were left with just three weeks' worth of foreign currency reserves. Since the sweeping changes that took place then, the economic reforms have been a lot slower to come.
Beth over at Creating Passionate Users writes about this article as well.

humor me

... and answer this quick poll by adding a comment.

    Roughly what percentage of the conversations you have on the web are with people you know, vs. with random digerati?


Obviously, most of your conversations about personal issues are going to be with people you know. But I'm talking about conversations about generally non-personal things; movies, current affairs, technology, politics, etc.

There are conversations going on in the blogosphere (through posts, comments, and trackbacks) and in communities like slashdot. Call these 'public' conversations. There is definitely a group of people who are actively engaged in those.

I know that many of my non-personal, electronic conversations are with people I know in the real world. Call these 'private' conversations. Until the blogosphere hit its tipping point, most conversations were like this - 'private' in nature and enabled by email, IM, or the like.

(This conversation that I'm engaging you in is technically 'public' though I'm guessing that my readership right now is mostly people I know)

So you tell me: how important are 'public' conversations to you? Do you find yourself still reverting back to private conversations most of the time? Why? For non-personal stuff, why not open up your conversations to anyone who cares to participate?