DRAFT: Lupine Lachrymosis; Or, The Anti-Wolf-Crying Principle

A photo of a wolf with a big cartoon teardrop

Hello fellow startups! Maybe we don’t get to call ourselves a startup now that we’re a decade old. Hello fellow established business people and caretakers of websites (and probably especially startups)!

You probably think you know the Anti-Wolf-Crying Principle. It’s pretty aphoristic, yes, but there’s an important lesson here that we learned — ok, are still learning — the hard way. So we wanted to enshrine it for blog-posterity. The idea is pretty obvious; so much so that we’re not sure of an original source for it. We first heard it articulated by Patrick McKenzie as the “0th rule of monitoring alerts”. But first we should say what monitoring alerts are.

System Notifications

“If you ever get a monitoring alert and think ‘Great, I don’t need to get out of bed’, get out of bed and fix that alert.”

A system notification or monitoring alert is how you get automatically notified when your website goes down or something about your app or service breaks in a way that is making a significant fraction of your users very sad or mad. We call ours airhorns, or when we send them to Slack, slackhorns. In the old days people would wear special devices (pagers) that beeped, and they’d have to hunt down a payphone or something to find out what happened. Even if you’re too young to remember that, you might still call a schedule for who’s on call to deal with the website breaking a “pager rotation”. The point is, if your source of livelihood breaks you need to get alerted. So far, so good.

The Actual Thing About Wolf-Crying

Here’s the Rule: Every system notification you get must, completely unambiguously and without exception, be something urgent enough to take some immediate action on. If a non-urgent notification arrives, then the completely-unambiguously-no-exceptions urgent action is to make whatever triggered that notification stop triggering notifications.
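To make the Rule concrete, here is a minimal sketch of how you might audit your own alert history for wolf-criers. Everything in it is hypothetical: the `Alert` record, the rule names, and the `needed_action` flag are made up for illustration and are not how our airhorns actually work. The idea is just that any rule that has ever fired without requiring immediate action goes on the fix-it list.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    rule: str            # hypothetical name of the alerting rule that fired
    needed_action: bool  # did a human have to do something right away?


def rules_crying_wolf(history):
    """Return the sorted names of rules that ever fired without needing action."""
    return sorted({alert.rule for alert in history if not alert.needed_action})


# Made-up alert history, purely for illustration.
history = [
    Alert("site_down", True),
    Alert("disk_80_percent", False),
    Alert("site_down", True),
    Alert("queue_backlog", False),
    Alert("disk_80_percent", False),
]

print(rules_crying_wolf(history))  # ['disk_80_percent', 'queue_backlog']
```

Under the Rule, every name that function returns is itself an emergency: fix the rule until it only fires when someone genuinely needs to get out of bed.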

Patrick McKenzie’s version is that if you ever get a monitoring alert and think “Great, I don’t need to get out of bed”, get out of bed and fix that alert.

The Embarrassing Part (For Us)

Most of Beeminder’s notifications are non-urgent things we merely need ambient awareness of. And (lo!) we’ve had downtime and quasi-downtime (call it brownouts; you can trawl our old site status alerts if you’re curious) that would’ve lasted mere minutes if we’d simply noticed the notification telling us that everything had broken. But we didn’t, because it was buried in a river of noise.

Facepalm.

Aaaand halt.

This is stuck in draft mode until we can conclude with something better than “so, yeah, do as we say not as we do I guess”.

After we get that sorted, we can conclude like this:

If it’s something you need ambient awareness of, it belongs in your admin dashboard, not in your alerts. Because, emphatically, every alert is an emergency: either handle it or handle the emergency that is your alert system crying wolf!
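As a rough sketch of that split, assuming each event can be tagged as either urgent or ambient at the source (the `Severity` enum, the `route` function, and the event names are all invented for this example):

```python
from enum import Enum


class Severity(Enum):
    URGENT = "urgent"    # someone must act right now
    AMBIENT = "ambient"  # nice to know; no immediate action required


def route(events):
    """Split (name, severity) pairs into (alerts, dashboard) lists."""
    alerts, dashboard = [], []
    for name, severity in events:
        if severity is Severity.URGENT:
            alerts.append(name)     # pages a human
        else:
            dashboard.append(name)  # shows up on the admin dashboard only
    return alerts, dashboard


# Made-up events, purely for illustration.
events = [
    ("site_down", Severity.URGENT),
    ("new_signup", Severity.AMBIENT),
    ("payment_processor_errors", Severity.URGENT),
    ("nightly_backup_finished", Severity.AMBIENT),
]

alerts, dashboard = route(events)
print(alerts)     # ['site_down', 'payment_processor_errors']
print(dashboard)  # ['new_signup', 'nightly_backup_finished']
```

The point of the sketch is that there is no third bucket: nothing ambient ever reaches the channel that wakes people up.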