By: dreev
Spec level:
Last updated: 2020-10-19
Gissue: TBD; see also forum discussion
Premise: You and I (say Beeminder and Complice or TaskRatchet) each have a website where we depend on being able to send and receive email.
Problem: Sometimes email breaks mysteriously and silently. That’s really bad because (in the immortal words of Kevin Lochner) what if Jesus emails us? He’ll think we’re ignoring Him.
There are solutions like Pingdom to make sure you know right away if your website goes down. But it’s harder with email. How does an outside service know if an email made it through to you? And same goes for outgoing email from your site — hard to know for sure that it’s always making it out.
Idea: We each run a little daemon and they just constantly email each other.
Every so many minutes (hours? we want enough of a gap to not trigger spamboxing and also to be sure that messages won’t arrive out of order) the daemon sends the following to its buddy:
This is message N. Did you get my last email at time T1? I’m sending this one at time T2!
PS, we’re all bots here so let me say that again in a machine-friendlier way:
{"upcount": N, "uplast": T1, "upnow": T2}
where N is a counter and T1 and T2 are unixtimes. The daemon, when receiving such a message from its buddy, alerts us (via Slack or SMS or an alternate email address or PagerDuty or whatever) if N is not exactly one more than the previous N or if too much time has passed since it last received anything.
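To make the message format and the alert rule concrete, here is a minimal Python sketch. It only builds and parses the message body and decides whether something looks wrong; actually sending, receiving, and alerting are left out. The names `Heartbeat`, `compose_body`, `parse_body`, `find_problems`, and the `max_silence` threshold are all made up for illustration and aren't part of the spec above.

```python
import json
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Heartbeat:
    upcount: int  # N: message counter
    uplast: int   # T1: unixtime the previous message was sent
    upnow: int    # T2: unixtime this message is being sent


def compose_body(hb: Heartbeat) -> str:
    """Build the human-friendly text followed by the machine-friendly JSON line."""
    payload = json.dumps(
        {"upcount": hb.upcount, "uplast": hb.uplast, "upnow": hb.upnow}
    )
    return (
        f"This is message {hb.upcount}. "
        f"Did you get my last email at time {hb.uplast}? "
        f"I'm sending this one at time {hb.upnow}!\n"
        "PS, we're all bots here so let me say that again "
        "in a machine-friendlier way:\n"
        f"{payload}\n"
    )


def parse_body(body: str) -> Optional[Heartbeat]:
    """Return the first flat JSON object in the body that has all three fields."""
    for match in re.finditer(r"\{[^{}]*\}", body):
        try:
            data = json.loads(match.group(0))
            return Heartbeat(
                int(data["upcount"]), int(data["uplast"]), int(data["upnow"])
            )
        except (ValueError, KeyError, TypeError):
            continue  # not our payload; keep looking
    return None


def find_problems(
    prev_count: Optional[int],
    last_received_at: int,
    now: int,
    incoming: Optional[Heartbeat] = None,
    max_silence: int = 3600,  # assumed threshold in seconds; not from the spec
) -> List[str]:
    """Apply the two alert rules: a counter gap, or too long a silence."""
    problems = []
    if now - last_received_at > max_silence:
        problems.append(f"nothing received for {now - last_received_at}s")
    if (
        incoming is not None
        and prev_count is not None
        and incoming.upcount != prev_count + 1
    ):
        problems.append(f"expected message {prev_count + 1}, got {incoming.upcount}")
    return problems
```

The idea would be to call `find_problems` both when a message arrives (passing `incoming`) and on a timer (leaving `incoming` as `None`), since a dead inbox by definition never triggers the on-arrival check.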
(Later we could get fancy with tracking stats based on the timestamps if we wanted.)
Note that this isn’t a general delivery confirmation mechanism. It’s just a continuous “does my email seem to be working?” service.
And maybe this is obvious but the two buddies should be on totally different domains and ESPs. It defeats a lot of the point if one outage can take down both the daemons!
this service has been around for long enough, and it’s been stable in features for long enough, that i’m OK using it:
https://healthchecks.io/
it’s basically cron + notifications, nothing that shouldn’t be built into every f’ing computer anyone buys. basically, if it doesn’t get poked by you at the expected time, it alerts you. one of the ways to poke it is via email! you can either say “any email” or “look for this keyword in the subject line too”. does the email reply bot preserve CCs or maybe replies to all the To folks? anyway, it doesn’t really matter, i think sending via cron and then having honeycomb look for datapoints is a decent test.
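a rough sketch of the "poke it via email from cron" idea, in Python. this only covers the outgoing side. the ping address and keyword below are placeholders (each check would have its own address, which isn't given here), and the SMTP host is an assumption about your setup.

```python
import smtplib
from email.message import EmailMessage

# Placeholders, not real values: substitute the check's own ping address
# and whatever subject keyword the check is configured to look for.
PING_ADDRESS = "your-check-address@example.com"
KEYWORD = "mailcheck-ok"


def build_ping(sender: str, when: int) -> EmailMessage:
    """Build the email that pokes the check; the keyword goes in the subject."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = PING_ADDRESS
    msg["Subject"] = f"{KEYWORD} {when}"
    msg.set_content(f"Scheduled email check sent at unixtime {when}.\n")
    return msg


def send_ping(msg: EmailMessage, host: str = "localhost", port: int = 25) -> None:
    """Hand the message to an SMTP server (assumed to be reachable at host:port)."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)


# From a cron job you would run something like:
#   send_ping(build_ping("cron@example.org", int(time.time())))
# after importing time.
```

for this to test the real thing, `send_ping` should go through the same mail path the site itself uses, not a separate one.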