Why Are Downtime Detectors So Complicated?
I needed to monitor a few servers: this website and some APIs I run for my small company. When something goes down, I want a notification so I can confirm the problem and take corrective action. That's all I need.
I used to use Uptime Kuma. It's a cool project with many features and integrations, but it was too complex for what I needed, especially for something I'm supposed to self-host: it needs a database to keep its state.
So, I built my own.
What I Actually Need
My ideal downtime detector does three things:
- Ping servers periodically (HTTP requests, check status codes)
- Detect when state changes, e.g., up → down and down → up
- Trigger an action when that happens (more on that below)
If I want dashboards later, I'll use Grafana. If I want complex alerting logic, I'll build that separately. Right now, I just need to know when something breaks.
What I Built
I wrote my tool in Go, with simple specs:
- One static binary, easy to deploy, no database or state whatsoever
- An easy YAML config file defining monitors (without excessive nesting)
- Only one "universal" integration: HTTP webhook
When a monitor detects a problem, it calls a webhook (a POST request with the relevant info). Today, I just point that webhook at an ntfy.sh topic and get notified via their iOS app. It's all free and open source software.
From that simple base, I can easily add features later if needed, either by subscribing to the ntfy topic (it's just pub/sub) or by writing a custom backend tailored to my needs.
So far, it works and I'm happy with this solution.
How I Run It
I run this tool on two instances: my Raspberry Pi at home, and a cheap VPS in the cloud (on a different network and autonomous system).
Sometimes my home internet drops. When it comes back, the Pi instance reconnects and sends its notifications, but the VPS stays quiet the whole time, so I know it was just my connection.
When both instances trigger at once, I know there's a real problem.
The point is that I don't need the software to figure this out. I just need it to give me those two data points, and then I can decide what they mean. The software is actually pretty dumb: it expects the monitored services to be smart about what they report (my API endpoints are designed this way), and the user to be smart about interpreting the raw events.
Availability
I named this tool hatame.
The name is from 傍目 (hata-me), which can be translated as: a third-party perspective, viewing from the outside.
I'll open source hatame once it's proven stable in production. I'm running it now to verify it works reliably.