I had a Nagios instance running on a VPS that cost me $20/month. It monitored 12 endpoints. Twelve. A $35 Raspberry Pi replaced it and I haven’t looked back.
Why a Raspberry Pi
Because it’s always on, draws 3 watts, and I had one in a drawer collecting dust. That’s the real reason. Every other justification came after I plugged it in and realized it actually works.
A Pi 4 with 2GB RAM runs Python monitoring scripts, a Telegram bot for alerts, and a Grafana dashboard without breaking a sweat. CPU stays under 10% most of the time.
The Setup
I use Raspberry Pi OS Lite (the headless one). No desktop environment, no wasted resources. I SSH in, and everything runs as systemd services.
Here’s my directory structure:
mkdir -p /opt/netmon/{scripts,config,data,logs}
cd /opt/netmon

The core is a Python script that pings endpoints, checks HTTP responses, and measures latency. Nothing fancy. It works.
The Monitor
Here’s the main checker. I run it every 60 seconds via a systemd timer:
#!/usr/bin/env python3
"""netcheck.py - Network endpoint monitor for Raspberry Pi"""
import json
import subprocess
import time
from datetime import datetime

import requests

CONFIG = "/opt/netmon/config/endpoints.json"
DATA_DIR = "/opt/netmon/data"


def load_endpoints():
    """Read the endpoint list from the JSON config file."""
    with open(CONFIG) as f:
        return json.load(f)["endpoints"]


def ping_host(host, count=3):
    """Ping a host and return avg latency in ms, or None on failure."""
    try:
        result = subprocess.run(
            ["ping", "-c", str(count), "-W", "2", host],
            capture_output=True, text=True, timeout=10
        )
        if result.returncode != 0:
            return None
        # Summary line looks like: rtt min/avg/max/mdev = 0.4/0.5/0.6/0.1 ms
        for line in result.stdout.split("\n"):
            if "avg" in line:
                parts = line.split("=")[-1].split("/")
                return float(parts[1])
    except Exception:
        return None
    return None


def check_http(url, timeout=5):
    """Check HTTP endpoint, return (status_code, latency_ms)."""
    try:
        start = time.monotonic()
        # verify=False skips TLS certificate checks (urllib3 will warn about it).
        resp = requests.get(url, timeout=timeout, verify=False)
        latency = (time.monotonic() - start) * 1000
        return resp.status_code, latency
    except Exception:
        return None, None


def run_checks():
    """Check every endpoint and write the results to a timestamped JSON file."""
    endpoints = load_endpoints()
    results = []
    for ep in endpoints:
        result = {
            "name": ep["name"],
            "type": ep["type"],
            "target": ep["target"],
            "timestamp": datetime.utcnow().isoformat(),
            "status": "unknown"
        }
        if ep["type"] == "ping":
            latency = ping_host(ep["target"])
            result["latency_ms"] = latency
            # Compare against None so a 0.0 ms reading is not treated as down.
            result["status"] = "up" if latency is not None else "down"
        elif ep["type"] == "http":
            code, latency = check_http(ep["target"])
            result["status_code"] = code
            result["latency_ms"] = latency
            result["status"] = "up" if code and 200 <= code < 400 else "down"
        results.append(result)

    ts = datetime.utcnow().strftime("%Y%m%d_%H%M%S")
    with open(f"{DATA_DIR}/results_{ts}.json", "w") as f:
        json.dump(results, f, indent=2)
    return results


if __name__ == "__main__":
    run_checks()

The Config File
Endpoints live in a simple JSON file (no YAML, no TOML, no over-engineering):
{
  "endpoints": [
    {"name": "main-site", "type": "http", "target": "https://davideandreazzini.co.uk"},
    {"name": "api-server", "type": "http", "target": "https://api.example.com/health"},
    {"name": "db-host", "type": "ping", "target": "10.0.1.50"},
    {"name": "gateway", "type": "ping", "target": "192.168.1.1"}
  ]
}

Alerts via Telegram
When something goes down, I get a Telegram message. No email chains, no Slack webhooks I forgot to configure. One bot, one chat, one notification channel.
import requests

TELEGRAM_TOKEN = "your-bot-token"
CHAT_ID = "your-chat-id"


def send_alert(name, status, target):
    """Send a status alert to the configured Telegram chat."""
    msg = f"⚠ {name} is {status}\nTarget: {target}"
    url = f"https://api.telegram.org/bot{TELEGRAM_TOKEN}/sendMessage"
    # A timeout keeps a stalled Telegram call from hanging the whole check run.
    requests.post(url, json={"chat_id": CHAT_ID, "text": msg}, timeout=10)

I call send_alert() inside run_checks() when the status changes from up to down. No spam on every check, only on state transitions.
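The transition check itself isn't shown, so here is a minimal sketch of one way it could work. It assumes the last known status of each endpoint is kept in a small JSON file between runs. STATE_FILE, load_state(), save_state(), and find_transitions() are hypothetical names for illustration, not part of the original script.

```python
import json
from pathlib import Path

# Hypothetical location for the last known status of each endpoint.
STATE_FILE = Path("/opt/netmon/data/state.json")


def load_state(path=STATE_FILE):
    """Return {endpoint name: last status}, or {} if no usable state file exists."""
    try:
        return json.loads(Path(path).read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def save_state(state, path=STATE_FILE):
    """Persist the status map so the next run can compare against it."""
    Path(path).write_text(json.dumps(state))


def find_transitions(results, previous):
    """Return (results that just went from up to down, updated status map)."""
    went_down = []
    current = dict(previous)
    for r in results:
        # Alert only when the last recorded status was "up" and it is now "down".
        if previous.get(r["name"]) == "up" and r["status"] == "down":
            went_down.append(r)
        current[r["name"]] = r["status"]
    return went_down, current
```

Inside run_checks(), this would mean loading the state before the loop, calling send_alert(r["name"], r["status"], r["target"]) for each entry in went_down, and saving the updated map at the end. One consequence of this design: an endpoint that is already down on the very first run triggers no alert, because there is no earlier "up" to compare against.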
Grafana Dashboard
InfluxDB + Telegraf + Grafana. Yes, on the Pi. InfluxDB is the only thing that occasionally uses more than 200MB RAM.
# /etc/telegraf/telegraf.d/netmon.conf
[[inputs.file]]
  files = ["/opt/netmon/data/results_*.json"]
  name_override = "netmon"
  data_format = "json"
  json_strict = true
  # Clean up old JSON files after reading
  file_delete_after_read = false

Grafana runs on port 3000. I reverse-proxied it through nginx with basic auth. The dashboard shows latency over time, uptime percentage, and current status for each endpoint.
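For a quick sanity check of the uptime number without opening Grafana, the same figure can be computed straight from the result files. This is a rough sketch, not how the dashboard calculates it. uptime_percent() and load_records() are hypothetical helpers, and the glob pattern assumes the results_*.json naming used by netcheck.py.

```python
import glob
import json
from collections import defaultdict


def uptime_percent(records):
    """Return {endpoint name: percentage of checks whose status was "up"}."""
    up = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r["name"]] += 1
        if r["status"] == "up":
            up[r["name"]] += 1
    return {name: 100.0 * up[name] / total[name] for name in total}


def load_records(pattern="/opt/netmon/data/results_*.json"):
    """Flatten every results file matching the glob into one list of records."""
    records = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            records.extend(json.load(f))
    return records
```

Calling uptime_percent(load_records()) gives one percentage per endpoint over whatever files are still on disk, so the window it covers depends on how long result files are kept.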
Systemd Service
The checker runs as a systemd timer. Not a cron job. I used cron for two weeks then switched because systemd timers give you proper logging and dependency management.
# /etc/systemd/system/netcheck.service
[Unit]
Description=Network Monitor Checker
After=network.target

[Service]
Type=oneshot
ExecStart=/usr/bin/python3 /opt/netmon/scripts/netcheck.py
User=pi
WorkingDirectory=/opt/netmon

# /etc/systemd/system/netcheck.timer
[Unit]
Description=Run netcheck every 60 seconds

[Timer]
OnBootSec=30
OnUnitActiveSec=60
AccuracySec=5s

[Install]
WantedBy=timers.target

sudo systemctl enable netcheck.timer && sudo systemctl start netcheck.timer

Power Outage Recovery
The Pi boots back up in about 15 seconds. All services are set to restart on failure. I added a small UPS (a $20 USB battery pack with pass-through charging) that gives me about 2 hours of runtime during outages.
The monitoring data writes to the SD card (which will eventually die, I know). I set up a nightly rsync to a NAS to back up the InfluxDB data directory. When the SD card dies, I flash a new one and restore from the NAS. Takes 10 minutes.
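The nightly backup can be as small as a wrapper around rsync. The sketch below is an illustration under assumptions, not the exact job I run: INFLUX_DIR is the usual package default for InfluxDB 1.x and may differ on your install, NAS_TARGET is a made-up destination, and build_rsync_cmd() and run_backup() are hypothetical names.

```python
import subprocess

# Assumptions: the source is the usual InfluxDB 1.x package default and the
# destination is a made-up NAS target. Adjust both for your setup.
INFLUX_DIR = "/var/lib/influxdb/"
NAS_TARGET = "backup@nas.local:/backups/netmon/influxdb/"


def build_rsync_cmd(src=INFLUX_DIR, dest=NAS_TARGET):
    """Build an rsync command that mirrors src to dest."""
    # -a preserves permissions and timestamps; --delete removes files on the
    # destination that no longer exist on the source.
    return ["rsync", "-a", "--delete", src, dest]


def run_backup(src=INFLUX_DIR, dest=NAS_TARGET):
    """Run the backup and return True if rsync exited cleanly."""
    result = subprocess.run(
        build_rsync_cmd(src, dest), capture_output=True, text=True
    )
    return result.returncode == 0
```

Copying a live database directory can catch files mid-write, so it is worth doing a test restore before trusting the backup.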
What It Doesn’t Do
It doesn’t do distributed tracing. It doesn’t do APM. It doesn’t do log aggregation. If you need those things, use Grafana Cloud or Datadog. This setup is for small-scale monitoring of a handful of endpoints. The stuff I actually care about at 3 AM when something goes wrong.
Total cost: $35 for the Pi, $12 for the SD card, $10 for the case, $20 for the UPS battery. That’s $77 one-time versus $240/year for the VPS I was running Nagios on.