Raspberry Pi as a Network Monitor: The $35 Solution That Replaced My Nagios Box

I had a Nagios instance running on a VPS that cost me $20/month. It monitored 12 endpoints. Twelve. A $35 Raspberry Pi replaced it and I haven’t looked back.


Why a Raspberry Pi

Because it’s always on, draws 3 watts, and I had one in a drawer collecting dust. That’s the real reason. Every other justification came after I plugged it in and realized it actually works.

A Pi 4 with 2GB RAM runs Python monitoring scripts, a Telegram bot for alerts, and a Grafana dashboard without breaking a sweat. CPU stays under 10% most of the time.

The Setup

I use Raspberry Pi OS Lite (the headless one). No desktop environment, no wasted resources. I SSH in, and everything runs from systemd services.

Here’s my directory structure:

mkdir -p /opt/netmon/{scripts,config,data,logs}
cd /opt/netmon

The core is a Python script that pings endpoints, checks HTTP responses, and measures latency. Nothing fancy. It works.


The Monitor

Here’s the main checker. I run it every 60 seconds via a systemd timer:

#!/usr/bin/env python3
"""netcheck.py - Network endpoint monitor for Raspberry Pi"""

import subprocess
import json
import time
import requests
from datetime import datetime

CONFIG = "/opt/netmon/config/endpoints.json"
DATA_DIR = "/opt/netmon/data"

def load_endpoints():
    with open(CONFIG) as f:
        return json.load(f)["endpoints"]

def ping_host(host, count=3):
    """Ping a host and return avg latency in ms."""
    try:
        result = subprocess.run(
            ["ping", "-c", str(count), "-W", "2", host],
            capture_output=True, text=True, timeout=10
        )
        if result.returncode != 0:
            return None
        for line in result.stdout.split("\n"):
            if "avg" in line:
                parts = line.split("=")[-1].split("/")
                return float(parts[1])
    except Exception:
        return None

def check_http(url, timeout=5):
    """Check HTTP endpoint, return (status_code, latency_ms)."""
    try:
        start = time.time()
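        # verify=False skips TLS certificate validation (handy for self-signed
        # certs, but an expired or mismatched certificate will not count as "down")
        resp = requests.get(url, timeout=timeout, verify=False)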
        latency = (time.time() - start) * 1000
        return resp.status_code, latency
    except Exception:
        return None, None

def run_checks():
    endpoints = load_endpoints()
    results = []
    for ep in endpoints:
        result = {
            "name": ep["name"],
            "type": ep["type"],
            "target": ep["target"],
            "timestamp": datetime.utcnow().isoformat(),
            "status": "unknown"
        }
        if ep["type"] == "ping":
            latency = ping_host(ep["target"])
            result["latency_ms"] = latency
            # Compare against None so a 0.0 ms average is not treated as "down"
            result["status"] = "up" if latency is not None else "down"
        elif ep["type"] == "http":
            code, latency = check_http(ep["target"])
            result["status_code"] = code
            result["latency_ms"] = latency
            result["status"] = "up" if code and 200 <= code < 400 else "down"
        results.append(result)
    ts = datetime.utcnow().strftime("%Y%m%d_%H%M%S")
    with open(f"{DATA_DIR}/results_{ts}.json", "w") as f:
        json.dump(results, f, indent=2)
    return results

if __name__ == "__main__":
    run_checks()
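
The latency parsing in ping_host() depends on the summary line that ping prints at the end of its output. As a minimal sketch of that step in isolation, the helper below (parse_avg_latency is a hypothetical name, not part of the script above) pulls the average out of a captured sample. It assumes the Linux iputils format, "rtt min/avg/max/mdev = ...", and the sample output is made up for illustration.

```python
import re

# Captures the second value (the average) from a summary line such as
# "rtt min/avg/max/mdev = 0.412/0.498/0.561/0.062 ms".
_AVG_RE = re.compile(r"=\s*[\d.]+/([\d.]+)/")


def parse_avg_latency(ping_output):
    """Return the average round-trip time in ms, or None if no summary line is found."""
    for line in ping_output.splitlines():
        if "min/avg/max" in line:
            match = _AVG_RE.search(line)
            if match:
                return float(match.group(1))
    return None


# Illustrative sample only; real output varies by ping implementation.
SAMPLE = """\
PING 192.168.1.1 (192.168.1.1) 56(84) bytes of data.
64 bytes from 192.168.1.1: icmp_seq=1 ttl=64 time=0.512 ms

--- 192.168.1.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 0.412/0.498/0.561/0.062 ms
"""

print(parse_avg_latency(SAMPLE))  # 0.498
```

Keeping the parser separate from the subprocess call makes it possible to test against saved output without touching the network.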

The Config File

Endpoints live in a simple JSON file (no YAML, no TOML, no over-engineering):

{
  "endpoints": [
    {"name": "main-site", "type": "http", "target": "https://davideandreazzini.co.uk"},
    {"name": "api-server", "type": "http", "target": "https://api.example.com/health"},
    {"name": "db-host", "type": "ping", "target": "10.0.1.50"},
    {"name": "gateway", "type": "ping", "target": "192.168.1.1"}
  ]
}
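
A typo in this file only shows up when the checker runs, so a quick sanity check before deploying can help. The following is a sketch, not part of the original setup: validate_endpoints, ALLOWED_TYPES, and REQUIRED_KEYS are hypothetical names, and the rules (three required keys, type is either "ping" or "http", names are unique) are inferred from the example config above.

```python
import json

ALLOWED_TYPES = {"ping", "http"}            # the two types the checker handles
REQUIRED_KEYS = ("name", "type", "target")  # keys every endpoint entry uses


def validate_endpoints(config):
    """Return a list of human-readable problems; an empty list means no issues found."""
    endpoints = config.get("endpoints")
    if not isinstance(endpoints, list):
        return ["top-level 'endpoints' must be a list"]

    problems = []
    seen_names = set()
    for i, ep in enumerate(endpoints):
        if not isinstance(ep, dict):
            problems.append(f"entry {i}: must be an object")
            continue
        missing = [key for key in REQUIRED_KEYS if key not in ep]
        if missing:
            problems.append(f"entry {i}: missing {', '.join(missing)}")
            continue
        if ep["type"] not in ALLOWED_TYPES:
            problems.append(f"entry {i} ({ep['name']}): unknown type {ep['type']!r}")
        if ep["name"] in seen_names:
            problems.append(f"entry {i}: duplicate name {ep['name']!r}")
        seen_names.add(ep["name"])
    return problems


# Illustrative config: the second entry uses a type the checker does not know.
sample = json.loads("""
{"endpoints": [
  {"name": "gateway", "type": "ping", "target": "192.168.1.1"},
  {"name": "api", "type": "tcp", "target": "10.0.1.50"}
]}
""")
for problem in validate_endpoints(sample):
    print(problem)
```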

Alerts via Telegram

When something goes down, I get a Telegram message. No email chains, no Slack webhooks I forgot to configure. One bot, one chat, one notification channel.

import requests

TELEGRAM_TOKEN = "your-bot-token"
CHAT_ID = "your-chat-id"

def send_alert(name, status, target):
    msg = f"⚠ {name} is {status}\nTarget: {target}"
    url = f"https://api.telegram.org/bot{TELEGRAM_TOKEN}/sendMessage"
    # A timeout keeps a slow Telegram API call from hanging the whole check run
    requests.post(url, json={"chat_id": CHAT_ID, "text": msg}, timeout=10)

I call send_alert() inside run_checks() when status changes from up to down. No spam on every check, only on state transitions.
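
Because each run is a fresh process, detecting a transition means remembering the previous status somewhere. One way to do that, sketched here under assumptions, is a small JSON state file. The STATE_FILE path and the load_state, save_state, and find_transitions names are hypothetical and not taken from the script above.

```python
import json
import os

STATE_FILE = "/opt/netmon/data/last_state.json"  # hypothetical location


def load_state(path=STATE_FILE):
    """Return {endpoint name: last status}, or an empty dict on the first run."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}


def save_state(state, path=STATE_FILE):
    """Write to a temp file, then rename, so a crash never leaves a half-written file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def find_transitions(previous, results):
    """Return (new_state, went_down), where went_down lists results that changed up -> down.

    An endpoint with no previous status is not reported, so the first run stays quiet.
    """
    new_state = {r["name"]: r["status"] for r in results}
    went_down = [
        r for r in results
        if r["status"] == "down" and previous.get(r["name"]) == "up"
    ]
    return new_state, went_down


# In-memory demonstration with made-up data.
previous = {"gateway": "up", "db-host": "down"}
results = [
    {"name": "gateway", "status": "down", "target": "192.168.1.1"},
    {"name": "db-host", "status": "down", "target": "10.0.1.50"},
]
state, went_down = find_transitions(previous, results)
print([r["name"] for r in went_down])  # ['gateway']
```

Inside run_checks(), this would amount to calling load_state() before the loop, send_alert() for each entry in went_down, and save_state() at the end.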

Grafana Dashboard

InfluxDB + Telegraf + Grafana. Yes, on the Pi. InfluxDB is the only thing that occasionally uses more than 200MB RAM.

# /etc/telegraf/telegraf.d/netmon.conf
[[inputs.file]]
  files = ["/opt/netmon/data/results_*.json"]
  name_override = "netmon"
  data_format = "json"
  json_strict = true

  # Keep the JSON files on disk after reading; old files need separate cleanup
  file_delete_after_read = false
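
The checker writes one results file per run, so at a 60-second interval the data directory gains about 1,440 files a day. If nothing else removes them, a small pruning step keeps the SD card from filling up. This is a sketch with assumed details: prune_results is a hypothetical helper, the 7-day retention is arbitrary, and the demo runs against a temporary directory, not /opt/netmon/data.

```python
import os
import tempfile
import time
from pathlib import Path


def prune_results(data_dir, max_age_days=7):
    """Delete results_*.json files older than max_age_days; return how many were removed."""
    cutoff = time.time() - max_age_days * 86400
    removed = 0
    for path in Path(data_dir).glob("results_*.json"):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed += 1
    return removed


# Demonstration in a throwaway directory: one stale file, one fresh file.
with tempfile.TemporaryDirectory() as tmp:
    old_file = Path(tmp) / "results_20240101_000000.json"
    new_file = Path(tmp) / "results_20240201_000000.json"
    old_file.write_text("[]")
    new_file.write_text("[]")

    # Backdate the first file's modification time by ten days.
    ten_days_ago = time.time() - 10 * 86400
    os.utime(old_file, (ten_days_ago, ten_days_ago))

    removed = prune_results(tmp, max_age_days=7)
    remaining = sorted(p.name for p in Path(tmp).iterdir())

print(removed, remaining)
```

The function only matches the results_*.json pattern, so other files in the same directory are left alone.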

Grafana runs on port 3000. I reverse-proxied it through nginx with basic auth. The dashboard shows latency over time, uptime percentage, and current status for each endpoint.
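
For reference, the uptime percentage on a panel like that is just the share of checks that came back "up". The dashboard computes it from InfluxDB; the snippet below is only an illustration of the same arithmetic in Python over result records shaped like the checker's output. uptime_percent is a hypothetical name and the data is made up.

```python
from collections import defaultdict


def uptime_percent(results):
    """Return {endpoint name: percentage of checks with status 'up'}."""
    counts = defaultdict(lambda: [0, 0])  # name -> [up_count, total_count]
    for r in results:
        counts[r["name"]][1] += 1
        if r["status"] == "up":
            counts[r["name"]][0] += 1
    return {name: 100.0 * up / total for name, (up, total) in counts.items()}


checks = [
    {"name": "gateway", "status": "up"},
    {"name": "gateway", "status": "up"},
    {"name": "gateway", "status": "down"},
    {"name": "gateway", "status": "up"},
    {"name": "db-host", "status": "up"},
]
print(uptime_percent(checks))  # {'gateway': 75.0, 'db-host': 100.0}
```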

Systemd Service

The checker runs as a systemd timer. Not a cron job. I used cron for two weeks then switched because systemd timers give you proper logging and dependency management.

# /etc/systemd/system/netcheck.service
[Unit]
Description=Network Monitor Checker
After=network.target

[Service]
Type=oneshot
ExecStart=/usr/bin/python3 /opt/netmon/scripts/netcheck.py
User=pi
WorkingDirectory=/opt/netmon

# /etc/systemd/system/netcheck.timer
[Unit]
Description=Run netcheck every 60 seconds

[Timer]
OnBootSec=30
OnUnitActiveSec=60
AccuracySec=5s

[Install]
WantedBy=timers.target

sudo systemctl enable netcheck.timer && sudo systemctl start netcheck.timer

Power Outage Recovery

The Pi boots back up in about 15 seconds. All services are set to restart on failure. I added a small UPS (a $20 USB battery pack with pass-through charging) that gives me about 2 hours of runtime during outages.

The monitoring data writes to the SD card (which will eventually die, I know). I set up a nightly rsync to a NAS to back up the InfluxDB data directory. When the SD card dies, I flash a new one and restore from the NAS. Takes 10 minutes.

What It Doesn’t Do

It doesn’t do distributed tracing. It doesn’t do APM. It doesn’t do log aggregation. If you need those things, use Grafana Cloud or Datadog. This setup is for small-scale monitoring of a handful of endpoints. The stuff I actually care about at 3 AM when something goes wrong.

Total cost: $35 for the Pi, $12 for the SD card, $10 for the case, $20 for the UPS battery. That’s $77 one-time versus $240/year for the VPS I was running Nagios on.