Skip to main content

Monitor node status

This guide explains how to set up basic monitoring to check the status of a Chainweb node using a script, how to add notifications to the script, and how to run the script regularly.

Basic health check

The following sample script illustrates how you can create a bash script that checks the status of a Chainweb node and restarts it, if necessary:

#!/usr/bin/env bash

# Define environment variables with default values if not already set
export CHAINWEB_NETWORK=${CHAINWEB_NETWORK:-mainnet01}
export CHAINWEB_PORT=${CHAINWEB_P2P_PORT:-2022}
SERVICE_NAME="kadena-node.service" # Replace with your service
LOG_FILE="/var/log/chainweb/service.log"

# Function to log messages with timestamps
log_message() {
echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOG_FILE"
}

# Function to check the service status
check_service_status() {
curl -fsLk --connect-timeout 10 "https://localhost:$CHAINWEB_PORT/chainweb/0.0/$CHAINWEB_NETWORK/cut"
}

# Function to restart the service and log the outcome
restart_service() {
if systemctl restart "$SERVICE_NAME"; then
log_message "Successfully restarted the Chainweb service."
else
log_message "Failed to restart the Chainweb service."
exit 1 # Exit with an error status if the restart fails
fi
}

# Main logic to check service status and potentially restart the service
if check_service_status; then
log_message "Chainweb service is running."
else
log_message "Chainweb service might be down. Attempting to restart..."
restart_service
fi

This script does the following:

  • Sets up environment variables for the Chainweb network and port.
  • Defines functions for logging, checking service status, and restarting the service.
  • Checks if the Chainweb service is running by querying its API.
  • If the service is down, it attempts to restart it.

Adding notifications

To add notifications, you can modify the log_message function to send emails or messages to services like Telegram. Here's an example of how you might modify the function to send a Telegram message:

TELEGRAM_BOT_TOKEN="your_bot_token"
TELEGRAM_CHAT_ID="your_chat_id"

log_message() {
message="$(date '+%Y-%m-%d %H:%M:%S') - $1"
echo "$message" >> "$LOG_FILE"
curl -s -X POST "https://api.telegram.org/bot$TELEGRAM_BOT_TOKEN/sendMessage" \
-d chat_id="$TELEGRAM_CHAT_ID" \
-d text="$message"
}

Replace your_bot_token and your_chat_id with your actual Telegram bot token and chat ID.

Additional checks

You can add more robust checks to ensure your node is functioning correctly. For example, you could add a function to perform a local call to validate functionality.

Running the script regularly

In most cases, you should establish a schedule for monitoring nodes at a regular interval. For example, to run the sample script every 30 seconds, you can set up a cron job or a systemd timer.

Using cron

To set up a cron job for monitoring a node:

  1. Open your crontab file for editing by running the following command:

    crontab -e
  2. Add an instruction for running the monitoring script similar to the following line:

    * * * * * /path/to/your/monitor-node-script.sh; sleep 30; /path/to/your/monitor-node-script.sh

    This instructiopn runs the monitor-node-script.sh script every minute and then again 30 seconds later.

Using systemd

To set up a systemd timer for monitoring a node:

  1. Create a /etc/systemd/system/chainweb-monitor.service service file similar to the following:

    [Unit]
    Description=Chainweb Node Monitor

    [Service]
    ExecStart=/path/to/your/script.sh

    [Install]
    WantedBy=multi-user.target
  2. Create a /etc/systemd/system/chainweb-monitor.timer timer file similar to the following:

    [Unit]
    Description=Run Chainweb Node Monitor every 30 seconds

    [Timer]
    OnBootSec=30
    OnUnitActiveSec=30s

    [Install]
    WantedBy=timers.target
  3. Enable the timer by running the following command:

    sudo systemctl enable chainweb-monitor.timer
  4. Start the timer by running the following command:

    sudo systemctl start chainweb-monitor.timer

    This timer runs the monitoring script every 30 seconds, providing real-time information about the status of the Chainweb node. The script also automatically attempts to restart the node, if issues are detected.