Monitor node status
This guide explains how to set up basic monitoring to check the status of a Chainweb node using a script, how to add notifications to the script, and how to run the script regularly.
Basic health check
The following sample script illustrates how you can create a bash script that checks the status of a Chainweb node and restarts it, if necessary:
#!/usr/bin/env bash
# Define environment variables with default values if not already set
export CHAINWEB_NETWORK=${CHAINWEB_NETWORK:-mainnet01}
export CHAINWEB_PORT=${CHAINWEB_P2P_PORT:-2022}
SERVICE_NAME="kadena-node.service" # Replace with your service
LOG_FILE="/var/log/chainweb/service.log"
# Function to log messages with timestamps
log_message() {
echo "$(date '+%Y-%m-%d %H:%M:%S') - $1" >> "$LOG_FILE"
}
# Function to check the service status
check_service_status() {
curl -fsLk --connect-timeout 10 "https://localhost:$CHAINWEB_PORT/chainweb/0.0/$CHAINWEB_NETWORK/cut"
}
# Function to restart the service and log the outcome
restart_service() {
if systemctl restart "$SERVICE_NAME"; then
log_message "Successfully restarted the Chainweb service."
else
log_message "Failed to restart the Chainweb service."
exit 1 # Exit with an error status if the restart fails
fi
}
# Main logic to check service status and potentially restart the service
if check_service_status; then
log_message "Chainweb service is running."
else
log_message "Chainweb service might be down. Attempting to restart..."
restart_service
fi
This script does the following:
- Sets up environment variables for the Chainweb network and port.
- Defines functions for logging, checking service status, and restarting the service.
- Checks if the Chainweb service is running by querying its API.
- If the service is down, it attempts to restart it.
Adding notifications
To add notifications, you can modify the log_message function to send emails or messages to services like Telegram.
Here's an example of how you might modify the function to send a Telegram message:
TELEGRAM_BOT_TOKEN="your_bot_token"
TELEGRAM_CHAT_ID="your_chat_id"
log_message() {
message="$(date '+%Y-%m-%d %H:%M:%S') - $1"
echo "$message" >> "$LOG_FILE"
curl -s -X POST "https://api.telegram.org/bot$TELEGRAM_BOT_TOKEN/sendMessage" \
-d chat_id="$TELEGRAM_CHAT_ID" \
-d text="$message"
}
Replace your_bot_token and your_chat_id with your actual Telegram bot token and chat ID.
Additional checks
You can add more robust checks to ensure your node is functioning correctly. For example, you could add a function to perform a local call to validate functionality.
Running the script regularly
In most cases, you should establish a schedule for monitoring nodes at a regular interval.
For example, to run the sample script every 30 seconds, you can set up a cron job or a systemd timer.
Using cron
To set up a cron job for monitoring a node:
-
Open your crontab file for editing by running the following command:
crontab -e -
Add an instruction for running the monitoring script similar to the following line:
* * * * * /path/to/your/monitor-node-script.sh; sleep 30; /path/to/your/monitor-node-script.shThis instructiopn runs the
monitor-node-script.shscript every minute and then again 30 seconds later.
Using systemd
To set up a systemd timer for monitoring a node:
-
Create a
/etc/systemd/system/chainweb-monitor.serviceservice file similar to the following:[Unit]
Description=Chainweb Node Monitor
[Service]
ExecStart=/path/to/your/script.sh
[Install]
WantedBy=multi-user.target -
Create a
/etc/systemd/system/chainweb-monitor.timertimer file similar to the following:[Unit]
Description=Run Chainweb Node Monitor every 30 seconds
[Timer]
OnBootSec=30
OnUnitActiveSec=30s
[Install]
WantedBy=timers.target -
Enable the timer by running the following command:
sudo systemctl enable chainweb-monitor.timer -
Start the timer by running the following command:
sudo systemctl start chainweb-monitor.timerThis timer runs the monitoring script every 30 seconds, providing real-time information about the status of the Chainweb node. The script also automatically attempts to restart the node, if issues are detected.