Logging, Metrics and Monitoring Suggestions


When it comes to monitoring the health of the system in a way that also covers the health of the Validator Node, do you have any recommendations?

Splunk, Graylog, ELK, SolarWinds, etc.?

I assume any of these will do for monitoring the general health of the system, but things get a bit more custom with the validator logs.




This is a good question for all the stewards. Please post it to the validator-support or stewards channel in the Sovrin Slack.


I would also recommend posting this in #monitoring on Slack, since we are still in the discussion and planning phases of how we want to monitor the network. The people there will likely want to be part of the discussion.


T-Labs is using check_mk, available at https://mathias-kettner.com/
I check that the ports are open and that the python3 processes are running, and I wrote local checks that read the validator_info output and convert it to the Nagios-compatible format that check_mk uses.
I added a --nagios option to validator_info to create that output.
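For anyone who wants to try something similar, here is a minimal sketch of a check_mk local check in Python. It is not the T-Labs script. It only illustrates the idea: a local check prints one line per service in the form `<state> <service_name> <perfdata> <text>`, where state is 0 (OK), 1 (WARN), 2 (CRIT) or 3 (UNKNOWN) and perfdata is `-` when there is none. The port numbers 9701/9702 are assumed Indy node/client defaults, and the `reachable_nodes` / `total_nodes` keys are hypothetical placeholders, not the real validator_info schema; adapt both to your own node.

```python
#!/usr/bin/env python3
"""Hypothetical check_mk local check for a validator node (illustrative sketch)."""
import socket

# check_mk local check states
OK, WARN, CRIT, UNKNOWN = 0, 1, 2, 3


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def port_check(host, port):
    """Build one local-check line reporting whether a TCP port accepts connections."""
    if port_open(host, port):
        return f"{OK} Validator_port_{port} - TCP port {port} is open"
    return f"{CRIT} Validator_port_{port} - TCP port {port} is closed"


def reachability_check(info, warn_missing=1, crit_missing=2):
    """Turn a parsed validator_info-style dict into one local-check line.

    ASSUMPTION: the keys 'reachable_nodes' and 'total_nodes' are hypothetical
    and only stand in for whatever fields your validator_info output provides.
    """
    try:
        reachable = int(info["reachable_nodes"])
        total = int(info["total_nodes"])
    except (KeyError, TypeError, ValueError):
        return f"{UNKNOWN} Validator_reachability - could not parse validator_info data"
    missing = total - reachable
    if missing >= crit_missing:
        state = CRIT
    elif missing >= warn_missing:
        state = WARN
    else:
        state = OK
    # Nagios-style perfdata: name=value;warn;crit;min;max (warn/crit left empty)
    perf = f"reachable={reachable};;;0;{total}"
    return f"{state} Validator_reachability {perf} {reachable} of {total} nodes reachable"


if __name__ == "__main__":
    # ASSUMPTION: 9701/9702 as node/client ports; use the ports from your own setup.
    for port in (9701, 9702):
        print(port_check("127.0.0.1", port))
    # Hypothetical sample data; in practice this would be parsed from validator_info.
    print(reachability_check({"reachable_nodes": 24, "total_nodes": 25}))
```

Computing the state inside the script keeps check_mk itself generic: the agent just relays the lines, and the validator-specific logic (which fields matter, what counts as WARN or CRIT) lives in one place that can be adjusted per node.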

This is a diagram it produced:
[diagram not included]


thanks, I’ll take a look