Logging, Metrics and Monitoring Suggestions


#1

When it comes to monitoring the health of the system in a way that also includes the health of the Validator Node, do you have any recommendations?

Splunk, Graylog, ELK, SolarWinds, etc.?

I assume that any of them will do for monitoring the general health of the system, but things get a bit custom with the validator logs.

Thanks

Phill


#2

This is a good question for all the stewards. Please post it to the validator-support or stewards channel in the Sovrin Slack.


#3

I would also recommend posting this under #monitoring on Slack, since we are still in the discussion/planning phases of how we want to monitor the network. The people there will likely want to be in on the discussion.


#4

T-Labs is using check_mk, available here: https://mathias-kettner.com/
I am checking that the ports are open and that the python3 processes are running, and I wrote local checks that read the validator_info output and convert it to the Nagios-compatible format check_mk uses.
I added the --nagios option to validator_info to create that output:
https://github.com/hyperledger/indy-node/blob/master/scripts/validator-info#L718

This is a diagram produced
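
For anyone who wants to try something similar, here is a rough sketch of what a check_mk local check for the port part could look like. This is not the actual T-Labs script, just an illustration. The service names and the ports 9701/9702 are assumptions, so substitute whatever your node is configured with. A local check simply prints one line per service in the form `<state> <service_name> <perfdata or -> <text>`, where state is 0 (OK), 1 (WARN), 2 (CRIT) or 3 (UNKNOWN).

```python
#!/usr/bin/env python3
import socket

# States understood by check_mk local checks.
OK, WARN, CRIT, UNKNOWN = 0, 1, 2, 3


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def local_check_line(state, service, text, metrics=None):
    """Format one local check line: <state> <service> <perfdata> <text>.

    The service name must not contain spaces, so they are replaced with
    underscores. Several metrics are joined with '|'; '-' means none.
    """
    if metrics:
        perfdata = "|".join(f"{name}={value}" for name, value in metrics.items())
    else:
        perfdata = "-"
    return f"{state} {service.replace(' ', '_')} {perfdata} {text}"


def check_port(service, host, port):
    """Report CRIT if the port cannot be reached, OK otherwise."""
    if port_is_open(host, port):
        return local_check_line(OK, service, f"{host}:{port} is reachable")
    return local_check_line(CRIT, service, f"{host}:{port} is not reachable")


if __name__ == "__main__":
    # Hypothetical service names and ports; adjust to your own node config.
    print(check_port("Validator_node_port", "127.0.0.1", 9701))
    print(check_port("Validator_client_port", "127.0.0.1", 9702))
```

Dropped into the agent's local checks directory and made executable, each printed line should show up as its own service. The validator_info values could be reported the same way by passing them in through the `metrics` argument.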


#5

thanks, I’ll take a look