9.3 Routine monitoring checklist

OpenCRVS comes with a built-in set of automatic email alerts that cover a minimal set of critical limits and conditions the product needs in order to work. When one of these alerts is triggered, the issue needs to be resolved as soon as possible. It is often better to anticipate and resolve such issues before the critical limits are reached.

It's good practice to monitor your production installation's infrastructure manually on a daily basis. This improves the reliability of your environment and gives your team a chance to include server improvements in planned work. The following list captures some of the essential values that should be checked manually:

Disk space usage on all nodes is less than 70%

  • Log in to Kibana

  • Navigate to Observability -> Metrics -> Metrics Explorer

  • Use the parameters listed in 7.2 Infrastructure health under Available disk space

  • Verify that used disk space is under 70% on every node
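
As a complement to the Kibana check, the same 70% threshold can be spot-checked directly on a node. The following is a minimal sketch and not part of OpenCRVS: it assumes Python 3 is available on the node and that `/` is the mount point of interest. The names `used_percent` and `disk_ok` are hypothetical, chosen only for illustration.

```python
import shutil

DISK_LIMIT_PERCENT = 70.0  # threshold taken from the checklist item above


def used_percent(total: int, used: int) -> float:
    """Return used space as a percentage of total space."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * used / total


def disk_ok(path: str = "/", limit: float = DISK_LIMIT_PERCENT) -> bool:
    """Return True when the filesystem containing `path` is below `limit`."""
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (bytes)
    return used_percent(usage.total, usage.used) < limit


if __name__ == "__main__":
    # Assumption: "/" is the volume you care about; adjust for data volumes.
    status = "OK" if disk_ok("/") else "ABOVE LIMIT"
    print(f"Disk usage on /: {status}")
```

This only inspects the node it runs on, so it would need to be run on each node in turn. It does not replace the Kibana view, which shows all nodes at once.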

CPU and memory usage is less than 80% on all nodes

  • Log in to Kibana

  • Navigate to Observability -> Metrics -> Metrics Explorer

  • Use the parameters listed in 7.2 Infrastructure health under CPU usage

  • Select a timeframe of 24 hours

  • Verify CPU load has not exceeded 80% on any of your server nodes
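
The verification step above amounts to confirming that no sample in the 24-hour window is above the limit on any node. A minimal sketch of that comparison follows, assuming the per-node CPU percentages have already been copied out of Kibana by hand. The node names, the sample values, and the function `nodes_over_limit` are hypothetical, and nothing here queries Kibana or Elasticsearch.

```python
from typing import Dict, List

CPU_LIMIT_PERCENT = 80.0  # threshold taken from the checklist item above


def nodes_over_limit(
    samples: Dict[str, List[float]],
    limit: float = CPU_LIMIT_PERCENT,
) -> Dict[str, float]:
    """Map each node whose peak sample exceeds `limit` to that peak value."""
    # Nodes with no samples are skipped rather than treated as healthy or failing.
    peaks = {node: max(values) for node, values in samples.items() if values}
    return {node: peak for node, peak in peaks.items() if peak > limit}


if __name__ == "__main__":
    # Hypothetical 24-hour samples (percent CPU) for two nodes.
    example = {
        "node-1": [35.0, 42.5, 61.0],
        "node-2": [55.0, 83.2, 70.1],
    }
    print(nodes_over_limit(example))
```

An empty result means no node exceeded the limit in the supplied samples. The same function could be reused for memory percentages, since the checklist applies the same 80% limit to both.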

No errors in any services (Observability -> APM -> Services -> [service] -> Errors)
