7.3 Routine monitoring checklist
OpenCRVS comes with a built-in set of automatic email alerts that capture a minimal set of critical limits and conditions necessary for the product to work. When one of these alerts is triggered, the underlying issue needs to be resolved as soon as possible. It is often better to anticipate and plan for these issues before the critical limits are reached.
It is good practice to manually monitor the infrastructure of your production installation on a daily basis. This practice improves the reliability of your environment and gives your team a chance to include server improvements in planned work. The following list captures some of the essential values that should be checked manually:
Disk space usage on all nodes is less than 70%
Log in to Kibana
Navigate to Observability -> Metrics -> Metrics Explorer
Use the parameters listed in 7.2 Infrastructure health under Available disk space
Verify that used disk space is under 70% on every node
CPU and memory usage is less than 80% on all nodes
Log in to Kibana
Navigate to Observability -> Metrics -> Metrics Explorer
Use the parameters listed in 7.2 Infrastructure health under CPU usage
Select a timeframe of 24 hours
Verify that CPU load has not exceeded 80% on any of your server nodes
No errors in any services (Observability -> APM -> Services -> [service] -> Errors)
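The threshold checks in this list can also be expressed as a small script, which may help when recording the daily readings. The following is a minimal sketch, not part of OpenCRVS: the function names `disk_used_percent` and `check_node`, the node names, and the sample readings are all hypothetical, and the script assumes the CPU and memory percentages have been copied by hand from Kibana's Metrics Explorer. Only the 70% and 80% limits come from the checklist above.

```python
import shutil

# Limits taken from the checklist above.
DISK_LIMIT_PERCENT = 70.0
CPU_MEMORY_LIMIT_PERCENT = 80.0


def disk_used_percent(path="/"):
    """Return the used disk space, as a percentage, of the filesystem containing path."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total * 100.0


def check_node(name, disk_percent, cpu_percent, memory_percent):
    """Return a list of warnings for one node; the list is empty when all checks pass."""
    warnings = []
    if disk_percent >= DISK_LIMIT_PERCENT:
        warnings.append(
            f"{name}: disk usage {disk_percent:.1f}% is not below {DISK_LIMIT_PERCENT:.0f}%"
        )
    if cpu_percent >= CPU_MEMORY_LIMIT_PERCENT:
        warnings.append(
            f"{name}: CPU usage {cpu_percent:.1f}% is not below {CPU_MEMORY_LIMIT_PERCENT:.0f}%"
        )
    if memory_percent >= CPU_MEMORY_LIMIT_PERCENT:
        warnings.append(
            f"{name}: memory usage {memory_percent:.1f}% is not below {CPU_MEMORY_LIMIT_PERCENT:.0f}%"
        )
    return warnings


if __name__ == "__main__":
    # Hypothetical readings (disk, CPU, memory) noted down from Kibana.
    readings = {
        "node-1": (55.0, 40.0, 62.5),
        "node-2": (71.2, 35.0, 81.0),
    }
    for node, (disk, cpu, memory) in readings.items():
        for warning in check_node(node, disk, cpu, memory):
            print(warning)

    # Disk usage of the machine this script runs on, measured locally.
    print(f"local disk usage: {disk_used_percent('/'):.1f}%")
```

`disk_used_percent` measures only the machine the script runs on, so it complements the Kibana view rather than replacing it. A value exactly at a limit is reported as a warning, since the checklist asks for usage to be less than the limit.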