Task #175
Identify an infrastructure monitoring framework
100%
Description
A monitoring system will soon be needed to track resource use and system availability across the DataONE infrastructure. It would help identify system outages or heavy load on particular components (e.g. CNs running at maximum memory or MNs limited by bandwidth).
The selected system should support a plugin architecture so that additional metrics can be added to the monitor (e.g. total data sets in system, or data sets per node).
A few candidates are:
- "nagios":http://www.nagios.org/
- "ganglia":http://ganglia.sourceforge.net/ (more suited to LAN configurations)
- "zabbix":http://www.zabbix.com
- "zenoss":http://community.zenoss.org
- "rrdtool":http://oss.oetiker.ch/rrdtool/ (generic timeseries recording and viewing - not a monitor)
- "cacti":http://www.cacti.net/
History
#1 Updated by Matthew Jones almost 15 years ago
We have Nagios set up at NCEAS for monitoring all of our servers. It works well and allows custom monitoring scripts, which makes it very flexible. Given our time constraints, I suggest we simply use that for now to minimize time to deployment; the last thing we need is more tasks. Nick may already have it set up for cn-dev; I'll check with him.
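For reference, a custom Nagios check is a small program that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). Text after the `|` is treated as performance data. Below is a minimal sketch of that convention in Python. The script name `check_metric.py`, the metric name, and the thresholds are hypothetical, and the metric value is passed in as an argument because how a DataONE metric (e.g. total data sets) would actually be collected is not specified here.

```python
# Nagios plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Return the Nagios state for a metric where higher values are worse."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def format_status(name, value, warn, crit):
    """Build the (exit_code, status_line) pair for one metric."""
    state = evaluate(value, warn, crit)
    # Text after '|' is performance data in the form label=value;warn;crit
    line = (
        f"{name.upper()} {LABELS[state]} - {name}={value} "
        f"| {name}={value};{warn};{crit}"
    )
    return state, line


def main(argv):
    """Hypothetical CLI: check_metric.py <name> <value> <warn> <crit>."""
    try:
        name = argv[1]
        value, warn, crit = (float(a) for a in argv[2:5])
    except (IndexError, ValueError):
        # Missing or non-numeric arguments: report UNKNOWN rather than guess.
        print("UNKNOWN - usage: check_metric.py <name> <value> <warn> <crit>")
        return UNKNOWN
    code, line = format_status(name, value, warn, crit)
    print(line)
    return code
```

To run this as an actual plugin, the script would end with `import sys; sys.exit(main(sys.argv))` so that Nagios sees the exit code. A check for a different metric would only need to replace the argument parsing in `main` with whatever call retrieves that metric.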