Story #3056
Add instrumentation to the CNs for realtime monitoring
100%
Description
Diagnostics on the CNs require a lot of log grepping, and watching balances between processes is, for the most part, quite challenging.
The goal of this story is to add lightweight, configurable instrumentation to key processes operating on the coordinating nodes so that the state of various services can be visualized in near real time. This differs from the capability offered by Nagios, which will continue to provide monitoring and alert services.
Instrumentation will take the form of Ganglia [1] for real-time recording and history display, and JMXetric [2] for instrumentation. Ganglia will be run on monitor.dataone.org.
JMXetric offers several choices for instrumentation that should integrate easily into our current environment without being invasive. When properly set up, annotations can be used, for example, to indicate methods that should be timed.
[1] http://ganglia.sourceforge.net/
[2] http://code.google.com/p/jmxetric/
Subtasks
History
#1 Updated by Dave Vieglais about 12 years ago
- Position set to 1
- Target version changed from Sprint-2012.33-Block.5.1 to Sprint-2012.37-Block.5.3
#2 Updated by Dave Vieglais about 12 years ago
- Due date set to 2012-10-27
- Target version changed from Sprint-2012.37-Block.5.3 to Sprint-2012.41-Block.6.1
- Remaining hours set to 0.0
#3 Updated by Dave Vieglais about 12 years ago
- Status changed from New to In Progress
Using StatsD and Graphite, installing them on a new VM running on the KU host hardware.
The IP address is 129.237.201.114.
Sending new metrics for reporting is very simple using UDP. Example in bash:
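#!/bin/bash
# Send one StatsD datagram, passed as the single argument, over UDP.
host="${STATSD_HOST:-129.237.201.114}"
port="${STATSD_PORT:-8125}"
if [ $# -ne 1 ]
then
    echo "Syntax: $0 ''"
    exit 1
fi
# Set up UDP socket with statsd server
exec 3<> "/dev/udp/$host/$port"
# Send data (the '%s' format keeps a literal % in the payload from being misread)
printf '%s' "$1" >&3
# Close UDP socket
exec 3<&-
exec 3>&-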
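The script above sends whatever string it is given, so the caller has to build the StatsD line itself. As a rough sketch, the helpers below format a line in the standard StatsD form `<name>:<value>|<type>` (`c` for counters, `ms` for timers, `g` for gauges) and send it. The function names, the metric name in the usage comment, and the 127.0.0.1 default are illustrative assumptions, not part of the deployed setup.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; names are not from the ticket.

# Build one StatsD line: <name>:<value>|<type>
#   type: c (counter), ms (timer, milliseconds), g (gauge)
statsd_line() {
    local name="$1" value="$2" type="$3"
    printf '%s:%s|%s' "$name" "$value" "$type"
}

# Send a single line over UDP via bash's /dev/udp pseudo-device.
# STATSD_HOST and STATSD_PORT follow the script above; the localhost
# default here is an assumption chosen for safe local testing.
statsd_send() {
    local host="${STATSD_HOST:-127.0.0.1}"
    local port="${STATSD_PORT:-8125}"
    printf '%s' "$1" > "/dev/udp/$host/$port"
}

# Usage (metric name is made up):
#   statsd_send "$(statsd_line cn.example.requests 1 c)"
```

Keeping formatting separate from sending means the line format can be checked without a StatsD server listening.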
#4 Updated by Chris Jones almost 12 years ago
- Target version changed from Sprint-2012.41-Block.6.1 to Sprint-2012.50-Block.6.4
- Due date changed from 2012-10-27 to 2013-01-05
#5 Updated by Dave Vieglais almost 12 years ago
- Due date changed from 2013-01-05 to 2013-01-19
- Target version changed from Sprint-2012.50-Block.6.4 to 2013.2-Block.1.1
#6 Updated by Chris Jones almost 12 years ago
Housekeeping: Moving out of 1.1, into 1.1.1.
#7 Updated by Chris Jones almost 12 years ago
- Milestone changed from CCI-1.1 to CCI-1.1.1
#8 Updated by Chris Jones over 11 years ago
- Due date changed from 2013-01-19 to 2013-03-16
- Target version changed from 2013.2-Block.1.1 to 2013.10-Block.2.1
#9 Updated by Dave Vieglais over 11 years ago
- Target version changed from 2013.10-Block.2.1 to 2013.35-Block.5.1
- Due date changed from 2013-03-16 to 2013-09-07
#10 Updated by Dave Vieglais almost 11 years ago
- Product Version set to *
- Status changed from In Progress to Closed