Story #3056

Add instrumentation to the CNs for realtime monitoring

Added by Dave Vieglais almost 12 years ago. Updated over 10 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: d1_monitor
Target version:
Start date: 2012-10-05
Due date: 2013-09-07
% Done: 100%
Story Points:
Sprint:

Description

Diagnosing problems on the CNs requires a lot of log grepping, and watching how load is balanced between processes is, for the most part, quite challenging.

The goal of this story is to add lightweight, configurable instrumentation to key processes operating on the coordinating nodes so that the state of the various services can be visualized in near real time. This differs from the capability offered by Nagios, which will continue to provide monitoring and alert services.

Instrumentation will use Ganglia [1] for real-time recording and history display, and JMXetric [2] for instrumenting the processes. Ganglia will be run on monitor.dataone.org.

JMXetric offers several choices for instrumentation that should integrate easily into our current environment without being invasive. When properly set up, annotations can be used, for example, to mark methods that should be timed.

[1] http://ganglia.sourceforge.net/
[2] http://code.google.com/p/jmxetric/


Subtasks

Task #3299: Add statsd start/stop to init.d and set to start on boot (Closed, Dave Vieglais)

Task #3300: Add apache configuration for graphite service (Closed, Dave Vieglais)

Task #3302: Add new domain name "statsd.dataone.org" (Closed, Chris Jones)

Task #3303: Create a nagios entry for statsd.dataone.org (Closed, Dave Vieglais)

Task #3329: Document use of statsd and graphite (Closed, Dave Vieglais)

Task #3301: Restrict statsd access to expected hosts only (Closed, Dave Vieglais)

History

#1 Updated by Dave Vieglais over 11 years ago

  • Position set to 1
  • Target version changed from Sprint-2012.33-Block.5.1 to Sprint-2012.37-Block.5.3

#2 Updated by Dave Vieglais over 11 years ago

  • Due date set to 2012-10-27
  • Target version changed from Sprint-2012.37-Block.5.3 to Sprint-2012.41-Block.6.1
  • Remaining hours set to 0.0

#3 Updated by Dave Vieglais over 11 years ago

  • Status changed from New to In Progress

Using StatsD and Graphite, installed on a new VM running on the KU host hardware.

The IP address is 129.237.201.114.

Sending new metrics for reporting is very simple using UDP. Example in bash:

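#!/bin/bash
# Target statsd server; override with STATSD_HOST / STATSD_PORT.
host="${STATSD_HOST:-129.237.201.114}"
port="${STATSD_PORT:-8125}"

# Expect exactly one argument: the metric string to send.
if [ $# -ne 1 ]
then
  echo "Syntax: $0 ''"
  exit 1
fi

# Set up UDP socket with statsd server
exec 3<> "/dev/udp/$host/$port"

# Send data (fixed format string, so a '%' in the metric is not interpreted)
printf '%s' "$1" >&3

# Close UDP socket
exec 3<&-
exec 3>&-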
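
The argument passed to the script above is a single statsd metric line. As a rough sketch, assuming the standard statsd line format `<name>:<value>|<type>` (`c` for a counter, `ms` for a timer, `g` for a gauge), the helpers below build such lines and send them the same way. The function names and the metric names in the usage note are hypothetical and not taken from the DataONE deployment; `date +%s%N` assumes GNU date.

```shell
#!/bin/bash
# Hypothetical helpers for illustration; not part of the original script.

# Build one statsd metric line: <name>:<value>|<type>
# type is c (counter), ms (timer, in milliseconds) or g (gauge).
statsd_format() {
  local name="$1" value="$2" type="$3"
  case "$type" in
    c|ms|g) printf '%s:%s|%s' "$name" "$value" "$type" ;;
    *) echo "unknown metric type: $type" >&2; return 1 ;;
  esac
}

# Send one metric line over UDP via bash's /dev/udp pseudo-device.
statsd_send() {
  local host="${STATSD_HOST:-129.237.201.114}"
  local port="${STATSD_PORT:-8125}"
  local line
  line="$(statsd_format "$@")" || return 1
  printf '%s' "$line" > "/dev/udp/$host/$port"
}

# Run a command and report its wall-clock duration as a timer metric.
# Returns the exit status of the wrapped command.
statsd_time() {
  local name="$1"; shift
  local start end status
  start="$(date +%s%N)"   # nanoseconds since epoch (GNU date assumed)
  "$@"
  status=$?
  end="$(date +%s%N)"
  statsd_send "$name" "$(( (end - start) / 1000000 ))" ms
  return "$status"
}
```

For example, `statsd_send cn.sync.queued 1 c` would increment a counter, and `statsd_time cn.index.time sleep 1` would record roughly 1000 ms under a timer. Because UDP is fire-and-forget, a successful send does not confirm that the server received the metric.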

#4 Updated by Chris Jones over 11 years ago

  • Target version changed from Sprint-2012.41-Block.6.1 to Sprint-2012.50-Block.6.4
  • Due date changed from 2012-10-27 to 2013-01-05

#5 Updated by Dave Vieglais over 11 years ago

  • Due date changed from 2013-01-05 to 2013-01-19
  • Target version changed from Sprint-2012.50-Block.6.4 to 2013.2-Block.1.1

#6 Updated by Chris Jones over 11 years ago

Housekeeping: Moving out of 1.1, into 1.1.1.

#7 Updated by Chris Jones over 11 years ago

  • Milestone changed from CCI-1.1 to CCI-1.1.1

#8 Updated by Chris Jones about 11 years ago

  • Due date changed from 2013-01-19 to 2013-03-16
  • Target version changed from 2013.2-Block.1.1 to 2013.10-Block.2.1

#9 Updated by Dave Vieglais over 10 years ago

  • Target version changed from 2013.10-Block.2.1 to 2013.35-Block.5.1
  • Due date changed from 2013-03-16 to 2013-09-07

#10 Updated by Dave Vieglais over 10 years ago

  • Product Version set to *
  • Status changed from In Progress to Closed
