Task #7655

Story #7652: Enable simple metrics reporting for core services

report on replication activity

Added by Dave Vieglais about 8 years ago. Updated almost 8 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: performance-scalability
Target version:
Start date: 2016-02-23
Due date:
% Done: 100%
Milestone: None
Product Version: *
Story Points:
Sprint:

Description

At a minimum, report on the size of the replication backlog.

Results should be written to a file that is easy to parse in a variety of languages. Two common options are JSON and CSV. A CSV file would be convenient for appending entries if more than a single record is to be kept.

Contents of the file are to be determined but should include the size of the replication backlog, and may include additional per member node information such as the timestamp of the last replication activity and perhaps the number of current replication tasks for that MN.
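As a rough illustration of the CSV option, the sketch below appends one record per member node to a file and writes a header row when the file is new or empty. The class name, column names, and file path are hypothetical and not taken from the committed code. The node identifier is only an example value, and the sketch assumes identifiers contain no commas or quotes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.time.Instant;
import java.util.List;

/** Hypothetical helper: appends one replication-backlog record per call. */
class BacklogCsvWriter {

    // Assumed column layout; the ticket leaves the exact contents open.
    static final String HEADER = "timestamp,node_id,backlog_size,current_tasks";

    /** Formats one record. Assumes nodeId contains no commas or quotes. */
    static String formatRecord(Instant when, String nodeId, long backlogSize, int currentTasks) {
        return String.join(",", when.toString(), nodeId,
                Long.toString(backlogSize), Integer.toString(currentTasks));
    }

    /** Appends the record, writing the header first if the file is new or empty. */
    static void append(Path file, String record) throws IOException {
        boolean needsHeader = !Files.exists(file) || Files.size(file) == 0;
        List<String> lines = needsHeader ? List.of(HEADER, record) : List.of(record);
        Files.write(file, lines, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path file = Path.of("replication-metrics.csv"); // hypothetical output path
        append(file, formatRecord(Instant.now(), "urn:node:exampleMN", 42L, 3));
    }
}
```

Appending keeps a history of measurements in one file. A JSON file would more naturally hold a single snapshot that is overwritten on each run.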

Associated revisions

Revision 18057
Added by Rob Nahf almost 8 years ago

refs: #7655: added ReplicationTaskMonitor (runnable) to report replicationTask queue statistics (by status and authMN). Commented out unused 'public InputStream ReplicationService#getObjectFromCN()' method.

Revision 18058
Added by Rob Nahf almost 8 years ago

refs: #7655: added ReplicaStatusMonitor (runnable) to report summary counts of replicas by target node and status. Added Replication MetricEvents.

Revision 18069
Added by Rob Nahf almost 8 years ago

refs: #7655. parameterized replication monitoring frequency.

Revision 18074
Added by Robert Waltz almost 8 years ago

refs: #7655

parameterized replication monitoring frequency. fix config properties

History

#1 Updated by Rob Nahf almost 8 years ago

replicationDAO has methods to determine how many outstanding requests there are per MN, and hooks into ReplicationManager for reporting. It should be relatively straightforward to schedule a task that runs at regular intervals to generate monitoring statistics.

We may need to add more DAO queries that do not filter by task state, or queries that return counts instead of records (located in ReplicationDaoMetacatImpl in d1_cn_common).
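A minimal sketch of running a monitoring task at regular intervals, using the standard java.util.concurrent scheduler. The class and method names are hypothetical, and the Runnable stands in for whatever monitor is actually used. This is not the committed implementation.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical wrapper that runs a monitoring Runnable on a fixed schedule. */
class MonitorScheduler {

    /** Starts the monitor immediately, then repeats it every period; the caller shuts it down. */
    static ScheduledExecutorService start(Runnable monitor, long period, TimeUnit unit) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(() -> {
            try {
                monitor.run();
            } catch (RuntimeException e) {
                // An uncaught exception would cancel all later runs, so log and continue.
                System.err.println("monitor run failed: " + e);
            }
        }, 0, period, unit);
        return executor;
    }

    public static void main(String[] args) throws InterruptedException {
        // Placeholder task; a real monitor would query the DAO and write a report.
        ScheduledExecutorService executor =
                start(() -> System.out.println("collecting replication stats"), 1, TimeUnit.SECONDS);
        Thread.sleep(3500);
        executor.shutdownNow();
    }
}
```

Passing the period as a parameter makes it easy to read the monitoring frequency from configuration.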

#2 Updated by Rob Nahf almost 8 years ago

one pid can be associated with [0..n] replicas (identified by [pid, targetMN])

so, replication has two things that can potentially be backlogged:
1. the pid that has been picked up by the sysmeta map listener (the sysmeta has changed, so we need to re-evaluate whether enough replicas were created)
2. the replicas themselves that have been ordered to be created on a target node (a request issued to a replica node, but not completed).

The first type could be reported per authoritativeMN.

The second by target node. The StaleReplicationRequestAuditor looks for these and tries to address them.

pid statuses:
NEW - the listener picked up a pid to evaluate
IN_PROCESS - a processor picked the pid up to evaluate
(there is no complete status. I believe the item is removed from the repo.)

replica statuses:
QUEUED, REQUESTED, COMPLETED, FAILED, INVALIDATED
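The per-target-node report described above can be sketched as a simple grouping of replicas by target node and status. The Replica class and the method names are hypothetical. Treating QUEUED plus REQUESTED as the "backlog" is an assumption based on the description of a request that was issued but not completed.

```java
import java.util.Arrays;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical summary of replica counts by target node and status. */
class ReplicaStatusSummary {

    enum ReplicaStatus { QUEUED, REQUESTED, COMPLETED, FAILED, INVALIDATED }

    /** One replica, identified by [pid, targetMN]. */
    static final class Replica {
        final String pid;
        final String targetNode;
        final ReplicaStatus status;

        Replica(String pid, String targetNode, ReplicaStatus status) {
            this.pid = pid;
            this.targetNode = targetNode;
            this.status = status;
        }
    }

    /** Counts replicas per target node, broken down by status. */
    static Map<String, Map<ReplicaStatus, Long>> countByNodeAndStatus(List<Replica> replicas) {
        Map<String, Map<ReplicaStatus, Long>> counts = new TreeMap<>();
        for (Replica r : replicas) {
            counts.computeIfAbsent(r.targetNode, k -> new EnumMap<>(ReplicaStatus.class))
                  .merge(r.status, 1L, Long::sum);
        }
        return counts;
    }

    /** Assumed backlog definition: replicas ordered but not yet finished. */
    static long backlog(Map<ReplicaStatus, Long> perStatus) {
        return perStatus.getOrDefault(ReplicaStatus.QUEUED, 0L)
                + perStatus.getOrDefault(ReplicaStatus.REQUESTED, 0L);
    }

    public static void main(String[] args) {
        List<Replica> replicas = Arrays.asList(
                new Replica("pid1", "urn:node:mnA", ReplicaStatus.QUEUED),
                new Replica("pid2", "urn:node:mnA", ReplicaStatus.REQUESTED),
                new Replica("pid1", "urn:node:mnB", ReplicaStatus.COMPLETED));
        countByNodeAndStatus(replicas).forEach((node, perStatus) ->
                System.out.println(node + " backlog=" + backlog(perStatus) + " " + perStatus));
    }
}
```

The same grouping would apply to pid-level tasks (NEW, IN_PROCESS) keyed by authoritative MN instead of target node.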

#3 Updated by Robert Waltz almost 8 years ago

  • % Done changed from 0 to 30
  • Status changed from New to In Progress

#4 Updated by Robert Waltz almost 8 years ago

  • % Done changed from 30 to 50
  • Status changed from In Progress to Testing

#5 Updated by Robert Waltz almost 8 years ago

  • Status changed from Testing to Closed
  • Remaining hours set to 0.0
  • % Done changed from 50 to 100
