Story #2622

d1_replication should prioritize MN replication tasks based on load, failures, and bandwidth factors

Added by Chris Jones about 12 years ago. Updated almost 12 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: d1_replication
Start date: 2012-04-20
Due date:
% Done: 100%
Story Points:
Sprint:

Description

The current implementation of ReplicationManager evaluates ReplicationPolicies for objects and prioritizes target member nodes (MNs) based only on the preferred and blocked lists. Otherwise, all MNs are treated as equals. We've seen performance problems on individual MNs that in turn cause performance problems on the CNs when MNReplicationTasks are sent to the ExecutorService. Task execution appears to slow down significantly when threads in the thread pool are held up by non-performant MNs.

To alleviate this, we need to more intelligently evaluate the capabilities of an MN as a target, and prioritize targets that are performant.

The strategy is to 1) make MN implementations more resilient (i.e., queue replicate() requests), and 2) throttle requests to MNs based on a few different performance metrics. These are outlined at:

http://epad.dataone.org/20120420-replication-priority-queue
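To illustrate the first part of the strategy, here is a minimal sketch of an MN accepting replicate() calls quickly and doing the actual copy later from a bounded queue. This is not Metacat's implementation: the class QueuedReplicator, its methods, the single worker thread, and the copier callback are all hypothetical names and choices made for illustration.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;

/** Hypothetical MN-side wrapper: accept replicate() calls fast, copy objects later. */
class QueuedReplicator {
    private final ThreadPoolExecutor worker;
    private final BiConsumer<String, String> copier;

    /**
     * @param queueCapacity maximum number of requests waiting behind the one in progress (assumed limit)
     * @param copier        stand-in for the real work of fetching the object from the source node
     */
    QueuedReplicator(int queueCapacity, BiConsumer<String, String> copier) {
        this.copier = copier;
        // One worker thread drains a bounded FIFO queue; the default
        // rejection policy throws when the queue is full.
        this.worker = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(queueCapacity));
    }

    /** Returns true if the request was accepted for later processing, false if the queue is full. */
    boolean replicate(String pid, String sourceNode) {
        try {
            worker.execute(() -> copier.accept(pid, sourceNode));
            return true;
        } catch (RejectedExecutionException e) {
            return false; // caller (the CN) can retry later or pick another target
        }
    }

    /** Stops accepting work and waits for already queued requests to finish. */
    boolean shutdownAndWait(long timeoutSeconds) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutSeconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With this shape, the caller's thread is released as soon as the request is queued or rejected, rather than being held for the duration of the copy.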

At first we will throttle based only on a limit of pending replication requests; later we will also evaluate the failure factor and the bandwidth factor.
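A rough sketch of throttling on pending replication requests might look like the following. It is not the actual ReplicationManager code: the class PendingRequestThrottle, its method names, and the per-node limit are assumptions for illustration, and the failure and bandwidth factors are left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical CN-side helper: skip MNs that already have too many pending requests. */
class PendingRequestThrottle {
    private final int maxPendingPerNode; // assumed configurable limit
    private final Map<String, Integer> pending = new ConcurrentHashMap<>();

    PendingRequestThrottle(int maxPendingPerNode) {
        this.maxPendingPerNode = maxPendingPerNode;
    }

    /** Number of replication requests sent to this node that have not finished yet. */
    int pendingFor(String nodeId) {
        return pending.getOrDefault(nodeId, 0);
    }

    /** Record that a replication request was sent to the node. */
    void requestSent(String nodeId) {
        pending.merge(nodeId, 1, Integer::sum);
    }

    /** Record that a request completed or failed; the entry is dropped once it reaches zero. */
    void requestFinished(String nodeId) {
        pending.computeIfPresent(nodeId, (id, n) -> n > 1 ? n - 1 : null);
    }

    /**
     * From candidates (already filtered by preferred and blocked lists), keep nodes
     * under the pending limit, least loaded first, and return at most `needed` of them.
     * The sort is stable, so equally loaded nodes keep their candidate order.
     */
    List<String> selectTargets(List<String> candidates, int needed) {
        List<String> eligible = new ArrayList<>();
        for (String nodeId : candidates) {
            if (pendingFor(nodeId) < maxPendingPerNode) {
                eligible.add(nodeId);
            }
        }
        eligible.sort(Comparator.comparingInt(this::pendingFor));
        int count = Math.max(0, Math.min(needed, eligible.size()));
        return new ArrayList<>(eligible.subList(0, count));
    }
}
```

Sorting by current load is one possible design choice; simply filtering out nodes at the limit would also satisfy the description above.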


Subtasks

Task #2623: Modify Metacat's MNodeService.replicate() to queue requests (Closed, Ben Leinfelder)

Task #2624: Modify ReplicationManager.createAndQueueTasks() to limit replication tasks based on current MN load (Closed, Chris Jones)

History

#1 Updated by Matthew Jones about 12 years ago

  • Tracker changed from Task to Story

#2 Updated by Dave Vieglais about 12 years ago

  • Position set to 1
  • Target version changed from Sprint-2012.15-Block.2.4 to Sprint-2012.17-Block.3.1

#3 Updated by Dave Vieglais almost 12 years ago

  • Position deleted (6)
  • Target version changed from Sprint-2012.17-Block.3.1 to Sprint-2012.19-Block.3.2
  • Position set to 2

#4 Updated by Chris Jones almost 12 years ago

  • Status changed from New to Closed

This is implemented except for the bandwidth factors. We'll need to decide how each MN is scored based on bandwidth (calculated or assigned).
