Task #2177: Use Lock.tryLock() (not Lock.lock()) in d1_replication - Infrastructure - DataONE Tasks

Task #2177

Story #2166: Hazelcast cluster errors need to be isolated

Use Lock.tryLock() (not Lock.lock()) in d1_replication

Added by Chris Jones about 13 years ago. Updated almost 13 years ago.

Status:

Closed

Priority:

Normal

Assignee:

Chris Jones

Category:

d1_replication

Target version:

Sprint-2012.01-Block.1.1

Start date:

2012-01-09

Due date:

% Done:

100%

Milestone:

CCI-1.0.0

Product Version:

Story Points:

Sprint:

Description

To avoid potential deadlocks in Hazelcast, replace calls to Lock.lock() with Lock.tryLock() and give each attempt a timeout. Also introduce a queue structure so that operations whose lock attempt fails can be re-processed later.
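The pattern described above could look roughly like the following. This is a minimal sketch, not the d1_replication code: it uses the standard java.util.concurrent.locks.Lock interface as a stand-in for the cluster lock, and the names TryLockRetrySketch, retryQueue, and runOrDefer are hypothetical.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;

class TryLockRetrySketch {

    // Hypothetical queue of identifiers whose lock attempt failed and
    // which should be re-processed later.
    static final Queue<String> retryQueue = new ConcurrentLinkedQueue<>();

    /**
     * Runs the operation under the lock if the lock is acquired within the
     * timeout; otherwise queues the identifier for a later retry.
     * Returns true only if the operation ran.
     */
    static boolean runOrDefer(Lock lock, String identifier,
                              long timeoutMillis, Runnable operation) {
        boolean acquired = false;
        try {
            // Bounded wait instead of the unbounded Lock.lock().
            acquired = lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (!acquired) {
                retryQueue.add(identifier);
                return false;
            }
            operation.run();
            return true;
        } catch (InterruptedException e) {
            // Restore the interrupt flag and defer the work.
            Thread.currentThread().interrupt();
            retryQueue.add(identifier);
            return false;
        } finally {
            // Only unlock what this call actually locked.
            if (acquired) {
                lock.unlock();
            }
        }
    }
}
```

A caller would drain retryQueue periodically and pass each identifier through runOrDefer again; the timeout value and retry schedule are not specified in this task.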

History

#1 Updated by Chris Jones about 13 years ago

Subject changed from Use Lock.tryLock() (not Lock.lock()) d1_replication to Use Lock.tryLock() (not Lock.lock()) in d1_replication

#2 Updated by Chris Jones about 13 years ago

Status changed from New to In Progress

#3 Updated by Chris Jones almost 13 years ago

Status changed from In Progress to Closed

I've created a ReplicationEventListener class that queues events for processing. entryAdded and entryUpdated events each trigger a tryLock() on an event string derived from the incoming identifier. Whichever CN instance acquires the lock first queues the identifier, and createAndQueueTasks() is then called for it.
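The flow in this comment might be sketched as below. This is an illustration under stated assumptions, not the actual ReplicationEventListener: the lock lookup function, the "replication-event-" name prefix, the processQueue() method, and the body of createAndQueueTasks() are all hypothetical, and the comment does not say when the real code releases the lock (here it is released right after queuing).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.locks.Lock;
import java.util.function.Function;

class ReplicationEventListenerSketch {

    // Hypothetical lookup of a named lock; stands in for the cluster's
    // named-lock facility.
    private final Function<String, Lock> lockFor;

    // Identifiers this instance won the lock for and will process.
    private final Queue<String> eventQueue = new ConcurrentLinkedQueue<>();

    // Records what the placeholder createAndQueueTasks() was called with.
    final List<String> processed = new ArrayList<>();

    ReplicationEventListenerSketch(Function<String, Lock> lockFor) {
        this.lockFor = lockFor;
    }

    void entryAdded(String identifier) {
        handle(identifier);
    }

    void entryUpdated(String identifier) {
        handle(identifier);
    }

    private void handle(String identifier) {
        // Event string based on the incoming identifier (assumed format).
        Lock lock = lockFor.apply("replication-event-" + identifier);
        if (!lock.tryLock()) {
            return; // another instance holds the lock and handles this event
        }
        try {
            eventQueue.add(identifier);
        } finally {
            lock.unlock();
        }
    }

    /** Drains the queue and returns how many identifiers were processed. */
    int processQueue() {
        int count = 0;
        String identifier;
        while ((identifier = eventQueue.poll()) != null) {
            createAndQueueTasks(identifier);
            count++;
        }
        return count;
    }

    // Placeholder for the real task creation logic.
    void createAndQueueTasks(String identifier) {
        processed.add(identifier);
    }
}
```

Because tryLock() returns immediately, an instance that loses the race simply drops the event instead of blocking on the cluster lock.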
