Bug #4303

Fix potential bug where an object could be created and accepted for replication simultaneously.

Added by Roger Dahl about 10 years ago. Updated about 8 years ago.

Target version:
Start date:
Due date:
% Done:


Product Version:
Story Points:


Use SQL "select for update" (or similar) to lock the replication queue while a regular create takes place, so that a replication request cannot be accepted for an object at the same time as that object is being created. GMN already checks, in create(), that a replication request does not exist for the object. It also checks for the opposite, but there is a tiny window in which both could be created at the same time.


#1 Updated by Roger Dahl almost 10 years ago

This ticket is about improving handling for the very unlikely scenario of a CN calling MNReplication.replicate() with a given PID at the same time as a user calls MNStorage.create() for that same PID. The basic problem is that there are two separate tables holding the existing objects and the queue of replication requests and a given PID should never be in both tables at once. However, it's not possible to specify a unique constraint that covers two tables. GMN already checks, when processing a create(), that the PID is not already in the replication queue and it checks, when processing replicate(), that the PID is not already in the object table. The potential problem happens if the calls are processed concurrently, so that each call finds the PID not to exist in the opposite table and then goes ahead and inserts it.

The initial idea I had of using "select for update" is not going to work because "select for update" does not lock non-existing rows. Other ideas for resolving this revolve around creating a separate table for the value that should be unique (the PIDs in this table) and then using foreign key relationships:

As things are currently implemented, if the same PID does end up getting inserted into both tables, a failure will be detected when the async replication processing happens. At that time, the entry will be marked as failed but it will keep getting retried until someone manually removes it from the queue.

#2 Updated by Robert Waltz over 9 years ago

  • Assignee changed from Roger Dahl to Mark Servilla

#3 Updated by Robert Waltz over 9 years ago

  • Parent task deleted (#4299)

#4 Updated by Robert Waltz over 9 years ago

  • translation missing: en.field_remaining_hours set to 0.0
  • Tracker changed from Task to Bug
  • Due date set to 2014-10-01
  • Target version set to Release Backlog

#5 Updated by Mark Servilla over 9 years ago

  • Due date changed from 2014-10-01 to 2015-01-05
  • Assignee changed from Mark Servilla to Mark Flynn

#6 Updated by Roger Dahl about 8 years ago

  • Assignee changed from Mark Flynn to Roger Dahl

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 14.8 MB)