Synchronization requeueing for temporary unavailability of nodeComms causes massive delays for packages
In prod, where we are processing an initial sync of PANGAEA (cci-2.3.7), the 50k-task sync queue is causing massive delays for other nodes trying to synchronize their content. (A 50k-task queue can take 2 days to process.)
In this case, after waiting the 2 days, a 33-member package from RW made it to the head of the queue, but 3 items failed to sync due to lack of nodeComms, and were placed at the end of the sync queue. So, another 2 days.
A better retry mechanism is needed to eliminate double delays. (how would you feel, eh, if it happened to you?)
#3 Updated by Rob Nahf over 4 years ago
- % Done changed from 0 to 100
- Target version changed from CCI-2.3.10 to CCI-2.3.9
- Status changed from New to Closed
Related to #8447: a priority queue for each MN was also defined, so that an item that gets requeued goes to the front of a much shorter queue.
This was deployed in 2.3.9.
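For illustration only, here is a minimal sketch of the requeue behavior described above. It is not the deployed CCI code: the class name, method names, and the choice to drain retry queues before the main backlog are all assumptions made to show why a requeued item no longer waits behind the full backlog.

```python
from collections import deque


class SyncQueues:
    """Hypothetical model: one shared backlog plus a short retry queue per MN."""

    def __init__(self):
        self.backlog = deque()  # first-attempt tasks, FIFO (e.g. the 50k PANGAEA tasks)
        self.retry = {}         # node id -> deque of tasks waiting to be retried

    def submit(self, node_id, pid):
        """Queue a first-attempt sync task at the end of the shared backlog."""
        self.backlog.append((node_id, pid))

    def requeue(self, node_id, pid):
        """Queue a failed task on its node's retry queue, not on the backlog."""
        self.retry.setdefault(node_id, deque()).append(pid)

    def next_task(self):
        """Return the next (node_id, pid) to process, or None if nothing is queued.

        Assumption: pending retries are served before the shared backlog.
        """
        for node_id, queue in self.retry.items():
            if queue:
                return node_id, queue.popleft()
        if self.backlog:
            return self.backlog.popleft()
        return None


if __name__ == "__main__":
    queues = SyncQueues()
    for i in range(5):                    # stand-in for the large PANGAEA backlog
        queues.submit("PANGAEA", f"pangaea-{i}")
    queues.requeue("RW", "rw-item-1")     # an RW item that failed for lack of nodeComms
    print(queues.next_task())             # the RW retry is served before the backlog
```

In this model the retried RW item waits only behind other retries, so a transient nodeComms failure costs one short wait instead of a second pass through the whole queue. A real scheduler would also need a retry limit or backoff so that a node that stays unavailable cannot starve the backlog.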