Story #8639: Replication performance is too slow to service demand
Replication tasks apparently not deleted from backing store in a timely fashion
The replication task queue implemented in Hazelcast uses Postgres as a backing store to preserve state between restarts.
Actually, it looks like the queue is entirely in Postgres.
ReplicationManager holds an instance of ReplicationTaskQueue, which in turn uses the ReplicationDao interface, for which the implementation is
Observing event log messages and correlating with code where those events are emitted, it appears that tasks are not being deleted from the backing store.
ReplicationManager removeReplicationTasksForPid(); log message = "removing replication tasks for pid: ..."
ReplicationTaskRepository taskRepository.delete(tasks)
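One way to test the "tasks are not being deleted" hypothesis directly is to count the rows for a PID immediately after the delete call. The following is only a sketch: `Task`, `TaskStore`, `InMemoryStore` and `deleteAndCountRemaining` are hypothetical stand-ins written for illustration, not the actual ReplicationTaskRepository API, and the in-memory store exists only so the check can be run without a database.

```java
import java.util.ArrayList;
import java.util.List;

class DeleteCheck {
    /** Hypothetical task row: just an id and the pid it belongs to. */
    static final class Task {
        final long id;
        final String pid;
        Task(long id, String pid) { this.id = id; this.pid = pid; }
    }

    /** Hypothetical stand-in for the repository; only the two calls used here. */
    interface TaskStore {
        List<Task> findByPid(String pid);
        void delete(List<Task> tasks);
    }

    /** In-memory store used only to exercise the check (assumption, not the real backing store). */
    static final class InMemoryStore implements TaskStore {
        private final List<Task> rows = new ArrayList<>();
        void add(Task t) { rows.add(t); }
        public List<Task> findByPid(String pid) {
            List<Task> out = new ArrayList<>();
            for (Task t : rows) {
                if (t.pid.equals(pid)) out.add(t);
            }
            return out;
        }
        public void delete(List<Task> tasks) { rows.removeAll(tasks); }
    }

    /** Deletes all tasks for pid, then returns how many rows for that pid are still present. */
    static int deleteAndCountRemaining(TaskStore store, String pid) {
        List<Task> tasks = store.findByPid(pid);
        store.delete(tasks);
        return store.findByPid(pid).size();
    }
}
```

Against the real backing store, a non-zero result from a check like this right after the delete would support the hypothesis; a zero result would suggest the rows are removed and the stale entries come from somewhere else.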
Sequence of log messages for a single PID, starting with
[ INFO] 2018-07-04 13:19:21,996 [pool-6-thread-1] (ReplicationManager:removeReplicationTasksForPid:779) removing replication tasks for pid: ess-dive-112ed52c7689908-20180328T192607490
[ INFO] 2018-07-04 13:19:22,095 [pool-6-thread-1] (ReplicationManager:createAndQueueTasks:390) Added 0 MNReplicationTasks to the queue for ess-dive-112ed52c7689908-20180328T192607490
[ WARN] 2018-07-04 13:19:22,096 [pool-6-thread-1] (ReplicationManager:requeueReplicationTask:794) In Replication Manager, task that should exist 'in process' does not exist. Creating new task for pid: ess-dive-112ed52c7689908-20180328T192607490
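Correlating log messages with the emitting code by hand is tedious for more than a few PIDs. Below is a rough sketch of a helper that extracts the emitting method for every log line mentioning a given PID. It assumes each entry sits on its own line in the layout shown above (`[ LEVEL] date time [thread] (Class:method:line) message`); the class name `LogTrace` and the method `methodsForPid` are invented for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogTrace {
    // Groups: 1=level, 2=timestamp, 3=thread, 4=class, 5=method, 6=source line, 7=message
    private static final Pattern LINE = Pattern.compile(
        "^\\[\\s*(\\w+)\\]\\s+(\\S+ \\S+)\\s+\\[([^\\]]+)\\]\\s+"
        + "\\(([^:]+):([^:]+):(\\d+)\\)\\s+(.*)$");

    /** Returns, in log order, the emitting method of each line whose message mentions pid. */
    static List<String> methodsForPid(List<String> lines, String pid) {
        List<String> methods = new ArrayList<>();
        for (String line : lines) {
            Matcher m = LINE.matcher(line.trim());
            // Lines that do not match the assumed layout are skipped silently.
            if (m.matches() && m.group(7).contains(pid)) {
                methods.add(m.group(5));
            }
        }
        return methods;
    }
}
```

Applied to the three entries above, this would yield removeReplicationTasksForPid, createAndQueueTasks, requeueReplicationTask, which is the sequence that ends in the "task that should exist 'in process' does not exist" warning.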
ReplicationTaskRepository is an interface that extends org.springframework.data.repository.PagingAndSortingRepository. An instance is created by repositoryFactory.getReplicationTaskRepository() using the ReplicationRepositoryFactory passed to the
The implementation of