Bug #8642
Story #8639: Replication performance is too slow to service demand
Replication tasks apparently not deleted from backing store in a timely fashion
0%
Description
The replication task queue implemented in Hazelcast uses Postgres as a backing store to preserve state between restarts.
Actually, it looks like the queue is entirely in Postgres. ReplicationManager holds an instance of ReplicationTaskQueue, which in turn uses the ReplicationDao interface, for which the implementation is ReplicationDaoMetacatImpl.
Observing event log messages and correlating with code where those events are emitted, it appears that tasks are not being deleted from the backing store.
ReplicationManager: removeReplicationTasksForPid(); log message = "removing replication tasks for pid: ..."
ReplicationTaskRepository: taskRepository.delete(tasks)
Sequence of log messages for a single PID, starting with removeReplicationTasksForPid():
[ INFO] 2018-07-04 13:19:21,996 [pool-6-thread-1] (ReplicationManager:removeReplicationTasksForPid:779) removing replication tasks for pid: ess-dive-112ed52c7689908-20180328T192607490
[ INFO] 2018-07-04 13:19:22,095 [pool-6-thread-1] (ReplicationManager:createAndQueueTasks:390) Added 0 MNReplicationTasks to the queue for ess-dive-112ed52c7689908-20180328T192607490
[ WARN] 2018-07-04 13:19:22,096 [pool-6-thread-1] (ReplicationManager:requeueReplicationTask:794) In Replication Manager, task that should exist 'in process' does not exist. Creating new task for pid: ess-dive-112ed52c7689908-20180328T192607490
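Each log line carries the emitting class, method, and source line number, so the messages for one PID can be correlated with the code mechanically. The following is a minimal sketch of that correlation. The class ReplicationLogLine and its methods are hypothetical helpers, not part of the replication code base, and the regular expression is inferred only from the three sample lines above, so other log layouts may not match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for reading the log lines quoted in this issue. */
final class ReplicationLogLine {
    // Assumed layout: [LEVEL] date time [thread] (Class:method:line) message
    private static final Pattern LAYOUT = Pattern.compile(
        "\\[\\s*(\\w+)\\]\\s+(\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d{3})\\s+"
        + "\\[([^\\]]+)\\]\\s+\\((\\w+):(\\w+):(\\d+)\\)\\s+(.*)");

    final String level;
    final String timestamp;
    final String thread;
    final String className;
    final String method;
    final int line;
    final String message;

    private ReplicationLogLine(Matcher m) {
        level = m.group(1);
        timestamp = m.group(2);
        thread = m.group(3);
        className = m.group(4);
        method = m.group(5);
        line = Integer.parseInt(m.group(6));
        message = m.group(7);
    }

    /** Returns the parsed line, or null if the text does not fit the assumed layout. */
    static ReplicationLogLine parse(String raw) {
        Matcher m = LAYOUT.matcher(raw.trim());
        return m.matches() ? new ReplicationLogLine(m) : null;
    }

    /** Lists "method:line" for every parseable line whose message mentions the PID, in input order. */
    static List<String> methodsForPid(List<String> rawLines, String pid) {
        List<String> methods = new ArrayList<>();
        for (String raw : rawLines) {
            ReplicationLogLine parsed = parse(raw);
            if (parsed != null && parsed.message.contains(pid)) {
                methods.add(parsed.method + ":" + parsed.line);
            }
        }
        return methods;
    }
}
```

Applied to the three lines above, methodsForPid would list removeReplicationTasksForPid:779, createAndQueueTasks:390, and requeueReplicationTask:794, in that order.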
ReplicationTaskRepository is an interface that extends org.springframework.data.repository.PagingAndSortingRepository. An instance is created by repositoryFactory.getReplicationTaskRepository() using the ReplicationRepositoryFactory passed to the ReplicationManager constructor.
The implementation of ReplicationRepositoryFactory is ReplicationPostgresRepositoryFactory.
History
#1 Updated by Dave Vieglais over 6 years ago
- Description updated (diff)
#2 Updated by Dave Vieglais over 6 years ago
- Description updated (diff)
The sequence of log messages seems to imply that delete() does work; however, logic elsewhere in ReplicationManager seems to recreate the deleted task.
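To make this hypothesis concrete, here is a toy model of the suspected sequence. It is only a sketch: the class RequeueModel, the in-memory map standing in for the Postgres table, and the task state strings are all assumptions made for illustration. The method names mirror those in the log messages, but the bodies are guesses and are not the real ReplicationManager code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only; none of this is the real ReplicationManager logic. */
final class RequeueModel {
    // Stand-in for the Postgres-backed task table: pid -> task states.
    final Map<String, List<String>> store = new HashMap<>();
    // Messages shaped like the ones observed in the event log.
    final List<String> log = new ArrayList<>();

    void addTask(String pid, String state) {
        store.computeIfAbsent(pid, k -> new ArrayList<>()).add(state);
    }

    void removeReplicationTasksForPid(String pid) {
        log.add("removing replication tasks for pid: " + pid);
        store.remove(pid); // in this model the delete itself succeeds
    }

    void createAndQueueTasks(String pid, int needed) {
        for (int i = 0; i < needed; i++) {
            addTask(pid, "NEW");
        }
        log.add("Added " + needed + " MNReplicationTasks to the queue for " + pid);
    }

    void requeueReplicationTask(String pid) {
        List<String> tasks = store.getOrDefault(pid, Collections.emptyList());
        if (!tasks.contains("IN PROCESS")) {
            // Assumed behaviour: a missing 'in process' task is recreated.
            log.add("task that should exist 'in process' does not exist. "
                + "Creating new task for pid: " + pid);
            addTask(pid, "NEW");
        }
    }

    int taskCount(String pid) {
        return store.getOrDefault(pid, Collections.emptyList()).size();
    }
}
```

In this model, calling removeReplicationTasksForPid, then createAndQueueTasks with zero tasks needed, then requeueReplicationTask leaves one task in the store even though the delete worked. That matches the observed log order, but it does not confirm that the real code follows the same path.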