Bug #8642

Story #8639: Replication performance is too slow to service demand

Replication tasks apparently not deleted from backing store in a timely fashion

Added by Dave Vieglais over 3 years ago. Updated over 3 years ago.

Status: New
Priority: Normal
Assignee:
Category: d1_replication
Target version: -
Start date: 2018-07-04
Due date:
% Done: 0%
Milestone: None
Product Version: *
Story Points:
Sprint:

Description

The replication task queue implemented in Hazelcast uses Postgres as a backing store to preserve state between restarts.

Actually, it looks like the queue is entirely in Postgres. ReplicationManager holds an instance of ReplicationTaskQueue, which in turn uses the ReplicationDao interface, for which the implementation is ReplicationDaoMetacatImpl.

Observing event log messages and correlating with code where those events are emitted, it appears that tasks are not being deleted from the backing store.

ReplicationManager

removeReplicationTasksForPid(); log message = "removing replication tasks for pid: ..." 
  ReplicationTaskRepository taskRepository.delete(tasks)

Sequence of log messages for a single PID, starting with removeReplicationTasksForPid():

[ INFO] 2018-07-04 13:19:21,996 [pool-6-thread-1]  (ReplicationManager:removeReplicationTasksForPid:779) removing replication tasks for pid: ess-dive-112ed52c7689908-20180328T192607490

[ INFO] 2018-07-04 13:19:22,095 [pool-6-thread-1]  (ReplicationManager:createAndQueueTasks:390) Added 0 MNReplicationTasks to the queue for ess-dive-112ed52c7689908-20180328T192607490

[ WARN] 2018-07-04 13:19:22,096 [pool-6-thread-1]  (ReplicationManager:requeueReplicationTask:794) In Replication Manager, task that should exist 'in process' does not exist.  Creating new task for pid: ess-dive-112ed52c7689908-20180328T192607490
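The correlation above was done by reading the log. A minimal sketch of how it could be automated is given here, assuming every line of interest carries a "(Class:method:line)" marker and ends with the PID, as the three lines above do. The class LogCorrelator and the method methodsByPid are hypothetical names, not part of d1_replication.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of d1_replication.
class LogCorrelator {
    // Matches the "(Class:method:line)" marker seen in the log lines above.
    private static final Pattern SOURCE =
            Pattern.compile("\\((\\w+):(\\w+):(\\d+)\\)");

    // Assumption: the PID is the last whitespace-free token of the message,
    // introduced by "pid: " or by "queue for ".
    private static final Pattern PID =
            Pattern.compile("(?:pid: |queue for )(\\S+)\\s*$");

    // Returns, for each PID, the emitting method names in log order.
    static Map<String, List<String>> methodsByPid(List<String> lines) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher source = SOURCE.matcher(line);
            Matcher pid = PID.matcher(line);
            if (source.find() && pid.find()) {
                result.computeIfAbsent(pid.group(1), k -> new ArrayList<>())
                      .add(source.group(2));
            }
        }
        return result;
    }
}
```

A PID whose method list contains removeReplicationTasksForPid followed by requeueReplicationTask would be a candidate for the pattern described in this issue.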

ReplicationTaskRepository is an interface that extends org.springframework.data.repository.PagingAndSortingRepository. An instance is created by repositoryFactory.getReplicationTaskRepository() using the ReplicationRepositoryFactory passed to the ReplicationManager constructor.

The implementation of ReplicationRepositoryFactory is ReplicationPostgresRepositoryFactory.

History

#1 Updated by Dave Vieglais over 3 years ago

  • Description updated (diff)

#2 Updated by Dave Vieglais over 3 years ago

  • Description updated (diff)

The sequence of log messages seems to imply that delete() does work; however, logic elsewhere in ReplicationManager seems to recreate the deleted task.
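The suspected interaction can be modelled in a few lines. This is only an in-memory sketch of the hypothesis, not the actual ReplicationManager code: the class RequeueSketch, its list-based store, and its method names are stand-ins invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the suspected delete-then-recreate sequence.
class RequeueSketch {
    // Stand-in for the Postgres-backed task table: one entry per task, holding its PID.
    private final List<String> store = new ArrayList<>();

    void add(String pid) {
        store.add(pid);
    }

    // Models the delete step: drops every task for the PID, returns how many were removed.
    int removeTasksForPid(String pid) {
        int before = store.size();
        store.removeIf(pid::equals);
        return before - store.size();
    }

    // Models the behaviour suggested by the WARN message: when no task exists
    // for the PID, a new one is created. Returns true if a task was created.
    boolean requeue(String pid) {
        if (store.contains(pid)) {
            return false;
        }
        store.add(pid);
        return true;
    }

    long countFor(String pid) {
        return store.stream().filter(pid::equals).count();
    }
}
```

In this model, calling removeTasksForPid and then requeue for the same PID leaves one task in the store, which from the outside looks the same as a delete that never happened.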
