Bug #7706
Hazelcast Runtime exception halts synchronization
100%
Description
The exception java.lang.RuntimeException: java.util.concurrent.TimeoutException: [CONCURRENT_MAP_REMOVE] Operation Timeout (with no response!): 0
originates at com.hazelcast.impl.ClientServiceException.readData(ClientServiceException.java:63).
It surfaces at org.dataone.cn.batch.synchronization.tasks.SyncObjectTask.call(SyncObjectTask.java:116),
where it is not caught. It is caught only by the thread manager, SyncObjectTaskManager, which ends the thread. Thus, synchronization fails.
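To illustrate the failure path described above, here is a minimal sketch of how an exception that escapes a Callable reaches whoever waits on the task's result. It assumes the manager waits on a java.util.concurrent.Future, which the ticket does not state. FailingTask and runAndDescribe are hypothetical stand-ins, not the DataONE classes.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeoutException;

class UncaughtTaskFailureDemo {

    // Hypothetical stand-in for SyncObjectTask: call() lets a RuntimeException
    // wrapping a TimeoutException escape, as in the reported trace.
    static class FailingTask implements Callable<String> {
        @Override
        public String call() {
            throw new RuntimeException(new TimeoutException(
                    "[CONCURRENT_MAP_REMOVE] Operation Timeout (with no response!): 0"));
        }
    }

    // Hypothetical stand-in for the manager: submits the task and describes
    // the failure it observes when asking for the result.
    static String runAndDescribe(Callable<String> task) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Future<String> future = pool.submit(task);
            return future.get();
        } catch (ExecutionException e) {
            // Future.get() wraps whatever escaped call() in an ExecutionException.
            Throwable escaped = e.getCause();
            Throwable root = escaped.getCause();
            return escaped.getClass().getSimpleName() + " <- "
                    + (root == null ? "none" : root.getClass().getSimpleName());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "interrupted";
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Because nothing inside call() handles the exception, the only place left to react is the code holding the Future, which matches the behavior reported for SyncObjectTaskManager.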
The exception may be caused by SyncObject, the class that gets passed into the hazelcastSyncObjectQueue, which does not define a serialVersionUID.
Also, a failure of the SyncObjectTaskManager should disable ObjectListHarvestTask as well.
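For reference, a minimal sketch of a Serializable class that pins its serialVersionUID. Whether the missing field is the actual cause of the timeout is only a suspicion in this ticket. The class name SyncObjectSketch and its fields nodeId and pid are hypothetical, not the real SyncObject layout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SyncObjectSketch implements Serializable {

    // An explicit version id keeps the serialized form compatible across
    // recompiles; without it the JVM derives one from the class structure.
    private static final long serialVersionUID = 1L;

    private final String nodeId; // hypothetical field
    private final String pid;    // hypothetical field

    SyncObjectSketch(String nodeId, String pid) {
        this.nodeId = nodeId;
        this.pid = pid;
    }

    String getNodeId() { return nodeId; }

    String getPid() { return pid; }

    // Serializes and deserializes an instance with plain Java serialization,
    // roughly what happens when an object crosses a process boundary.
    static SyncObjectSketch roundTrip(SyncObjectSketch in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(in);
            }
            try (ObjectInputStream reader = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return (SyncObjectSketch) reader.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }
}
```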
Currently, to deactivate Synchronization, the Synchronization.active property is set to false. However, we have no way of accomplishing that easily now. It may be best to
have a global 'disable' static class that evaluates the Synchronization.active property along with a static settable boolean that can permanently disable sync until it is
ready to be restarted.
Related issues
Associated revisions
refs #7706
Hazelcast Runtime exception halts synchronization
History
#1 Updated by Robert Waltz over 8 years ago
- Tracker changed from Task to Bug
#2 Updated by Robert Waltz over 8 years ago
- % Done changed from 0 to 30
- Category changed from d1_cn_common to d1_synchronization
- Status changed from New to In Progress
Handling Catastrophic failures of SyncObjectTask
If the SyncObjectTaskManager receives an exception that disables the running of SyncObjectTask, then all of synchronization should be halted, and a notification or log message should be sent regarding the issue.
The impact will be on quartz scheduling and any quartz jobs that are running.
Quartz jobs run the ObjectListHarvestTask. Any ObjectListHarvestTask job that is running when the exception happens should halt its processing and return.
Since quartz jobs are created by quartz scheduling, an observer pattern cannot be applied: we are not able to track the job instantiations in an observer class.
If SyncObjectTaskManager goes down, then HarvestSchedulingManager should shut down all of its jobs, never to be rescheduled.
NodeTopicListener calls the HarvestSchedulingManager.manageHarvest method when it receives a message about an updated node.
In DataONE CN Common, there is a class named ComponentActivationUtility that tracks the activation state of d1_processing components. synchronizationIsActive() is a method that returns the evaluation of a private method, synchronizationComponentActive(). Currently it only returns the status of the property Synchronization.active. Add an additional AtomicBoolean, set by SyncObjectTaskManager. Add a disableSynchronization method that sets the boolean to false, and AND the Synchronization.active property with the AtomicBoolean when synchronizationIsActive() is called.
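The proposed change can be sketched roughly as follows. This is an illustration under assumptions, not the actual d1_cn_common code: the class name SynchronizationSwitch is hypothetical, and reading Synchronization.active from a JVM system property stands in for however the real utility loads its settings.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class SynchronizationSwitch {

    // Runtime kill switch: starts enabled and is flipped to false when the
    // task manager hits a fatal exception. It stays false until a restart.
    private static final AtomicBoolean enabled = new AtomicBoolean(true);

    private SynchronizationSwitch() {
    }

    // Assumption: the configured property is read from a system property
    // here only to keep the sketch self-contained.
    private static boolean synchronizationComponentActive() {
        return Boolean.parseBoolean(
                System.getProperty("Synchronization.active", "true"));
    }

    // Active only if the configured property AND the runtime switch agree.
    static boolean synchronizationIsActive() {
        return synchronizationComponentActive() && enabled.get();
    }

    // Intended to be called by the task manager on a catastrophic failure.
    static void disableSynchronization() {
        enabled.set(false);
    }
}
```

A long-running harvest job could poll synchronizationIsActive() between work items and return early once it reports false, which sidesteps the need to track job instances through an observer.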
Update HarvestSchedulingManager.manageHarvest to check the scheduler.isShutdown state before rescheduling tasks. Create a halt method that will shut down the scheduler, waiting for any active jobs to complete.
SyncObjectTaskManager will need a reference to the HarvestSchedulingManager in order to call halt when exception occurs.
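The halt and reschedule guard might look something like the sketch below. To stay self-contained it uses a java.util.concurrent.ScheduledExecutorService as a stand-in for the Quartz scheduler. Quartz's Scheduler offers the analogous isShutdown() and shutdown(boolean waitForJobsToComplete) calls. The class name HarvestSchedulingSketch, the Runnable parameter, and the 30-second wait are assumptions made for illustration.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class HarvestSchedulingSketch {

    // Stand-in for the Quartz scheduler held by HarvestSchedulingManager.
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // Schedules a harvest unless the scheduler has been halted.
    // Returns true if the task was accepted.
    boolean manageHarvest(Runnable harvestTask) {
        if (scheduler.isShutdown()) {
            return false; // halted: never reschedule
        }
        try {
            scheduler.schedule(harvestTask, 0, TimeUnit.MILLISECONDS);
            return true;
        } catch (RejectedExecutionException e) {
            // halt() won the race between the check and the schedule call.
            return false;
        }
    }

    // Stops accepting work and waits for already accepted jobs to finish.
    void halt() {
        scheduler.shutdown();
        try {
            if (!scheduler.awaitTermination(30, TimeUnit.SECONDS)) {
                scheduler.shutdownNow(); // give up waiting after the timeout
            }
        } catch (InterruptedException e) {
            scheduler.shutdownNow();
            Thread.currentThread().interrupt();
        }
    }
}
```

In this arrangement the task manager would hold a reference to the scheduling manager and call halt() from its exception handler, so a later node-update message that triggers manageHarvest finds the scheduler shut down and does nothing.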
#3 Updated by Robert Waltz over 8 years ago
- Status changed from In Progress to Testing
- % Done changed from 30 to 50
#4 Updated by Robert Waltz over 8 years ago
- Status changed from Testing to Closed
- % Done changed from 50 to 100
#5 Updated by Rob Nahf over 6 years ago
- Related to Story #8525: timeout exceptions thrown from Hazelcast disable synchronization added