Bug #8126
Synchronization on cn-ucsb-1 stopped
100%
Description
Noted that sync logs were not being updated with success information. Checking cn-syncrhonization.log:
[ INFO] 2017-07-07 17:58:07,285 (V2TransferObjectTask:notifyReplicaNodes:1293) Task-urn:node:ARCTIC-urn:uuid:463ec0e7-fd06-4137-b379-1051e9f791df - Entering notifyReplicaNodes...
[DEBUG] 2017-07-07 17:58:07,409 (ObjectListHarvestTask:call:221) placed on hzSyncObjectQueue- Task-urn:node:ARCTIC-urn:uuid:3b72fb12-5a25-43ad-a95f-6fbcd3d03151
[ INFO] 2017-07-07 17:58:07,410 (V2TransferObjectTask:processUpdates:1006) Task-urn:node:ARCTIC-urn:uuid:463ec0e7-fd06-4137-b379-1051e9f791df - Completed ProcessUpdate
[ERROR] 2017-07-07 17:58:07,463 (V2TransferObjectTask:call:291) Task-urn:node:ARCTIC-urn:uuid:463ec0e7-fd06-4137-b379-1051e9f791df - Current thread is not owner of the lock! - RuntimeException - Current thread is not owner of the lock!
java.lang.RuntimeException: Current thread is not owner of the lock!
at com.hazelcast.impl.ClientServiceException.readData(ClientServiceException.java:63)
at com.hazelcast.nio.Serializer$DataSerializer.read(Serializer.java:104)
at com.hazelcast.nio.Serializer$DataSerializer.read(Serializer.java:79)
at com.hazelcast.nio.AbstractSerializer.toObject(AbstractSerializer.java:121)
at com.hazelcast.nio.AbstractSerializer.toObject(AbstractSerializer.java:156)
at com.hazelcast.client.ClientThreadContext.toObject(ClientThreadContext.java:72)
at com.hazelcast.client.IOUtil.toObject(IOUtil.java:34)
at com.hazelcast.client.ProxyHelper.getValue(ProxyHelper.java:186)
at com.hazelcast.client.ProxyHelper.doOp(ProxyHelper.java:146)
at com.hazelcast.client.ProxyHelper.doOp(ProxyHelper.java:140)
at com.hazelcast.client.LockClientProxy.unlock(LockClientProxy.java:68)
at org.dataone.cn.batch.synchronization.tasks.V2TransferObjectTask.call(V2TransferObjectTask.java:233)
at org.dataone.cn.batch.synchronization.tasks.V2TransferObjectTask.call(V2TransferObjectTask.java:113)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[ INFO] 2017-07-07 17:58:07,464 (V2TransferObjectTask:call:294) Task-urn:node:ARCTIC-urn:uuid:463ec0e7-fd06-4137-b379-1051e9f791df - exiting with callState: FAILED
Then finally:
[DEBUG] 2017-07-07 17:58:11,847 (ObjectListHarvestTask:call:221) placed on hzSyncObjectQueue- Task-urn:node:ARCTIC-urn:uuid:f5977766-9b9e-4007-9552-f40ff863c713
[DEBUG] 2017-07-07 17:58:11,951 (ObjectListHarvestTask:call:221) placed on hzSyncObjectQueue- Task-urn:node:ARCTIC-urn:uuid:f766fc31-4334-4b30-9b76-097817a8dfed
[ERROR] 2017-07-07 17:58:11,952 (MemberNodeHarvestJob:execute:79) urn:node:ARCTIC - job-urn:node:ARCTIC died: urn:node:ARCTIC- Disabled
org.dataone.cn.batch.exceptions.ExecutionDisabledException: urn:node:ARCTIC- Disabled
at org.dataone.cn.batch.synchronization.tasks.ObjectListHarvestTask.call(ObjectListHarvestTask.java:176)
at org.dataone.cn.batch.synchronization.jobs.MemberNodeHarvestJob.execute(MemberNodeHarvestJob.java:64)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)
Sync appeared to proceed normally after a restart of d1-processing.
Associated revisions
refs #8126: added thread name and context (context usually blank) to logging format to benefit synchronization and hazelcast troubleshooting (lock issues)
refs #8126: added thread name and context (context usually blank) to logging format to benefit synchronization and hazelcast troubleshooting (lock issues)
History
#1 Updated by Dave Vieglais over 7 years ago
- Tracker changed from Task to Bug
#2 Updated by Rob Nahf almost 7 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
this seems to happen on occassion, and so the question is how? Added thread name and context to the logging to help diagnose in the future. Too late to look at the logs to find the root cause for this now...
#3 Updated by Dave Vieglais almost 7 years ago
- Sprint set to CCI-2.3.7