Project

General

Profile

Bug #8126

Synchronization on cn-ucsb-1 stopped

Added by Dave Vieglais almost 5 years ago. Updated over 4 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
d1_synchronization
Target version:
Start date:
2017-07-10
Due date:
% Done:

100%

Milestone:
None
Product Version:
*
Story Points:
Sprint:

Description

Noted that sync logs were not being updated with success information. Checking cn-syncrhonization.log:

[ INFO] 2017-07-07 17:58:07,285 (V2TransferObjectTask:notifyReplicaNodes:1293) Task-urn:node:ARCTIC-urn:uuid:463ec0e7-fd06-4137-b379-1051e9f791df - Entering notifyReplicaNodes...
[DEBUG] 2017-07-07 17:58:07,409 (ObjectListHarvestTask:call:221) placed on hzSyncObjectQueue- Task-urn:node:ARCTIC-urn:uuid:3b72fb12-5a25-43ad-a95f-6fbcd3d03151
[ INFO] 2017-07-07 17:58:07,410 (V2TransferObjectTask:processUpdates:1006) Task-urn:node:ARCTIC-urn:uuid:463ec0e7-fd06-4137-b379-1051e9f791df - Completed ProcessUpdate
[ERROR] 2017-07-07 17:58:07,463 (V2TransferObjectTask:call:291) Task-urn:node:ARCTIC-urn:uuid:463ec0e7-fd06-4137-b379-1051e9f791df - Current thread is not owner of the lock! - RuntimeException - Current thread is not owner of the lock!
java.lang.RuntimeException: Current thread is not owner of the lock!
at com.hazelcast.impl.ClientServiceException.readData(ClientServiceException.java:63)
at com.hazelcast.nio.Serializer$DataSerializer.read(Serializer.java:104)
at com.hazelcast.nio.Serializer$DataSerializer.read(Serializer.java:79)
at com.hazelcast.nio.AbstractSerializer.toObject(AbstractSerializer.java:121)
at com.hazelcast.nio.AbstractSerializer.toObject(AbstractSerializer.java:156)
at com.hazelcast.client.ClientThreadContext.toObject(ClientThreadContext.java:72)
at com.hazelcast.client.IOUtil.toObject(IOUtil.java:34)
at com.hazelcast.client.ProxyHelper.getValue(ProxyHelper.java:186)
at com.hazelcast.client.ProxyHelper.doOp(ProxyHelper.java:146)
at com.hazelcast.client.ProxyHelper.doOp(ProxyHelper.java:140)
at com.hazelcast.client.LockClientProxy.unlock(LockClientProxy.java:68)
at org.dataone.cn.batch.synchronization.tasks.V2TransferObjectTask.call(V2TransferObjectTask.java:233)
at org.dataone.cn.batch.synchronization.tasks.V2TransferObjectTask.call(V2TransferObjectTask.java:113)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[ INFO] 2017-07-07 17:58:07,464 (V2TransferObjectTask:call:294) Task-urn:node:ARCTIC-urn:uuid:463ec0e7-fd06-4137-b379-1051e9f791df - exiting with callState: FAILED

Then finally:

[DEBUG] 2017-07-07 17:58:11,847 (ObjectListHarvestTask:call:221) placed on hzSyncObjectQueue- Task-urn:node:ARCTIC-urn:uuid:f5977766-9b9e-4007-9552-f40ff863c713
[DEBUG] 2017-07-07 17:58:11,951 (ObjectListHarvestTask:call:221) placed on hzSyncObjectQueue- Task-urn:node:ARCTIC-urn:uuid:f766fc31-4334-4b30-9b76-097817a8dfed
[ERROR] 2017-07-07 17:58:11,952 (MemberNodeHarvestJob:execute:79) urn:node:ARCTIC - job-urn:node:ARCTIC died: urn:node:ARCTIC- Disabled
org.dataone.cn.batch.exceptions.ExecutionDisabledException: urn:node:ARCTIC- Disabled
at org.dataone.cn.batch.synchronization.tasks.ObjectListHarvestTask.call(ObjectListHarvestTask.java:176)
at org.dataone.cn.batch.synchronization.jobs.MemberNodeHarvestJob.execute(MemberNodeHarvestJob.java:64)
at org.quartz.core.JobRunShell.run(JobRunShell.java:213)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:557)

Sync appeared to proceed normally after a restart of d1-processing.

Associated revisions

Revision 19046
Added by Rob Nahf over 4 years ago

refs #8126: added thread name and context (context usually blank) to logging format to benefit synchronization and hazelcast troubleshooting (lock issues)

Revision 19046
Added by Rob Nahf over 4 years ago

refs #8126: added thread name and context (context usually blank) to logging format to benefit synchronization and hazelcast troubleshooting (lock issues)

History

#1 Updated by Dave Vieglais almost 5 years ago

  • Tracker changed from Task to Bug

#2 Updated by Rob Nahf over 4 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

this seems to happen on occassion, and so the question is how? Added thread name and context to the logging to help diagnose in the future. Too late to look at the logs to find the root cause for this now...

#3 Updated by Dave Vieglais over 4 years ago

  • Sprint set to CCI-2.3.7

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 14.8 MB)