Story #3059
Research potential HZ Bug
Description
The problem appears to be thrown when cn-orc-1 was the first system to be started in the cluster.
cn-unm-1 was restarted and allowed to become fully operational, and then cn-ucsb-1 was restarted. After all three were operational,
cn-orc-1 was shut down and left the cluster.
cn-unm-1 started to send these messages:
Jul 10, 2012 12:20:20 AM com.hazelcast.impl.PartitionManager
WARNING: /64.106.40.6:5701 [DataONE] Block [55] owner=Address[64.106.40.6:5701] migrationAddress=Address[128.111.36.80:5701] migration is not completed for 10 seconds.
Jul 10, 2012 12:20:49 AM com.hazelcast.impl.PartitionManager
WARNING: /64.106.40.6:5701 [DataONE] Block [81] owner=Address[64.106.40.6:5701] migrationAddress=Address[128.111.36.80:5701] migration is not completed for 10 seconds.
These messages indicate partitioning problems between cn-unm-1 and cn-ucsb-1.
After 10 minutes, when these messages had stopped, cn-orc-1 was restarted.
cn-orc-1 reported these messages:
Jul 10, 2012 12:46:15 AM com.hazelcast.impl.ConcurrentMapManager
INFO: /160.36.13.150:5701 [DataONE] ======= -1: CONCURRENT_MAP_ADD_TO_SET ========
thisAddress= Address[160.36.13.150:5701], target= null
targetMember= null, targetConn=null, targetBlock=Block [39] owner=Address[128.111.36.80:5701] migrationAddress=Address[160.36.13.150:5701]
org.dataone.service.types.v1.Identifier@7caa5dd3 Re-doing [20] times! m:s:hzIdentifiers : null
and cn-unm-1 reported these messages:
Jul 10, 2012 12:41:28 AM com.hazelcast.impl.PartitionManager
WARNING: /64.106.40.6:5701 [DataONE] Block [30] owner=Address[128.111.36.80:5701] migrationAddress=Address[160.36.13.150:5701] migration is not completed for 10 seconds.
Jul 10, 2012 12:42:09 AM com.hazelcast.impl.PartitionManager
WARNING: /64.106.40.6:5701 [DataONE] Block [12] owner=Address[128.111.36.80:5701] migrationAddress=Address[160.36.13.150:5701] migration is not completed for 10 seconds.
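The PartitionManager warnings quoted in this description all share one format, so they can be tallied per owner and migration target to see which pair of members a stalled migration involves. The following is only a minimal sketch, assuming the log lines look exactly like the ones pasted here; the function names `stalled_migrations` and `count_by_route` are made up for illustration and are not part of Hazelcast.

```python
import re
from collections import Counter

# Matches the body of the PartitionManager WARNING lines quoted in this ticket.
# Assumption: other Hazelcast versions may word the message differently.
MIGRATION_RE = re.compile(
    r"Block \[(?P<block>\d+)\] "
    r"owner=Address\[(?P<owner>[^\]]+)\] "
    r"migrationAddress=Address\[(?P<target>[^\]]+)\] "
    r"migration is not completed"
)


def stalled_migrations(lines):
    """Return a (block, owner, target) tuple for each stalled-migration warning."""
    found = []
    for line in lines:
        match = MIGRATION_RE.search(line)
        if match:
            found.append(
                (int(match.group("block")), match.group("owner"), match.group("target"))
            )
    return found


def count_by_route(migrations):
    """Count stalled blocks per (owner, target) pair."""
    return Counter((owner, target) for _, owner, target in migrations)


# Two warnings copied from this ticket: one from before and one from after
# cn-orc-1 was restarted.
sample = [
    "WARNING: /64.106.40.6:5701 [DataONE] Block [55] owner=Address[64.106.40.6:5701] "
    "migrationAddress=Address[128.111.36.80:5701] migration is not completed for 10 seconds.",
    "WARNING: /64.106.40.6:5701 [DataONE] Block [30] owner=Address[128.111.36.80:5701] "
    "migrationAddress=Address[160.36.13.150:5701] migration is not completed for 10 seconds.",
]

if __name__ == "__main__":
    for route, count in count_by_route(stalled_migrations(sample)).items():
        print(route, count)
```

Run against a full log file, this would show whether the stalled blocks all move between the same two members, as the excerpts here suggest, or are spread across the cluster.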
History
#1 Updated by Dave Vieglais over 12 years ago
Possibly related, adding mostly to keep track of one potential trail to follow. https://github.com/hazelcast/hazelcast/issues/117
That issue was related to the sensitivity of Hazelcast to time offsets between the servers.
#2 Updated by Dave Vieglais over 12 years ago
Another thread suggests this message is informational, though it is unusually slow for something to take more than 10 seconds: https://groups.google.com/forum/?fromgroups#!topic/hazelcast/54mAiG3PjTo
#3 Updated by Robert Waltz over 12 years ago
This is similar to what we saw last November with our client connection problems from the process daemons to the storage cluster:
http://stackoverflow.com/questions/9997057/issue-on-start-up-with-hazelcast-concurrent-map-put
#4 Updated by Robert Waltz about 12 years ago
- Milestone changed from CCI-1.0.4 to None
- Target version set to Sprint-2012.39-Block.5.4
#5 Updated by Robert Waltz about 12 years ago
- Milestone changed from None to CCI-1.1
- Target version changed from Sprint-2012.39-Block.5.4 to Sprint-2012.41-Block.6.1
#6 Updated by Robert Waltz about 12 years ago
- Milestone changed from CCI-1.1 to CCI-1.2
#7 Updated by Robert Waltz about 12 years ago
- Due date set to 2012-10-27
- Remaining hours set to 0.0
- Status changed from New to Rejected
This is not a bug. This type of warning will most likely be resolved by updating Hazelcast.