Story #3059

Research potential HZ Bug

Added by Robert Waltz almost 12 years ago. Updated over 11 years ago.

Status: Rejected
Priority: Normal
Assignee: Robert Waltz
Category: -
Start date: 2012-07-10
Due date: 2012-10-27
% Done: 0%
Story Points:
Sprint:

Description

The issue appears to occur when cn-orc-1 is the first system to be started in the cluster.

cn-unm-1 was restarted and, once it was fully operational, cn-ucsb-1 was restarted. After all three were operational,
cn-orc-1 was shut down and left the cluster.

cn-unm-1 started to send these messages:

Jul 10, 2012 12:20:20 AM com.hazelcast.impl.PartitionManager
WARNING: /64.106.40.6:5701 [DataONE] Block [55] owner=Address[64.106.40.6:5701] migrationAddress=Address[128.111.36.80:5701] migration is not completed for 10 seconds.
Jul 10, 2012 12:20:49 AM com.hazelcast.impl.PartitionManager
WARNING: /64.106.40.6:5701 [DataONE] Block [81] owner=Address[64.106.40.6:5701] migrationAddress=Address[128.111.36.80:5701] migration is not completed for 10 seconds.

indicating partitioning problems between cn-unm-1 and cn-ucsb-1.
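To see which member pairs are involved when many of these warnings accumulate, the log can be summarized with a small script. The following is only a sketch: it assumes the warning body has exactly the format shown above, and the names MIGRATION_RE, stalled_migrations, and count_by_pair are hypothetical helpers, not part of Hazelcast.

```python
import re
from collections import Counter

# Assumed format: matches the PartitionManager warning body exactly as it
# appears in the excerpts in this ticket. Other Hazelcast versions may differ.
MIGRATION_RE = re.compile(
    r"Block \[(?P<block>\d+)\] "
    r"owner=Address\[(?P<owner>[^\]]+)\] "
    r"migrationAddress=Address\[(?P<target>[^\]]+)\] "
    r"migration is not completed"
)


def stalled_migrations(lines):
    """Yield (block, owner, target) for each stalled-migration warning."""
    for line in lines:
        match = MIGRATION_RE.search(line)
        if match:
            yield (
                int(match.group("block")),
                match.group("owner"),
                match.group("target"),
            )


def count_by_pair(lines):
    """Count stalled migrations per (owner, target) address pair."""
    return Counter(
        (owner, target) for _, owner, target in stalled_migrations(lines)
    )


# The two warnings reported by cn-unm-1 above.
SAMPLE = [
    "WARNING: /64.106.40.6:5701 [DataONE] Block [55] "
    "owner=Address[64.106.40.6:5701] "
    "migrationAddress=Address[128.111.36.80:5701] "
    "migration is not completed for 10 seconds.",
    "WARNING: /64.106.40.6:5701 [DataONE] Block [81] "
    "owner=Address[64.106.40.6:5701] "
    "migrationAddress=Address[128.111.36.80:5701] "
    "migration is not completed for 10 seconds.",
]

if __name__ == "__main__":
    for pair, count in count_by_pair(SAMPLE).items():
        print(pair, count)
```

Grouping by owner and migration address makes it easy to tell whether the stalled blocks all involve the same two members, as they do in the excerpt above.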

After 10 minutes, when these messages had stopped, cn-orc-1 was restarted.

cn-orc-1 reported these messages:

Jul 10, 2012 12:46:15 AM com.hazelcast.impl.ConcurrentMapManager
INFO: /160.36.13.150:5701 [DataONE] ======= -1: CONCURRENT_MAP_ADD_TO_SET ========
thisAddress= Address[160.36.13.150:5701], target= null
targetMember= null, targetConn=null, targetBlock=Block [39] owner=Address[128.111.36.80:5701] migrationAddress=Address[160.36.13.150:5701]
org.dataone.service.types.v1.Identifier@7caa5dd3 Re-doing [20] times! m:s:hzIdentifiers : null

and cn-unm-1 reported these messages:

Jul 10, 2012 12:41:28 AM com.hazelcast.impl.PartitionManager
WARNING: /64.106.40.6:5701 [DataONE] Block [30] owner=Address[128.111.36.80:5701] migrationAddress=Address[160.36.13.150:5701] migration is not completed for 10 seconds.
Jul 10, 2012 12:42:09 AM com.hazelcast.impl.PartitionManager
WARNING: /64.106.40.6:5701 [DataONE] Block [12] owner=Address[128.111.36.80:5701] migrationAddress=Address[160.36.13.150:5701] migration is not completed for 10 seconds.

History

#1 Updated by Dave Vieglais almost 12 years ago

Possibly related, adding mostly to keep track of one potential trail to follow. https://github.com/hazelcast/hazelcast/issues/117

That issue was related to Hazelcast's sensitivity to time offsets between the servers.

#2 Updated by Dave Vieglais almost 12 years ago

Another thread suggests this message is informational, though it is unusually slow for a migration to take more than 10 seconds: https://groups.google.com/forum/?fromgroups#!topic/hazelcast/54mAiG3PjTo

#3 Updated by Robert Waltz almost 12 years ago

This is similar to what we saw last November with our client connection problems from the process daemons to the storage cluster:

http://stackoverflow.com/questions/9997057/issue-on-start-up-with-hazelcast-concurrent-map-put

#4 Updated by Robert Waltz over 11 years ago

  • Milestone changed from CCI-1.0.4 to None
  • Target version set to Sprint-2012.39-Block.5.4

#5 Updated by Robert Waltz over 11 years ago

  • Milestone changed from None to CCI-1.1
  • Target version changed from Sprint-2012.39-Block.5.4 to Sprint-2012.41-Block.6.1

#6 Updated by Robert Waltz over 11 years ago

  • Milestone changed from CCI-1.1 to CCI-1.2

#7 Updated by Robert Waltz over 11 years ago

  • Due date set to 2012-10-27
  • Remaining hours set to 0.0
  • Status changed from New to Rejected

This is not a bug. This type of warning will most likely be resolved by updating Hazelcast.
