
Story #3375

Hazelcast should be upgraded to version 2.X for stability

Added by Chris Jones almost 12 years ago. Updated over 11 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: performance-scalability
Start date: 2012-11-08
Due date: 2013-01-05
% Done: 100%
Story Points:
Sprint:

Description

We've seen a number of issues in each environment: Metacat's PostgreSQL system metadata tables don't stay in sync, replication suffers from inconsistent replica statuses across CNs, and the hzIdentifiers Set iterator intermittently fails to iterate through the entire Set. One major factor may be communication problems between Hazelcast cluster instances, which show up in the catalina.out logs as:

WARNING: /64.106.40.7:5701 [DataONE] hz.1.InThread Closing socket to endpoint Address[160.36.13.152:5701], Cause:java.io.EOFException
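One way to gauge how often these drops occur is to tally the warnings per remote endpoint. The following is a minimal sketch, not part of the DataONE code: the class `HzSocketWarnings` and its method are hypothetical, it uses only the Java standard library, and the regular expression assumes every warning has exactly the layout of the sample line above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper that scans catalina.out lines for Hazelcast
 * "Closing socket to endpoint" warnings and tallies them per remote endpoint.
 * The pattern assumes the layout of the sample warning in this story.
 */
public class HzSocketWarnings {

    // Groups: local address, group name, thread name, remote address, cause.
    private static final Pattern CLOSING_SOCKET = Pattern.compile(
        "WARNING: /(\\S+) \\[([^\\]]+)\\] (\\S+) Closing socket to endpoint "
        + "Address\\[([^\\]]+)\\], Cause:(\\S+)");

    /** Counts closing-socket warnings per remote endpoint, in first-seen order. */
    public static Map<String, Integer> countByRemote(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : lines) {
            Matcher m = CLOSING_SOCKET.matcher(line);
            if (m.find()) {
                // Group 4 is the remote endpoint, e.g. "160.36.13.152:5701".
                counts.merge(m.group(4), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        lines.add("WARNING: /64.106.40.7:5701 [DataONE] hz.1.InThread Closing socket "
            + "to endpoint Address[160.36.13.152:5701], Cause:java.io.EOFException");
        lines.add("INFO: unrelated line");
        System.out.println(countByRemote(lines));
    }
}
```

Lines that do not match the assumed layout are simply skipped, so the counts are a lower bound if Hazelcast logs the same event in other formats.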

These are known issues with 1.9.X, reported in the Hazelcast forum and issue tracker, and the recommended fix is to upgrade to Hazelcast 2.X, whose connection framework has been significantly rewritten. This story documents the components that need to be modified to handle the 2.X API changes.

The plan is to use the Hazelcast 2.4.x series. However, there is an outstanding HazelcastClient connection bug (see https://github.com/hazelcast/hazelcast/issues/315) that affects all versions of Hazelcast from 1.9.3 to 2.4. It is fixed in 2.4.1, which has not yet been released. The plan is to have Hudson build Hazelcast 2.4.1 from the release tag, use that build to refactor the code, and then switch to the official 2.4.1 release once the Hazelcast team pushes it to Maven Central.
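To make the affected range concrete, here is a minimal sketch of a version check. The class `HzVersionCheck` is hypothetical; it assumes plain numeric dotted versions (no qualifiers such as `-SNAPSHOT`), treats a missing component as 0 so that `2.4` compares as `2.4.0`, and takes the range from this story: 1.9.3 up to, but not including, the 2.4.1 fix.

```java
/**
 * Hypothetical check for whether a Hazelcast version falls in the range this
 * story reports as affected by the HazelcastClient connection bug
 * (1.9.3 through 2.4, fixed in 2.4.1). Assumes numeric dotted versions only.
 */
public class HzVersionCheck {

    /** Compares dotted numeric versions, treating missing components as 0. */
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        int n = Math.max(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** True if 1.9.3 <= version < 2.4.1. */
    public static boolean isAffected(String version) {
        return compare(version, "1.9.3") >= 0 && compare(version, "2.4.1") < 0;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"1.9.2", "1.9.3", "1.9.4", "2.4", "2.4.1"}) {
            System.out.println(v + " affected: " + isAffected(v));
        }
    }
}
```

With this comparison, `1.9.2` and `2.4.1` are reported as unaffected, while `1.9.3` and `2.4` are affected.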


Subtasks

Task #3376: Upgrade d1_synchronization to Hazelcast 2.x (Closed, Robert Waltz)

Task #3377: Upgrade d1_index to Hazelcast 2.x (Closed, Skye Roseboom)

Task #3378: Upgrade d1_replication to Hazelcast 2.x (Closed, Skye Roseboom)

Task #3379: Upgrade metacat to Hazelcast 2.x (Closed, Chris Jones)

Task #3380: Upgrade d1_client_hzpeek to Hazelcast 2.x (New, Dave Vieglais)

Task #3382: Upgrade d1_cn_common to Hazelcast 2.x (Closed, Skye Roseboom)

Task #3383: Upgrade d1_node_registry to Hazelcast 2.x (Closed, Robert Waltz)

Task #3384: Upgrade d1_log_aggregation to Hazelcast 2.x (Closed, Robert Waltz)

Task #3385: Upgrade d1_process_daemon to Hazelcast 2.x (Closed, Robert Waltz)

Task #3386: Add a Hazelcast build to Hudson (Closed, Robert Waltz)

Task #3393: Upgrade d1_identity_manger to Hazelcast 2.x (Rejected, Ben Leinfelder)

Task #3381: Upgrade replicationstatusmonitor to Hazelcast 2.x (Closed, Dave Vieglais)

History

#1 Updated by Chris Jones almost 12 years ago

  • Description updated

#2 Updated by Chris Jones almost 12 years ago

  • Description updated

#3 Updated by Chris Jones almost 12 years ago

  • Status changed from New to In Progress

Most components have been upgraded and are in testing.

#4 Updated by Chris Jones over 11 years ago

  • Due date changed from 2012-11-10 to 2013-01-05
  • Target version changed from Sprint-2012.44-Block.6.2 to Sprint-2012.50-Block.6.4

#5 Updated by Chris Jones over 11 years ago

  • Status changed from In Progress to Closed

We've upgraded the CN stack to Hazelcast 2.4.1 and tested it in the dev, sandbox, and stage environments using the 1.1 branch; no new bugs were introduced. However, the network partitioning behavior is not resolved. Closing this ticket since the upgrade is complete.
