Story #3470
Updated by Chris Jones almost 12 years ago
The Coordinating Node environments rely heavily on consistent network communication, especially for the Hazelcast cluster. Our services can get out of sync when the network partitions and one or more members drop from the cluster. We need to be able to monitor a few key states, with cluster membership being the most important thus far. We need to enable operational alerts through Nagios monitoring when the cluster gets partitioned. We need to respond programmatically to partitioned clusters (read-only mode?), and we need to develop a custom merge policy for when the cluster comes back into communication, so that set, map, and queue entries that are out of sync get back into sync in terms of both number and content.
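As a starting point for the Nagios alert, here is a minimal sketch of the membership comparison a check plugin might perform. It assumes the list of members each node currently sees has already been obtained somehow (how to query Hazelcast for it is not covered here). The function name `check_membership` and the severity mapping are illustrative assumptions, not an agreed design. Only the exit codes follow the standard Nagios plugin convention.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_membership(expected_members, observed_members):
    """Hypothetical helper: compare the members a node sees with the expected set.

    Returns a (exit_code, message) pair suitable for a Nagios plugin.
    """
    if not observed_members:
        # No data at all: we cannot tell whether the cluster is partitioned.
        return UNKNOWN, "UNKNOWN - no membership data available"

    expected = set(expected_members)
    observed = set(observed_members)

    missing = sorted(expected - observed)
    if missing:
        # At least one expected member has dropped out: possible partition.
        return CRITICAL, "CRITICAL - missing members: " + ", ".join(missing)

    unexpected = sorted(observed - expected)
    if unexpected:
        # Assumed severity: extra members are suspicious but not a partition.
        return WARNING, "WARNING - unexpected members: " + ", ".join(unexpected)

    return OK, "OK - all %d members present" % len(expected)
```

A wrapper script would print the message and exit with the returned code. Running the same check on every cluster member would show which side of a partition each node is on.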
See: http://epad.dataone.org/ClusterPartitionDiscussions