Story #3434
Investigate Timeout Problems on Staging Hazelcast
100%
Description
After upgrading Hazelcast to 2.4.1, we have seen an increase on staging in the number of timeouts reported by HZ members when contacting the master node of the HZ cluster (the oldest member of the cluster).
Determine what effects the exceptions may have based on timeout scenarios.
Determine if any simple modifications might alleviate some of the timeouts.
Manipulation of two variables had no effect on the performance of Staging machines, and I was unable to simulate split brain on the environment.
I modified two variables and ran a single-threaded script that would evict objects, get them, and then put them. All daemons were running while the script executed.
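The shape of such a script can be sketched as follows. This is not the script that was actually run, only a minimal illustration of an evict/get/put cycle with per-operation timing. `StandInMap` is a hypothetical in-memory stand-in so the sketch runs without a cluster; a real test would point the same loop at a Hazelcast map through a client, and `run_cycle` and `make_value` are names made up for this example.

```python
import time


class StandInMap:
    """Hypothetical in-memory stand-in for a distributed map (not a Hazelcast client)."""

    def __init__(self):
        self._data = {}

    def evict(self, key):
        # Drop the entry; report whether something was actually removed.
        return self._data.pop(key, None) is not None

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        # Store the value and return the previous one, if any.
        previous = self._data.get(key)
        self._data[key] = value
        return previous


def run_cycle(dmap, keys, make_value):
    """Evict, get, then put each key once, in a single thread.

    Returns per-operation timings in seconds and the number of operations
    that raised an exception (for example, a timeout).
    """
    timings = {"evict": [], "get": [], "put": []}
    errors = 0
    for key in keys:
        operations = (
            ("evict", lambda: dmap.evict(key)),
            ("get", lambda: dmap.get(key)),
            ("put", lambda: dmap.put(key, make_value(key))),
        )
        for name, call in operations:
            start = time.perf_counter()
            try:
                call()
            except Exception:
                errors += 1
            timings[name].append(time.perf_counter() - start)
    return timings, errors


if __name__ == "__main__":
    timings, errors = run_cycle(StandInMap(), ["a", "b", "c"], lambda k: k.upper())
    for name, samples in timings.items():
        print(name, "max seconds:", max(samples))
    print("errors:", errors)
```

Counting exceptions separately from timings makes it possible to compare a baseline run with a modified run on both latency and failure count.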
The first variable modified is hazelcast.max.no.heartbeat.seconds, the maximum time in seconds without a heartbeat before a node is assumed to be dead, with a default setting of 300. In the uploaded files, the results of setting the variable to 60 seconds are in the zip files whose names contain the string 60SecTO.
The second variable modified is hazelcast.map.partition.count, the distributed map partition count, with a default setting of 271. In the uploaded files, the results of modifying the variable to 2710 are found in files whose names contain the strings Partitions and 2710.
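Hazelcast properties such as these two can be supplied as JVM system properties (`-Dname=value`) or in the properties section of the Hazelcast configuration. The ticket does not say how the staging daemons are launched, so the following is only a sketch, assuming the `-D` route, of how the four test combinations could be rendered as flags. The helper name `jvm_flags` is hypothetical; the property names and defaults are the ones given above.

```python
# Defaults as stated in this ticket.
DEFAULTS = {
    "hazelcast.max.no.heartbeat.seconds": 300,
    "hazelcast.map.partition.count": 271,
}


def jvm_flags(overrides):
    """Render the two properties as -D flags, applying any overrides."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise KeyError("unexpected properties: " + ", ".join(sorted(unknown)))
    settings = {**DEFAULTS, **overrides}
    return ["-D{}={}".format(name, value) for name, value in sorted(settings.items())]


if __name__ == "__main__":
    # The four combinations matching the uploaded result sets.
    runs = {
        "Baseline": {},
        "Baseline60SecTO": {"hazelcast.max.no.heartbeat.seconds": 60},
        "Partitions2710": {"hazelcast.map.partition.count": 2710},
        "Partitions60SecTO2710": {
            "hazelcast.max.no.heartbeat.seconds": 60,
            "hazelcast.map.partition.count": 2710,
        },
    }
    for label, overrides in runs.items():
        print(label, " ".join(jvm_flags(overrides)))
```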
I first wanted to test if a lower heartbeat timeout setting would affect the number of timeouts. It did not.
I also wanted to monitor the performance effect of increasing the partition count. I was curious whether the amount of traffic over TCP would decrease if the data structures were split among more partitions. I did not find any significant difference, except that increasing the partition count increases migration time.
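As a rough illustration of what the partition count changes: each key is assigned to one partition by hashing it and taking the result modulo the partition count, so going from 271 to 2710 spreads the same data over more, smaller partitions, and leaves more individual partitions to move whenever membership changes. The sketch below uses CRC32 as a stand-in hash, which is an assumption made for the example and not Hazelcast's actual hash function; `partition_id` and `spread` are hypothetical helper names.

```python
import zlib
from collections import Counter


def partition_id(key, partition_count):
    """Map a key to a partition: hash modulo partition count (CRC32 is a stand-in hash)."""
    return zlib.crc32(key.encode("utf-8")) % partition_count


def spread(keys, partition_count):
    """Count how many keys land in each occupied partition."""
    return Counter(partition_id(key, partition_count) for key in keys)


if __name__ == "__main__":
    keys = ["key-{}".format(i) for i in range(10000)]
    for count in (271, 2710):
        buckets = spread(keys, count)
        print(count, "partitions:", len(buckets), "occupied, largest holds", max(buckets.values()))
```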
Subtasks
History
#1 Updated by Robert Waltz almost 12 years ago
- Description updated
#2 Updated by Robert Waltz almost 12 years ago
- File StageUnmBaseline.zip added
- File StageOrcBaseline.zip added
- File StageUnmPartitions2710.zip added
- File StageOrcPartitions2710.zip added
- File StageOrcBaseline60SecTO.zip added
- File StageUnmPartitions60SecTO2710.zip added
- File StageOrcPartitions60SecTO2710.zip added
- File StageUnmBaseline60SecTO.zip added
#3 Updated by Robert Waltz almost 12 years ago
- Status changed from New to Closed