DataONE Tasks, https://redmine.dataone.org/, 2012-08-03T01:14:46Z
Infrastructure - Task #3045: Reduce Hazelcast.init() startup time for populating hzIdentifiers, https://redmine.dataone.org/issues/3045?journal_id=13169, 2012-08-03T01:14:46Z, Chris Jones, cjones@nceas.ucsb.edu
<ul><li><strong>Parent task</strong> set to <i>#3116</i></li></ul>
Infrastructure - Task #3045: Reduce Hazelcast.init() startup time for populating hzIdentifiers, https://redmine.dataone.org/issues/3045?journal_id=13170, 2012-08-03T01:17:11Z, Chris Jones, cjones@nceas.ucsb.edu
<ul><li><strong>Subject</strong> changed from <i>HazelcastService.init() takes hours to complete</i> to <i>Reduce Hazelcast.init() startup time for populating hzIdentifiers</i></li><li><strong>Category</strong> set to <i>Metacat</i></li><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li><li><strong>Target version</strong> set to <i>Sprint-2012.27-Block.4.2</i></li></ul>
Infrastructure - Task #3045: Reduce Hazelcast.init() startup time for populating hzIdentifiers, https://redmine.dataone.org/issues/3045?journal_id=13174, 2012-08-07T15:12:41Z, Ben Leinfelder, leinfelder@nceas.ucsb.edu
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Closed</i></li></ul><p>To alleviate this load, each server now processes only the differences between the shared hzIdentifiers set and the list of identifiers it holds locally.<br>
Here is a summary of the process, assuming three CNs are starting for the first time.<br>
1. Server A starts, sees an empty hzIdentifiers set, and contributes all of its local ids to the shared set. This is quick because no other nodes need to be consulted and no migration events are taking place.<br>
2. Server B starts, sees that the shared hzIdentifiers set already contains many identifiers, and performs a diff between the identifiers that are already shared and those it holds locally. There are two possible outcomes: ideally, server B has no additional pids to add; if it does, it adds only the missing pids, thereby skipping superfluous calls to hzIdentifiers.contains(). While finding the pids that are missing from the shared set, we also keep track of the pids that server B does not have locally (these will be used later by the systemMetadata resynchronization process).<br>
3. Server C starts and follows the same procedure as server B, reaping the same efficiencies.</p>
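<p>The three-server startup sequence above can be sketched roughly as follows. This is a minimal illustration, not the actual Metacat code: the class <code>IdentifierDiff</code>, its <code>contribute</code> method, and the <code>Result</code> holder are hypothetical names, a plain <code>java.util.Set</code> of strings stands in for the shared hzIdentifiers set, and taking a one-time local snapshot of the shared set is an assumption about how the diff could be computed without per-pid contains() calls.</p>

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the startup diff; not the Metacat implementation.
class IdentifierDiff {

    /** Outcome of one server's startup contribution. */
    static final class Result {
        final Set<String> addedToShared;   // local pids that were missing from the shared set
        final Set<String> missingLocally;  // shared pids this server does not hold locally

        Result(Set<String> addedToShared, Set<String> missingLocally) {
            this.addedToShared = addedToShared;
            this.missingLocally = missingLocally;
        }
    }

    /**
     * Contributes local pids to the shared set, adding only those not yet shared.
     * The 'shared' parameter is a stand-in for the distributed hzIdentifiers set.
     */
    static Result contribute(Set<String> shared, Set<String> local) {
        // Case of server A: nothing is shared yet, so add everything in one pass.
        if (shared.isEmpty()) {
            shared.addAll(local);
            return new Result(new HashSet<>(local), new HashSet<>());
        }

        // Case of servers B and C: copy the shared set once (assumed approach),
        // then compute both differences locally.
        Set<String> snapshot = new HashSet<>(shared);

        Set<String> toAdd = new HashSet<>(local);
        toAdd.removeAll(snapshot);          // local but not yet shared

        Set<String> missingLocally = new HashSet<>(snapshot);
        missingLocally.removeAll(local);    // shared but not held locally

        shared.addAll(toAdd);               // add only the missing pids
        return new Result(toAdd, missingLocally);
    }
}
```

<p>In this sketch, the <code>missingLocally</code> set corresponds to the pids that would be handed to the systemMetadata resynchronization process later.</p>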