Bug #7602
Re-evaluation of NodeList blocks re-harvesting
100%
Description
LogAggregation hangs during re-evaluation of the nodelist for harvest scheduling adjustments.
LogAggregation checks for new or deleted nodes once a day. The scheduled job that performs this check hangs while attempting to determine whether any jobs are still executing. Before changes can be made to the scheduler, the scheduler pauses, or places in standby, all triggers for the job. Once all jobs are confirmed complete, the scheduler adds or removes jobs and starts up again. The problem is that the check never sees every job finished, because the code that manages the harvest schedule is itself a running job.
The resolution will be to check that every job is halted except for the calling job.
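A minimal sketch of that check, assuming the manager can obtain the names of the currently executing jobs from the scheduler. The class, method, and job names here are hypothetical and are not taken from the LogAggregation source; only the idea of excluding the calling job comes from the description above.

```java
import java.util.Collection;

/** Hypothetical helper: decides whether the scheduler is quiet enough to modify. */
final class QuiesceCheck {

    private QuiesceCheck() {
    }

    /**
     * Returns true when no job other than the calling job is still executing.
     * The calling (schedule-managing) job is always running while this check
     * executes, so counting it would keep the wait loop from ever finishing.
     */
    static boolean allOtherJobsHalted(Collection<String> executingJobNames, String callingJobName) {
        for (String name : executingJobNames) {
            if (!name.equals(callingJobName)) {
                return false; // some other job, e.g. a harvest, is still in flight
            }
        }
        return true;
    }
}
```

With only the managing job in the executing list, the check passes and rescheduling can proceed; with any harvest job still listed, the caller would keep waiting.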
Associated revisions
refs #7602
Modified LogAggregationScheduleManager to ignore the LogAggregationManageScheduleJob when re-evaluating MemberNode harvest scheduling. Switched to Log4j in most of the code for ease of debugging. Changed the name of LogAggregationHarvestJob to LogAggregationManageScheduleJob as a more accurate description of the purpose of the class.
refs #7602
Modified LogAggregationScheduleManager to increase the timeframe when the ManageScheduleJob will run initially.
refs #7602
Modified LogAggregationScheduleManager to remove the pause before triggering jobs; the pause existed because of Hazelcast, which is no longer a factor in job scheduling.
Modified LogAggregationScheduleManager to short-circuit rescheduling if there are no jobs to be scheduled or deleted.
Modified LogAggregationScheduleManager to only consider nodes whose state is UP when performing collection math to determine jobs to schedule or delete.
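The collection math and the short-circuit described in this revision can be sketched roughly as follows. This is an illustration under assumptions, not the committed code: `SchedulePlan`, its fields, and the use of plain strings for node identifiers and states are hypothetical, and it assumes that a scheduled node that is no longer UP ends up in the delete set.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical value object holding the outcome of the collection math. */
final class SchedulePlan {
    final Set<String> toSchedule;
    final Set<String> toDelete;

    private SchedulePlan(Set<String> toSchedule, Set<String> toDelete) {
        this.toSchedule = toSchedule;
        this.toDelete = toDelete;
    }

    /**
     * nodeStates maps node id to state; scheduledNodeIds holds the ids that
     * already have a harvest job. Only nodes whose state is UP are considered.
     */
    static SchedulePlan compute(Map<String, String> nodeStates, Set<String> scheduledNodeIds) {
        Set<String> upNodes = new HashSet<>();
        for (Map.Entry<String, String> entry : nodeStates.entrySet()) {
            if ("UP".equalsIgnoreCase(entry.getValue())) {
                upNodes.add(entry.getKey());
            }
        }
        // UP nodes without a job need one scheduled.
        Set<String> toSchedule = new HashSet<>(upNodes);
        toSchedule.removeAll(scheduledNodeIds);
        // Jobs whose node is not among the UP nodes are deleted (assumption).
        Set<String> toDelete = new HashSet<>(scheduledNodeIds);
        toDelete.removeAll(upNodes);
        return new SchedulePlan(toSchedule, toDelete);
    }

    /** True when nothing would change, so rescheduling can be skipped entirely. */
    boolean isNoOp() {
        return toSchedule.isEmpty() && toDelete.isEmpty();
    }
}
```

A caller would compute the plan first and return early when `isNoOp()` is true, so the scheduler is never paused when there is nothing to add or remove.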
refs #7602
Decreased the delay offset limit to 5 minutes.
refs #7602
Modified LogHarvesterTask to fix an annoying log message.
fixes #7602: Re-evaluation of NodeList blocks re-harvesting
refs #7602: Re-evaluation of NodeList blocks re-harvesting
refs #7602: Re-evaluation of NodeList blocks re-harvesting of log aggregation
History
#1 Updated by Robert Waltz almost 9 years ago
- Status changed from In Progress to Testing
- % Done changed from 30 to 50
#2 Updated by Robert Waltz almost 9 years ago
- Status changed from Testing to Closed
- % Done changed from 50 to 100
Applied in changeset d1-python:d1_python|r17431.