Story #8726: Indexer flooded with failed resource map tasks from long ago when tomcat is restarted - Infrastructure - DataONE Tasks

Story #8726

Indexer flooded with failed resource map tasks from long ago when tomcat is restarted

Added by Rob Nahf over 6 years ago. Updated about 6 years ago.

Status: Closed

Priority: Immediate

Assignee: Jing Tao

Category: d1_indexer

Target version: CCI-2.3.10

Start date: 2018-10-03

Due date:

% Done: 100%

Story Points:

Sprint: Infrastructure backlog

Description

HZEventFilter is designed to filter out tasks for archived objects and for objects that are already indexed and up to date. It was put in place mainly to avoid generating unnecessary indexing tasks when the Hazelcast system metadata map has to be rehydrated (after Tomcat restarts and software upgrades).

However, it does not filter tasks for objects that are not in the index, which includes the many failed resource map indexing tasks; these are plentiful and time-consuming. For example, we observed 2 days of constant index processing trying to index many Dryad index maps that failed because they have missing members.

A short-term fix is to define a configurable lookback period: objects that are not present in the index and are older than that period (by modification date) are not indexed. This should greatly reduce the amount of fruitless reprocessing of index failures. The recommended lookback period is 1 month.

That means that if an object is not indexed and its dateSystemMetadataModified is more than a month ago (or whatever period is defined by configuration), the PID is filtered out and no task is generated for it.
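
The proposed rule can be sketched roughly as follows. This is only an illustration of the decision described above, not the actual HZEventFilter code: the class name, method name, parameters, and the 30-day default (standing in for "1 month") are all assumptions made for the example.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical sketch of the lookback check; not the real HZEventFilter. */
class LookbackFilterSketch {

    /** Assumed default: 30 days approximates the recommended one-month lookback. */
    static final Duration DEFAULT_LOOKBACK = Duration.ofDays(30);

    /**
     * Returns true when the PID should be filtered out (no index task generated)
     * under the lookback rule alone.
     *
     * @param presentInIndex      whether the object already has an index entry
     * @param dateSysMetaModified the object's dateSystemMetadataModified
     * @param now                 the current time (passed in to keep the check testable)
     * @param lookback            the configured lookback period
     */
    static boolean shouldFilterOut(boolean presentInIndex,
                                   Instant dateSysMetaModified,
                                   Instant now,
                                   Duration lookback) {
        if (presentInIndex) {
            // Indexed objects are left to the existing up-to-date/archived checks.
            return false;
        }
        // Not indexed: skip it only if it was last modified before the cutoff.
        Instant cutoff = now.minus(lookback);
        return dateSysMetaModified.isBefore(cutoff);
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2018-10-03T00:00:00Z");

        // An unindexed object last modified long ago: filtered out.
        System.out.println(shouldFilterOut(
                false, Instant.parse("2017-01-15T00:00:00Z"), now, DEFAULT_LOOKBACK));

        // An unindexed object modified within the lookback period: still gets a task.
        System.out.println(shouldFilterOut(
                false, Instant.parse("2018-09-20T00:00:00Z"), now, DEFAULT_LOOKBACK));
    }
}
```

Passing the current time and the lookback period as arguments keeps the check easy to test and leaves the source of the configured value (for example, a properties file) open.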