Project

General

Profile

Bug #8696

double indexing of a resource map and another not processed because of resource contention (lock) on member

Added by Rob Nahf about 3 years ago. Updated about 3 years ago.

Status:
New
Priority:
Urgent
Assignee:
Category:
d1_indexer
Target version:
Start date:
2018-09-12
Due date:
% Done:

0%

Milestone:
None
Product Version:
*
Story Points:
Sprint:

Description

In production, the ORE 'a1a0e96a-3cde-4f3c-829c-29650b09f22b' was not processed because a member was also referenced by the ORE it obsoleted, 'dc39515e-440b-4673-9f63-962c7374bf48'. The task failed without being requeued. Below is the log output.

rnahf@cn-orc-1:/var/log/dataone/index$ grep a1a0e96a-3cde-4f3c-829c-29650b09f22b cn-index-processor-daemon.log.*
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:44:27,384 (IndexTaskProcessor:saveTask:865) IndexTaskProcess.saveTask save the index task a1a0e96a-3cde-4f3c-829c-29650b09f22b
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:44:27,384 (IndexTaskProcessor:getNextIndexTask:610) Start of indexing pid: a1a0e96a-3cde-4f3c-829c-29650b09f22b
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:44:34,832 (IndexTaskProcessor:getNextIndexTask:664) the original index task - IndexTask [id=18085996, pid=a1a0e96a-3cde-4f3c-829c-29650b09f22b, formatid=http://www.openarchives.org/ore/terms, objectPath=/var/metacat/data/autogen.2018091015425874434.1, dateSysMetaModified=1536087134490, deleted=false, taskModifiedDate=1536619467383, priority=3, status=IN PROCESS]
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:44:34,832 (IndexTaskProcessor:getNextIndexTask:671) the new index task - IndexTask [id=18085996, pid=a1a0e96a-3cde-4f3c-829c-29650b09f22b, formatid=http://www.openarchives.org/ore/terms, objectPath=/var/metacat/data/autogen.2018091015425874434.1, dateSysMetaModified=1536087134490, deleted=false, taskModifiedDate=1536619467383, priority=3, status=IN PROCESS]
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:44:34,901 (IndexTaskProcessor:checkReadinessProcessResourceMap:369) ###################Another resource map is process the referenced id ee73cf7f-1005-4b89-bab9-3a7fa01d27c6 as well. So the thread to process id a1a0e96a-3cde-4f3c-829c-29650b09f22b has to wait 0.5 seconds.
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:44:35,402 (IndexTaskProcessor:checkReadinessProcessResourceMap:369) ###################Another resource map is process the referenced id ee73cf7f-1005-4b89-bab9-3a7fa01d27c6 as well. So the thread to process id a1a0e96a-3cde-4f3c-829c-29650b09f22b has to wait 0.5 seconds.
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:44:35,902 (IndexTaskProcessor:checkReadinessProcessResourceMap:369) ###################Another resource map is process the referenced id ee73cf7f-1005-4b89-bab9-3a7fa01d27c6 as well. So the thread to process id a1a0e96a-3cde-4f3c-829c-29650b09f22b has to wait 0.5 seconds.
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:44:36,402 (IndexTaskProcessor:checkReadinessProcessResourceMap:369) ###################Another resource map is process the referenced id ee73cf7f-1005-4b89-bab9-3a7fa01d27c6 as well. So the thread to process id a1a0e96a-3cde-4f3c-829c-29650b09f22b has to wait 0.5 seconds.
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:44:36,903 (IndexTaskProcessor:checkReadinessProcessResourceMap:369) ###################Another resource map is process the referenced id ee73cf7f-1005-4b89-bab9-3a7fa01d27c6 as well. So the thread to process id a1a0e96a-3cde-4f3c-829c-29650b09f22b has to wait 0.5 seconds.
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:44:37,403 (IndexTaskProcessor:checkReadinessProcessResourceMap:369) ###################Another resource map is process the referenced id ee73cf7f-1005-4b89-bab9-3a7fa01d27c6 as well. So the thread to process id a1a0e96a-3cde-4f3c-829c-29650b09f22b has to wait 0.5 seconds.
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:44:37,903 (IndexTaskProcessor:checkReadinessProcessResourceMap:369) ###################Another resource map is process the referenced id ee73cf7f-1005-4b89-bab9-3a7fa01d27c6 as well. So the thread to process id a1a0e96a-3cde-4f3c-829c-29650b09f22b has to wait 0.5 seconds.
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:44:38,403 (IndexTaskProcessor:checkReadinessProcessResourceMap:369) ###################Another resource map is process the referenced id ee73cf7f-1005-4b89-bab9-3a7fa01d27c6 as well. So the thread to process id a1a0e96a-3cde-4f3c-829c-29650b09f22b has to wait 0.5 seconds.
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:44:38,904 (IndexTaskProcessor:checkReadinessProcessResourceMap:369) ###################Another resource map is process the referenced id ee73cf7f-1005-4b89-bab9-3a7fa01d27c6 as well. So the thread to process id a1a0e96a-3cde-4f3c-829c-29650b09f22b has to wait 0.5 seconds.
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:44:39,404 (IndexTaskProcessor:checkReadinessProcessResourceMap:369) ###################Another resource map is process the referenced id ee73cf7f-1005-4b89-bab9-3a7fa01d27c6 as well. So the thread to process id a1a0e96a-3cde-4f3c-829c-29650b09f22b has to wait 0.5 seconds.
cn-index-processor-daemon.log.6:[ERROR] 2018-09-10 22:44:39,904 (IndexTaskProcessor:checkReadinessProcessResourceMap:384) We waited for another thread to finish indexing a resource map which has the referenced id ee73cf7f-1005-4b89-bab9-3a7fa01d27c6 for a while. Now we quited and can't index id a1a0e96a-3cde-4f3c-829c-29650b09f22b
cn-index-processor-daemon.log.6:[ERROR] 2018-09-10 22:44:39,904 (IndexTaskProcessor:processTask:297) Unable to process task for pid: a1a0e96a-3cde-4f3c-829c-29650b09f22b
cn-index-processor-daemon.log.6:java.lang.Exception: We waited for another thread to finish indexing a resource map which has the referenced id ee73cf7f-1005-4b89-bab9-3a7fa01d27c6 for a while. Now we quited and can't index id a1a0e96a-3cde-4f3c-829c-29650b09f22b
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:44:39,906 (IndexTaskProcessor:newOrFailedIndexTaskExists:890) IndexTaskProcess.newOrFailedIndexTaskExists for id a1a0e96a-3cde-4f3c-829c-29650b09f22b
rnahf@cn-orc-1:/var/log/dataone/index$ date
Tue Sep 11 23:46:56 UTC 2018
rnahf@cn-orc-1:/var/log/dataone/index$ grep dc39515e-440b-4673-9f63-962c7374bf48 cn-index-processor-daemon.log.*
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:44:12,133 (HZEventFilter:filter:127) HZEventFilter.filter - the system metadata for the index event shows shows dc39515e-440b-4673-9f63-962c7374bf48 having a newer version than the SOLR server. So this event should be granted for indexing.
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:44:13,347 (HZEventFilter:filter:127) HZEventFilter.filter - the system metadata for the index event shows shows dc39515e-440b-4673-9f63-962c7374bf48 having a newer version than the SOLR server. So this event should be granted for indexing.
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:44:18,677 (IndexTaskProcessor:saveTask:865) IndexTaskProcess.saveTask save the index task dc39515e-440b-4673-9f63-962c7374bf48
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:44:18,677 (IndexTaskProcessor:getNextIndexTask:610) Start of indexing pid: dc39515e-440b-4673-9f63-962c7374bf48
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:44:25,783 (IndexTaskProcessor:getNextIndexTask:664) the original index task - IndexTask [id=18086020, pid=dc39515e-440b-4673-9f63-962c7374bf48, formatid=http://www.openarchives.org/ore/terms, objectPath=/var/metacat/data/autogen.2017072514144216470.1, dateSysMetaModified=1536087137440, deleted=false, taskModifiedDate=1536619458675, priority=3, status=IN PROCESS]
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:44:25,783 (IndexTaskProcessor:getNextIndexTask:671) the new index task - IndexTask [id=18086020, pid=dc39515e-440b-4673-9f63-962c7374bf48, formatid=http://www.openarchives.org/ore/terms, objectPath=/var/metacat/data/autogen.2017072514144216470.1, dateSysMetaModified=1536087137440, deleted=false, taskModifiedDate=1536619458675, priority=3, status=IN PROCESS]
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:44:27,221 (IndexTaskProcessor:processTask:284) *********************start to process update index task for dc39515e-440b-4673-9f63-962c7374bf48 in thread 20
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:44:43,513 (IndexTaskProcessor:processTask:288) *********************end to process update index task for dc39515e-440b-4673-9f63-962c7374bf48 in thread 20
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:44:43,519 (IndexTaskProcessor:processTask:315) Indexing complete for pid: dc39515e-440b-4673-9f63-962c7374bf48
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:46:11,604 (IndexTaskProcessor:saveTask:865) IndexTaskProcess.saveTask save the index task dc39515e-440b-4673-9f63-962c7374bf48
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:46:11,604 (IndexTaskProcessor:getNextIndexTask:610) Start of indexing pid: dc39515e-440b-4673-9f63-962c7374bf48
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:46:18,731 (IndexTaskProcessor:getNextIndexTask:664) the original index task - IndexTask [id=18086015, pid=dc39515e-440b-4673-9f63-962c7374bf48, formatid=http://www.openarchives.org/ore/terms, objectPath=/var/metacat/data/autogen.2017072514144216470.1, dateSysMetaModified=1536087137440, deleted=false, taskModifiedDate=1536619571603, priority=3, status=IN PROCESS]
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:46:18,732 (IndexTaskProcessor:getNextIndexTask:671) the new index task - IndexTask [id=18086015, pid=dc39515e-440b-4673-9f63-962c7374bf48, formatid=http://www.openarchives.org/ore/terms, objectPath=/var/metacat/data/autogen.2017072514144216470.1, dateSysMetaModified=1536087137440, deleted=false, taskModifiedDate=1536619571603, priority=3, status=IN PROCESS]
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:46:20,164 (IndexTaskProcessor:processTask:284) *********************start to process update index task for dc39515e-440b-4673-9f63-962c7374bf48 in thread 20
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:46:36,252 (IndexTaskProcessor:processTask:288) *********************end to process update index task for dc39515e-440b-4673-9f63-962c7374bf48 in thread 20
cn-index-processor-daemon.log.6:[ INFO] 2018-09-10 22:46:36,255 (IndexTaskProcessor:processTask:315) Indexing complete for pid: dc39515e-440b-4673-9f63-962c7374bf48
cn-index-processor-daemon.log.7:[ INFO] 2018-09-10 21:44:09,798 (HZEventFilter:compareRaplicaList:256) HZEventFilter.compareReplicaList - the system metadata for the index event shows dc39515e-440b-4673-9f63-962c7374bf48 having the same replica list as the solr doc.
cn-index-processor-daemon.log.7:[ INFO] 2018-09-10 21:44:09,798 (HZEventFilter:filter:164) HZEventFilter.filter - the system metadata for the index event shows dc39515e-440b-4673-9f63-962c7374bf48 having the same modification date as the SOLR server. Also both have the same replica list. So this event has been filtered out for indexing (no indexing).
rnahf@cn-orc-1:/var/log/dataone/index$ 

Related issues

Related to Infrastructure - Story #8702: Indexing Refactor Strategy New 2018-09-24

History

#1 Updated by Rob Nahf about 3 years ago

  • Description updated (diff)

#2 Updated by Rob Nahf about 3 years ago

  • Description updated (diff)

#3 Updated by Rob Nahf about 3 years ago

  • Related to Story #8702: Indexing Refactor Strategy added

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 14.8 MB)