Task #3856

MNDeployment #3552: USGS CSAS

Re-harvest ORE documents from MN

Added by Skye Roseboom over 10 years ago. Updated almost 9 years ago.

Status:
Closed
Priority:
High
Assignee:
Robert Waltz
Target version:
-
Start date:
2013-06-28
Due date:
% Done:

100%

Estimated time:
0.00 h
Story Points:
Sprint:

Description

The content of the ORE documents from USGS CSAS has been changed 'in-place'.

Need to determine how to re-harvest the content to the CN.

Potentially same solution as being investigated for re-harvesting content from ORNL DAAC.

Directly related to issue 3839.

This content would be immediately available through the CN REST API and search index, if new resource maps were generated with new pids/system metadata which obsolete the original/edited versions.
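The obsolescence approach described above can be sketched as follows. This is only an illustration: the `obsoletes` and `obsoletedBy` field names follow DataONE system metadata, but the `SysMeta` class, the `publish_replacement` helper, the in-memory `catalog` dict, and both pids are hypothetical stand-ins, not the CN's actual code or data.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SysMeta:
    """Simplified, hypothetical stand-in for a system metadata record."""
    pid: str
    obsoletes: Optional[str] = None      # pid this record replaces
    obsoleted_by: Optional[str] = None   # pid that replaces this record


def publish_replacement(catalog: Dict[str, SysMeta], old_pid: str, new_pid: str) -> SysMeta:
    """Register new_pid as the successor of old_pid and link both records."""
    if new_pid in catalog:
        raise ValueError(f"pid already in use: {new_pid}")
    old = catalog[old_pid]
    if old.obsoleted_by is not None:
        raise ValueError(f"{old_pid} is already obsoleted by {old.obsoleted_by}")
    new = SysMeta(pid=new_pid, obsoletes=old_pid)
    old.obsoleted_by = new_pid
    catalog[new_pid] = new
    return new


# Hypothetical pids, for illustration only.
catalog = {"resourceMap_A.xml": SysMeta(pid="resourceMap_A.xml")}
new_record = publish_replacement(catalog, "resourceMap_A.xml", "resourceMap_A_v2.xml")
```

Because the edited resource map gets a new pid rather than being changed in place, it arrives as a new object and does not depend on re-reading an existing one.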


Related issues

Related to Member Nodes - Task #3839: ORE documents contain references to non-existent science metadata pids Closed 2013-06-24

History

#1 Updated by Chris Jones over 10 years ago

  • Assignee changed from Chris Jones to Robert Waltz
  • Priority changed from Normal to High
  • Status changed from New to In Progress

I'm assigning this to Robert for now, since he's working on the code for a one-time update to the CNs that will allow us to reharvest USGSCSAS content this week before the DUG meeting.

#2 Updated by Robert Waltz over 10 years ago

  • Status changed from In Progress to Closed
  • Remaining hours set to 0.0

#4 Updated by Skye Roseboom over 10 years ago

Hi Robert,

Looking at the indexing output for these re-harvested pids, I am seeing a lot of object path errors. In the index processing log they look like this:

[ INFO] 2013-07-11 14:45:11,573 (IndexTaskProcessor:isObjectPathReady:262) Object path exists for pid: resourceMap_doi_10.5066_F77H1GHV.xml however the file location: /var/metacat/data/autogen.2013042612461679446.1 does not exist. Marking not ready - task will be marked new and retried.

[ INFO] 2013-07-11 14:45:11,606 (IndexTaskProcessor:isObjectPathReady:262) Object path exists for pid: resourceMap_doi_10.5066_F7WW7FN6.xml however the file location: /var/metacat/data/autogen.2013042612464924180.1 does not exist. Marking not ready - task will be marked new and retried.

[ INFO] 2013-07-11 14:45:11,640 (IndexTaskProcessor:isObjectPathReady:262) Object path exists for pid: resourceMap_doi_10.5066_F7NZ85MB.xml however the file location: /var/metacat/data/autogen.2013042612465153383.1 does not exist. Marking not ready - task will be marked new and retried.

This indicates that index processing is trying to read the ORE document from the local disk at the file path given by the shared Hazelcast data structure 'objectPath', the structure in the storage cluster that maps PIDs to file system paths. Indexing reads the file in order to parse the ORE document and derive the information it contributes to the index record. Without a valid object path (file system path), indexing cannot process the ORE documents, which is why these updated documents have not appeared updated in the index.
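The readiness check described above can be sketched roughly as follows. This is not the actual IndexTaskProcessor code: a plain dict stands in for the Hazelcast 'objectPath' map, and the function name, pids, and file names are hypothetical.

```python
import os
import tempfile
from typing import Mapping


def is_object_path_ready(object_path: Mapping[str, str], pid: str) -> bool:
    """Return True only if pid maps to a file that exists on local disk."""
    path = object_path.get(pid)
    if path is None:
        return False  # no mapping for this pid at all
    # A mapping can exist while the file it points to is gone (stale entry).
    return os.path.isfile(path)


# Demo with one valid entry and one stale entry (hypothetical names).
with tempfile.TemporaryDirectory() as data_dir:
    present = os.path.join(data_dir, "autogen.present.1")
    with open(present, "w") as fh:
        fh.write("<rdf/>")
    object_path = {
        "pid_with_file": present,
        "pid_with_stale_path": os.path.join(data_dir, "autogen.missing.1"),
    }
    ready = {pid: is_object_path_ready(object_path, pid) for pid in object_path}
```

The stale entry corresponds to the case in the log lines above: the map still holds a path for the pid, but the file is not at that location, so the task is not ready.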

#5 Updated by Robert Waltz over 10 years ago

  • Status changed from In Progress to Closed

Added logic in the repair scripts to touch the hzObjectPath map when evicting pids.
