Feature #6719

Update data package relationship processing with respect to maintaining SID-based relationships.

Added by Skye Roseboom over 9 years ago. Updated almost 9 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Skye Roseboom
Category:
d1_cn_index_processor
Start date:
2015-01-06
Due date:
% Done:
100%

Story Points:
Sprint:

Description

SeriesId use in data packages:

1.) An ORE/data package defines relationships (documents, isDocumentedBy) that use a seriesId to refer to data and science metadata documents.

Should data package relationships be copied to each document in a series, or placed only on the current end (head) of the series?

After an ORE/data package that defines relationships using a seriesId is indexed, a new version of a science metadata document in that series may be added. The seriesId-based relationships then need to be placed on the new document in the series. Should they also be removed from the previous documents in the series?

The same questions apply to annotations and provenance information: can annotations be seriesId-based, and if so, are they placed on each document in the series?

Possible strategies:

Model the 'aggregates' relationship on resource map records in the search index.
Re-process all resource maps found on the previous head of the SID chain when indexing a new document that has a SID value.
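The second strategy can be sketched as follows. This is a minimal, hypothetical in-memory model, not the d1_cn_index_processor code: `sidHeads`, `resourceMapsByObject`, and the method names are illustrative assumptions, and the map of object PID to resource map PIDs stands in for whatever index field records package membership.

```java
import java.util.*;

/**
 * Hypothetical sketch of the "re-process resource maps found on the previous
 * head of the SID chain" strategy. All names are illustrative assumptions;
 * this is not the d1_cn_index_processor implementation.
 */
public class SidReprocessSketch {

    // SID -> PID of the current head revision in that series
    private final Map<String, String> sidHeads = new HashMap<>();
    // object PID -> resource map PIDs that list it (assumed stand-in for an index field)
    private final Map<String, Set<String>> resourceMapsByObject = new HashMap<>();
    // resource map PID -> member identifiers as written in the ORE (PIDs or SIDs)
    private final Map<String, List<String>> mapMembers = new HashMap<>();
    // resource maps waiting to be indexed again
    private final Deque<String> reindexQueue = new ArrayDeque<>();

    /** Index an ORE: SID members are resolved to the current head PID before recording membership. */
    public void indexResourceMap(String mapPid, List<String> memberIds) {
        mapMembers.put(mapPid, List.copyOf(memberIds));
        for (String id : memberIds) {
            String pid = sidHeads.getOrDefault(id, id); // ids that are not known SIDs are treated as PIDs
            resourceMapsByObject.computeIfAbsent(pid, k -> new TreeSet<>()).add(mapPid);
        }
    }

    /** Index a new revision in a series; returns the resource maps queued for re-processing. */
    public List<String> indexObjectWithSid(String newPid, String sid) {
        String previousHead = sidHeads.put(sid, newPid);
        if (previousHead == null) {
            return List.of(); // first revision in the series: there is no previous head to look up
        }
        // TreeSet keeps the queued resource maps in a deterministic (sorted) order
        List<String> toReprocess =
            new ArrayList<>(resourceMapsByObject.getOrDefault(previousHead, Set.of()));
        reindexQueue.addAll(toReprocess);
        return toReprocess;
    }

    /** Re-index every queued ORE so its SID members now resolve to the new head. */
    public void processQueue() {
        while (!reindexQueue.isEmpty()) {
            String mapPid = reindexQueue.poll();
            indexResourceMap(mapPid, mapMembers.get(mapPid));
        }
    }

    /** Resource maps currently recorded for an object PID. */
    public Set<String> resourceMapsFor(String pid) {
        return resourceMapsByObject.getOrDefault(pid, Set.of());
    }
}
```

In this sketch the previous head keeps its package membership, so relationships accumulate on every revision in the chain; a head-only policy would additionally need to clear the old head's entries when the ORE is re-processed.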

Issues:
Additions to a SID chain modify the definition of a data package after the ORE/data package document has already been indexed, so the index needs to maintain the current state of data packages.

History

#1 Updated by Skye Roseboom over 9 years ago

  • Parent task deleted (#6716)

#2 Updated by Skye Roseboom over 9 years ago

  • Tracker changed from Task to Feature
  • Release set to 2

#3 Updated by Skye Roseboom almost 9 years ago

  • Target version set to CCI-2.0-RC1

#4 Updated by Dave Vieglais almost 9 years ago

  • Target version changed from CCI-2.0-RC1 to CCI-2.0.0

#5 Updated by Dave Vieglais almost 9 years ago

  • Category set to d1_cn_index_processor

#6 Updated by Skye Roseboom almost 9 years ago

  • Description updated

#7 Updated by Ben Leinfelder almost 9 years ago

  • Status changed from New to Closed
  • % Done changed from 0 to 100

Now indexing OREs using both PIDs and SIDs. When objects referenced by SID in an ORE are updated, we resubmit the ORE for index processing so that the package/documentation relationships are maintained for each revision of the object(s) in the SID chain(s).
See SolrIndexReprocessTest for details on our expectations.

#8 Updated by Ben Leinfelder almost 9 years ago

  • % Done changed from 100 to 30
  • Status changed from Closed to In Progress

Reopening as we rework the processing logic to index only the latest version of data package contents that use SIDs.

#9 Updated by Ben Leinfelder almost 9 years ago

Current behavior, as tested in the unit test:
* The data package contains a mix of PID- and SID-identified objects.
* SciMeta is identified with a SID and then updated. The index then reflects that only the head revision is in the data package, and only the head revision "documents" the data objects in the package.
* Data is identified with a SID and then updated. The index then reflects that only the head revision is in the data package, and only the head revision shows "isDocumentedBy" pointing to the SciMeta identifier.
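The head-only behavior listed above can be sketched roughly as below. The field names `documents` and `isDocumentedBy` come from this issue; the `sidHeads` lookup, the class and method names, and the use of plain maps for index fields are hypothetical assumptions, and SolrIndexReprocessTest remains the authority on the real expectations.

```java
import java.util.*;

/**
 * Hypothetical sketch: build "documents" / "isDocumentedBy" index values for a
 * package whose members may be named by PID or SID, placing relationships only
 * on the current head revision of each SID chain. Names are illustrative.
 */
public class HeadOnlyRelationships {

    /** SIDs resolve to their current head PID; any other id is assumed to be a PID. */
    static String resolve(String id, Map<String, String> sidHeads) {
        return sidHeads.getOrDefault(id, id);
    }

    /**
     * @param documents ORE "documents" statements: science metadata id -> data ids (PIDs or SIDs)
     * @param sidHeads  assumed lookup of the current head PID for each SID
     * @return per-PID index fields; older revisions in a chain receive no entries
     */
    public static Map<String, Map<String, Set<String>>> build(
            Map<String, Set<String>> documents, Map<String, String> sidHeads) {
        Map<String, Map<String, Set<String>>> fields = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : documents.entrySet()) {
            String metaPid = resolve(e.getKey(), sidHeads);
            for (String dataId : e.getValue()) {
                String dataPid = resolve(dataId, sidHeads);
                add(fields, metaPid, "documents", dataPid);
                add(fields, dataPid, "isDocumentedBy", metaPid);
            }
        }
        return fields;
    }

    // Append one value to a multi-valued field on the record for the given PID
    private static void add(Map<String, Map<String, Set<String>>> fields,
                            String pid, String field, String value) {
        fields.computeIfAbsent(pid, k -> new TreeMap<>())
              .computeIfAbsent(field, k -> new TreeSet<>())
              .add(value);
    }
}
```

Because every identifier is resolved through the head lookup before any field is written, re-running `build` after a SID chain advances produces entries only for the new head, which mirrors the "only the head revision" outcome described in the list above.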

#10 Updated by Skye Roseboom almost 9 years ago

  • Status changed from In Progress to Closed
  • % Done changed from 30 to 100

Ben finished work on this and updated unit tests. Closing issue.
