Feature #6719
Update data package relationship processing wrt maintaining SID based relationships.
100%
Description
SeriesId use in data packages:
1.) An ORE/data package defines relationships (documents, documentedBy) using seriesId to refer to data and science metadata documents.
Should data package relationships be copied to each document in a series, or placed only on the current end of the series?
Suppose an ORE/data package that defines relationships using seriesId has been indexed, and a new version of a science metadata document in a series used by that package is then added. The seriesId-based relationships need to be placed on the new document in the series. Are the series-based relationships then removed from the previous documents in the series?
Same question/issues for annotations or provenance info: can annotations be seriesId based, and if so, are they placed on each document in the series?
Possible strategies:
Model 'aggregates' relationship on resource map records in search index.
Re-process all resource maps found on the previous 'head of sid' chain when indexing a document with a SID value.
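A minimal sketch of the second strategy, not the actual d1_cn_index_processor code. It assumes index records are plain dictionaries keyed by PID, and that the fields `seriesId`, `obsoletes`, and `resourceMap` are available on them; the function name `resource_maps_to_reprocess` is hypothetical. It treats the document that the new revision obsoletes as the previous head of the SID chain.

```python
from typing import Dict, List


def resource_maps_to_reprocess(index_records: Dict[str, dict], new_doc: dict) -> List[str]:
    """Return the resource maps recorded on the previous head of the SID chain.

    index_records maps PID -> index record (hypothetical dict layout).
    new_doc is the record about to be indexed.
    """
    sid = new_doc.get("seriesId")
    previous_pid = new_doc.get("obsoletes")
    # Documents without a SID, or that start a new chain, trigger no reprocessing.
    if sid is None or previous_pid is None:
        return []
    previous_head = index_records.get(previous_pid)
    # Only follow the link when the obsoleted document is in the same series.
    if previous_head is None or previous_head.get("seriesId") != sid:
        return []
    # De-duplicate so each resource map is submitted for indexing once.
    return sorted(set(previous_head.get("resourceMap", [])))
```

Each identifier returned would then be submitted to the index processor again, so that the package relationships are recomputed against the new head.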
Issues:
Additions to a SID chain will modify the definition of a data package after the ORE/data package document has been processed for the index. We need to maintain the current state of data packages.
History
#1 Updated by Skye Roseboom almost 10 years ago
- Parent task deleted (#6716)
#2 Updated by Skye Roseboom almost 10 years ago
- Tracker changed from Task to Feature
- Release set to 2
#3 Updated by Skye Roseboom over 9 years ago
- Target version set to CCI-2.0-RC1
#4 Updated by Dave Vieglais over 9 years ago
- Target version changed from CCI-2.0-RC1 to CCI-2.0.0
#5 Updated by Dave Vieglais over 9 years ago
- Category set to d1_cn_index_processor
#6 Updated by Skye Roseboom over 9 years ago
- Description updated
#7 Updated by Ben Leinfelder over 9 years ago
- Status changed from New to Closed
- % Done changed from 0 to 100
Now indexing OREs using both PIDs and SIDs. When objects (referenced by SID) in an ORE are updated, we submit the ORE for index processing again such that the package/documentation relationships are maintained for each revision of the object[s] in the sid chain[s].
See SolrIndexReprocessTest for details on our expectations.
#8 Updated by Ben Leinfelder over 9 years ago
- % Done changed from 100 to 30
- Status changed from Closed to In Progress
Reopening as we rework the processing logic to only index the latest version of data package contents that use SIDs.
#9 Updated by Ben Leinfelder over 9 years ago
Current behavior, as tested in unit test:
* Datapackage contains a mix of PID- and SID-identified objects.
* SciMeta is identified with a SID, then it is updated. The index then reflects that only the head revision is in the datapackage and only the head revision "documents" the data objects in the package.
* Data is identified with a SID, then it is updated. The index then reflects that only the head revision is in the datapackage and only the head revision shows "isDocumentedBy" pointing to the SciMeta identifier.
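The behavior above can be illustrated with a small in-memory sketch. This is an assumption-laden model, not the index processor itself: the `Obj` class, `resolve_to_pid`, and `index_package` are hypothetical names, the head of a series is taken to be the revision with no `obsoleted_by` value, and "isDocumentedBy" is assumed to hold the resolved PID of the SciMeta head revision.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Obj:
    pid: str
    sid: Optional[str] = None
    obsoleted_by: Optional[str] = None


def resolve_to_pid(objects: Dict[str, Obj], identifier: str) -> Optional[str]:
    """Resolve a PID to itself and a SID to the head revision of its series."""
    if identifier in objects:
        return identifier
    for obj in objects.values():
        if obj.sid == identifier and obj.obsoleted_by is None:
            return obj.pid
    return None


def add_unique(values: List[str], value: str) -> None:
    if value not in values:
        values.append(value)


def index_package(objects: Dict[str, Obj], resource_map_pid: str,
                  documents: Dict[str, List[str]]) -> Dict[str, dict]:
    """Build index entries for one package.

    documents maps a SciMeta identifier (PID or SID) to the data
    identifiers (PID or SID) it documents, as stated in the ORE.
    """
    index: Dict[str, dict] = {}

    def entry(pid: str) -> dict:
        return index.setdefault(
            pid, {"resourceMap": [], "documents": [], "isDocumentedBy": []})

    for meta_id, data_ids in documents.items():
        meta_pid = resolve_to_pid(objects, meta_id)
        if meta_pid is None:
            continue
        meta_entry = entry(meta_pid)
        add_unique(meta_entry["resourceMap"], resource_map_pid)
        for data_id in data_ids:
            data_pid = resolve_to_pid(objects, data_id)
            if data_pid is None:
                continue
            # Relationships land only on the resolved (head) revisions.
            add_unique(meta_entry["documents"], data_pid)
            data_entry = entry(data_pid)
            add_unique(data_entry["resourceMap"], resource_map_pid)
            add_unique(data_entry["isDocumentedBy"], meta_pid)
    return index
```

Re-running `index_package` after a SID-identified object is updated yields entries for the new head revision only; in this model, earlier revisions simply receive no package relationships.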
#10 Updated by Skye Roseboom over 9 years ago
- Status changed from In Progress to Closed
- % Done changed from 30 to 100
Ben finished work on this and updated unit tests. Closing issue.