Need mechanism to delete content from CN that belongs to an MN
Need a mechanism to remove content from the CNs that belongs to an MN so that, after a change in MN functionality (e.g. sysmeta generation), erroneous or out-of-date content can be removed from the catalog. This would include removing it from Metacat and ensuring that the content is re-indexed so that it no longer shows up in searches.
This may initially consist of a set of tasks that can be executed manually to clear out content from an MN, but it should eventually be scripted so that an administrator can wipe MN content from a CN.
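Once the manual steps are settled, a scripted wipe could look roughly like the following minimal sketch. It assumes a hypothetical in-memory CN model (`CoordinatingNodeStore` and `wipe_member_node_content` are illustrative names, not Metacat or the CN API) in which each system metadata record carries an `authoritativeMemberNode` field. The dictionary operations stand in for the real steps of purging from Metacat and re-indexing.

```python
from dataclasses import dataclass, field


@dataclass
class CoordinatingNodeStore:
    """Hypothetical in-memory stand-in for a CN's storage, sysmeta and search index."""
    objects: dict = field(default_factory=dict)     # pid -> object bytes
    sysmeta: dict = field(default_factory=dict)     # pid -> system metadata dict
    search_index: set = field(default_factory=set)  # pids currently searchable


def wipe_member_node_content(cn: CoordinatingNodeStore, mn_id: str) -> list:
    """Remove every object whose authoritative MN is mn_id; return the purged pids."""
    targets = sorted(
        pid for pid, sm in cn.sysmeta.items()
        if sm.get("authoritativeMemberNode") == mn_id
    )
    for pid in targets:
        cn.objects.pop(pid, None)     # stands in for purging the object from Metacat
        cn.search_index.discard(pid)  # stands in for re-indexing so it leaves search
        del cn.sysmeta[pid]           # drop the erroneous/out-of-date sysmeta
    return targets
```

Returning the purged identifiers gives the administrator a record of what was removed for a given MN.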
#6 Updated by Dave Vieglais over 10 years ago
- Assignee changed from Robert Waltz to Ben Leinfelder
- Milestone set to None
Requires some design work to lay out the workflow for deleting an object from the CN and ensuring that all replicas are also removed.
The delete() method is inadequate for this: the content needs to be purged rather than tagged as archived or obsoleted.
#7 Updated by Ben Leinfelder over 10 years ago
There is an MN.delete() method with a note that we should determine what the semantics of this operation are.
For Metacat, "delete" does not remove any content and only prevents it from:
a) being updated by another revision,
b) having the same identifier reused, and
c) appearing in search results.
If you know the identifier (cited in a paper, say), you can always retrieve it. In DataONE we would set SystemMetadata.archived=true for these items, and the change in SystemMetadata should be replicated up to the CN and propagated to all replicas on other MNs.
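The archive-style semantics described above can be modelled in a few lines. This is a hypothetical sketch: `ArchivingStore` and its methods are illustrative, not Metacat's API, and `update` is simplified to "create a new revision from an existing one".

```python
class ArchivingStore:
    """Hypothetical model of Metacat-style "delete": bytes are kept, but the
    object is frozen, its identifier stays reserved, and it leaves search."""

    def __init__(self):
        self._objects = {}      # pid -> bytes; nothing is ever removed
        self._archived = set()  # pids flagged as archived

    def create(self, pid, data):
        if pid in self._objects:
            raise ValueError(f"identifier already used: {pid}")  # (b) no reuse
        self._objects[pid] = data

    def archive(self, pid):
        if pid not in self._objects:
            raise KeyError(pid)
        self._archived.add(pid)  # flag only; the content stays in storage

    def update(self, old_pid, new_pid, data):
        if old_pid not in self._objects:
            raise KeyError(old_pid)
        if old_pid in self._archived:
            raise PermissionError(f"{old_pid} is archived")  # (a) no new revision
        self.create(new_pid, data)

    def search(self):
        # (c) archived objects are excluded from search results
        return sorted(p for p in self._objects if p not in self._archived)

    def get(self, pid):
        return self._objects[pid]  # still retrievable if you know the identifier
```

`get` still succeeds for an archived identifier, which is why this operation cannot handle inappropriate content that must actually disappear.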
Reasons for a more forceful "delete" mechanism:
a) Inappropriate content (illegal, copyrighted, too large)
Since we NEVER want to reuse identifiers, we should maintain a SystemMetadata record for all deleted objects. I would vote to change the SystemMetadata.archived flag to an optional "status" indicator with initial possible values of "archived" and "deleted"; if it is omitted, the object is normal/active. The MN should propagate this SystemMetadata change to the CN, which would spread the word to the other MNs. I think the MNs could remove all trace of that object and rely on the CN to keep a record of the identifier being used (so that it is not reused in the DataONE system). MNs holding a replica could also completely remove the object. This points to a need for MNs to have two methods:
#8 Updated by Chris Jones about 10 years ago
- Target version set to Sprint-2012.17-Block.3.1
During our standup discussion on 04/23/2012, we decided to enable administrative delete() functionality by:
1) Renaming the current delete() method to archive(), and
2) Creating a new delete() method accessible only to administrative subjects
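The split between the renamed archive() and the new administrative delete() amounts to an authorization gate. The sketch below is hypothetical throughout: `ADMIN_SUBJECTS`, `StoredObject` and `NotAuthorized` are illustrative, and allowing the rights holder (or an admin) to archive is an assumption, not something decided in this thread.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical administrative subjects; a real CN would read these from configuration.
ADMIN_SUBJECTS = frozenset({"CN=cn-admin,DC=dataone,DC=org"})


class NotAuthorized(Exception):
    pass


@dataclass
class StoredObject:
    rights_holder: str
    data: Optional[bytes]
    archived: bool = False


def archive(store: dict, subject: str, pid: str) -> None:
    """What delete() used to do: flag the object but keep its content."""
    obj = store[pid]
    # Assumption: the rights holder or an admin may archive.
    if subject != obj.rights_holder and subject not in ADMIN_SUBJECTS:
        raise NotAuthorized(f"{subject} may not archive {pid}")
    obj.archived = True


def delete(store: dict, subject: str, pid: str) -> None:
    """New administrative delete(): purge the content, keep an archived record."""
    if subject not in ADMIN_SUBJECTS:
        raise NotAuthorized(f"{subject} is not an administrative subject")
    obj = store[pid]
    obj.data = None      # content purged
    obj.archived = True  # record kept (archived) so it is not indexed
```

Keeping the record rather than dropping the dictionary entry matches the plan to retain system metadata for deleted objects.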
We had planned on changing the 'archived' flag in SystemMetadata to 'status' as Ben suggested, but it is too late in the release cycle for a schema change, so we are keeping it the same.
The implementation of delete() needs to:
1) Remove the object from the CN (database and/or filesystem)
2) Mark the system metadata as 'archived' so it is not indexed
3) Iterate through the replica list and call MN.delete() for each replica
4) For each replica deletion, call MN.systemMetadataChanged() to update MN sysmeta
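The four steps can be sketched as follows, assuming hypothetical stand-ins (`MemberNodeStub`, `cn_delete`) for the CN's storage and for MN clients. The real MN.delete() and MN.systemMetadataChanged() calls take session and version arguments that are not modelled here. Collecting unknown nodes in a `failed` list is an added assumption about how a production version might handle unreachable MNs.

```python
from dataclasses import dataclass, field


@dataclass
class MemberNodeStub:
    """Hypothetical replica-holding MN; records calls instead of making REST requests."""
    node_id: str
    objects: dict = field(default_factory=dict)  # pid -> bytes
    sysmeta: dict = field(default_factory=dict)  # pid -> sysmeta copy
    calls: list = field(default_factory=list)

    def delete(self, pid):
        self.objects.pop(pid, None)  # purge bytes; the sysmeta entry is left in place
        self.calls.append(("delete", pid))

    def system_metadata_changed(self, pid, sysmeta):
        self.sysmeta[pid] = dict(sysmeta)  # refresh local copy from the CN
        self.calls.append(("systemMetadataChanged", pid))


def cn_delete(cn_objects, cn_sysmeta, member_nodes, pid):
    # 1) Remove the object from the CN (database and/or filesystem).
    cn_objects.pop(pid, None)
    # 2) Mark the system metadata as archived so it is not indexed.
    sm = cn_sysmeta[pid]
    sm["archived"] = True
    failed = []
    # 3) and 4) Delete each replica, then push the updated sysmeta to that MN.
    for node_id in sm.get("replicas", []):
        mn = member_nodes.get(node_id)
        if mn is None:
            failed.append(node_id)  # unknown/unreachable MN: needs a retry later
            continue
        mn.delete(pid)
        mn.system_metadata_changed(pid, sm)
    return failed
```

Calling `system_metadata_changed` after `delete` leaves each MN without the bytes but with sysmeta that matches the CN copy.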
MN.delete() should likewise purge the object from the database/filesystem but keep the system metadata up-to-date with the CN copy.