ResourceMapFactory.parseResourceMap does not return aggregated resources
While testing an ORE document that only defines aggregation relationships:
It appears that resources that are only aggregated by the resource map, and that take part in no 'documents' or 'isDocumentedBy' relationship with another resource, are not returned in the data structure created by the parseResourceMap method.
Since this method encapsulates org.dspace.foresite.ResourceMap (so that an interface can hide the parsing method, XPath vs. foresite), it seems it should return all valid identifiers and relationships found in the ORE document.
#3 Updated by Rob Nahf about 8 years ago
- Status changed from New to In Progress
- Tracker changed from Bug to Feature
- Due date changed from 2014-09-25 to 2014-09-26
- Target version changed from CCI-1.5.0 to 329
- % Done changed from 0 to 20
- Product Version changed from * to 1.4.0
This goes beyond what the current implementation can do, so I'm changing this to a feature request.
parseResourceMap was written to extract the metadata and data objects from a serialized ResourceMap in order to build a DataPackage object. Keys of the returned map are (generally) METADATA objects, defined in the ResourceMap as the subject of the cito:documents triple (and the object of the cito:isDocumentedBy triple), while values of the map are (generally) lists of DATA objects, which occupy the opposite positions in those same triples.
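The key/value rule above can be sketched as follows. This is a simplified model, not the actual parseResourceMap code: Triple is a hypothetical record standing in for parsed RDF statements, and predicates are written as prefixed strings rather than full URIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class DocumentsMapSketch {

    // Hypothetical stand-in for a parsed RDF statement; not a foresite or DataONE type.
    record Triple(String subject, String predicate, String object) {}

    static final String CITO_DOCUMENTS = "cito:documents";
    static final String CITO_IS_DOCUMENTED_BY = "cito:isDocumentedBy";

    /**
     * Builds a METADATA -> [DATA...] map from the cito triples only.
     * Any other predicate (e.g. ore:aggregates) is skipped entirely.
     */
    static Map<String, List<String>> buildMap(List<Triple> triples) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Triple t : triples) {
            String metadata;
            String data;
            if (t.predicate().equals(CITO_DOCUMENTS)) {
                metadata = t.subject();   // metadata cito:documents data
                data = t.object();
            } else if (t.predicate().equals(CITO_IS_DOCUMENTED_BY)) {
                metadata = t.object();    // data cito:isDocumentedBy metadata
                data = t.subject();
            } else {
                continue;                 // not a cito relationship: contributes nothing
            }
            List<String> values = result.computeIfAbsent(metadata, k -> new ArrayList<>());
            if (!values.contains(data)) { // forward and inverse triples yield the same pair
                values.add(data);
            }
        }
        return result;
    }
}
```

Because an ore:aggregates triple never contributes a key or a value, an object that is only aggregated cannot appear in the result.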
The situation that raised this issue is a ResourceMap that aggregates 3 objects that represent 3 different formats of the same METADATA (probably with minor variations). Without the cito:documents/isDocumentedBy relationships, there isn't a way to know whether an object is a METADATA or DATA object, and thus whether it should be a key or value of the returned map.
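To make the ambiguity concrete, here is a hedged sketch of how an object's role could be inferred from the cito triples alone. It uses the same kind of hypothetical Triple record and prefixed predicate strings (neither is part of foresite or d1_common_java); for the three aggregated metadata formats, every lookup comes back UNKNOWN.

```java
import java.util.List;

class RoleSketch {

    // Hypothetical stand-in for a parsed RDF statement.
    record Triple(String subject, String predicate, String object) {}

    enum Role { METADATA, DATA, UNKNOWN }

    static final String CITO_DOCUMENTS = "cito:documents";
    static final String CITO_IS_DOCUMENTED_BY = "cito:isDocumentedBy";

    /** Infers whether an identifier would be a map key, a map value, or neither. */
    static Role roleOf(String id, List<Triple> triples) {
        boolean documenting = false; // id documents something
        boolean documented = false;  // id is documented by something
        for (Triple t : triples) {
            if (t.predicate().equals(CITO_DOCUMENTS)) {
                documenting |= t.subject().equals(id);
                documented |= t.object().equals(id);
            } else if (t.predicate().equals(CITO_IS_DOCUMENTED_BY)) {
                documenting |= t.object().equals(id);
                documented |= t.subject().equals(id);
            }
        }
        if (documenting) return Role.METADATA; // would be a key
        if (documented) return Role.DATA;      // would be a value
        return Role.UNKNOWN;                   // e.g. mentioned only by ore:aggregates
    }
}
```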
Two solutions are possible, both requiring a minor version increment:
1. Define a separate method for retrieving a resource map's aggregated resources. It would not know how to differentiate between METADATA and DATA. The advantage is clarity; the downside is the need to manage ResourceMap objects in memory (which can be much larger than their serialized form).
2. Return objects not participating in the cito:documents/isDocumentedBy relationships as keys in the existing map structure of parseResourceMap. The advantage is that the application layer doesn't have to manage ResourceMap instances: the serialized form can be completely parsed into a relatively small Map of Identifiers.
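Both options might look roughly like this. It is a sketch only, again assuming a hypothetical Triple record and prefixed predicate strings; listAggregated and withUnrelatedAggregates are illustrative names, not existing API.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class AggregationOptionsSketch {

    // Hypothetical stand-in for a parsed RDF statement.
    record Triple(String subject, String predicate, String object) {}

    static final String ORE_AGGREGATES = "ore:aggregates";

    /** Option 1: every aggregated identifier, with no METADATA/DATA distinction. */
    static List<String> listAggregated(List<Triple> triples) {
        List<String> ids = new ArrayList<>();
        for (Triple t : triples) {
            if (t.predicate().equals(ORE_AGGREGATES) && !ids.contains(t.object())) {
                ids.add(t.object());
            }
        }
        return ids;
    }

    /**
     * Option 2: start from the cito-derived map and add each aggregated identifier
     * that appears nowhere in it (neither as key nor value) as a key with no values.
     */
    static Map<String, List<String>> withUnrelatedAggregates(
            Map<String, List<String>> citoMap, List<Triple> triples) {
        Set<String> related = new HashSet<>(citoMap.keySet());
        for (List<String> values : citoMap.values()) {
            related.addAll(values);
        }
        Map<String, List<String>> result = new LinkedHashMap<>(citoMap);
        for (String id : listAggregated(triples)) {
            if (!related.contains(id)) {
                result.putIfAbsent(id, new ArrayList<>());
            }
        }
        return result;
    }
}
```

With option 2, the three metadata formats from this issue would come back as three keys with empty value lists, while packages described with cito relationships are returned unchanged.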
For retrieving all relationships, it might make more sense to work with the foresite.ResourceMap object directly:
ResourceMap resMap = ResourceMapFactory.deserialize(is); // is: InputStream over the serialized ORE document
List<Triple> trips = resMap.listAllTriples();