Verify that Resource Maps are complete, correctly formatted and represent the intended associations.
#2 Updated by Chris Jones over 8 years ago
- Assignee set to Chris Jones
- Status changed from New to In Progress
I'm seeing a few issues with the MPC resource maps at https://dataone-test.pop.umn.edu/mn/v1/object?formatId=http://www.openarchives.org/ore/terms :
1) Some system metadata for resource maps have a formatId of 'application/octet-stream'. This needs to be changed to 'http://www.openarchives.org/ore/terms'. See https://dataone-test.pop.umn.edu/mn/v1/object/ipumsi_6-3_cr_1984.rm.xml as an example.
2) The serialized resource maps include an aggregation statement where the resource map itself is aggregated, which I don't think is correct. The aggregation should only include the science data and science metadata triple statements (or other types of metadata, like provenance, etc.).
3) The triple statements in the resource maps don't point to CN-resolvable URIs, which is a requirement for DataONE data packages. See http://mule1.dataone.org/ArchitectureDocs-current/design/DataPackage.html#generating-resource-maps . An example is in https://dataone-test.pop.umn.edu/mn/v1/object/ipumsi_6-3_cr_1984.rm.xml, where both the subjects and objects in the triple statements point to URIs like http://international.ipums.org/ ...
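Issues 2 and 3 above can be checked mechanically. The following is a minimal sketch, not DataONE library code: it assumes the resource map has already been parsed into (subject, predicate, object) string triples, and that the CN resolve endpoint has the form shown in `CN_RESOLVE_BASE` (an assumption; substitute the base URL of the target environment). The function names are hypothetical.

```python
from urllib.parse import quote

ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"
# Assumed CN resolve endpoint; replace with the base URL of the target environment.
CN_RESOLVE_BASE = "https://cn.dataone.org/cn/v1/resolve/"


def cn_resolve_uri(pid, base=CN_RESOLVE_BASE):
    """Return a CN resolve URI for an identifier, percent-encoding the PID."""
    return base + quote(pid, safe="")


def check_triples(triples, resource_map_uri, base=CN_RESOLVE_BASE):
    """Return problem descriptions for ore:aggregates triples.

    `triples` is an iterable of (subject, predicate, object) strings.
    """
    problems = []
    for subject, predicate, obj in triples:
        if predicate != ORE_AGGREGATES:
            continue
        # Issue 2: the resource map should not be a member of its own aggregation.
        if obj == resource_map_uri:
            problems.append("resource map is aggregated by its own aggregation")
        # Issue 3: subjects and objects should both be CN-resolvable URIs.
        for uri in (subject, obj):
            if not uri.startswith(base):
                problems.append(f"not a CN-resolvable URI: {uri}")
    return problems
```

An empty list from `check_triples` means neither problem was found in the aggregation statements; it does not validate the rest of the resource map.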
#3 Updated by Bruce Wilson over 8 years ago
I communicated the formatId issue to Fabio and Wendy today (2014-08-28 2:00 PM EDT).
I think that there's a fourth issue to address:
3b) Identifiers appear to be using mixed case and underscores, but the MPC identifiers are all lower case and dots. For example, in https://dataone-test.pop.umn.edu/mn/v1/object/ipumsi_6-3_cr_1984.rm.xml one of the items in the aggregation is @@. But the object that this likely refers to has an identifier ipumsi_6-3_cr_1984.dc.xml
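A mismatch like this could be caught by comparing the identifiers referenced in a resource map against the identifiers the member node actually exposes. This is only a sketch under stated assumptions: both lists are taken to be plain identifier strings gathered elsewhere (for example from the object listing), and `missing_members` is a hypothetical helper name.

```python
def missing_members(aggregated_ids, known_ids):
    """Return aggregated identifiers with no exact match among known identifiers.

    The comparison is case-sensitive and character-for-character, so an
    identifier that differs only in case or separators is reported as missing.
    """
    known = set(known_ids)
    return [pid for pid in aggregated_ids if pid not in known]
```

Any identifier returned here is one the resource map refers to but that cannot be retrieved from the node under that exact name.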
#5 Updated by Chris Jones about 8 years ago
I've attached two example resource maps to help clarify the content of the MPC resource maps.
The first describes an aggregation that has no science data in it, but rather three metadata files (Dublin Core file, DDIC XML file, and DDIC HTML file). Because only the Dublin Core metadata file is formatType METADATA in our object format list, its fields will get parsed into the search index. The DDIC XML file, for now, can stay with a formatId of application/octet-stream, and its fields won't be parsed. The DDIC HTML transform should have a formatId of text/html, and it too won't have its fields parsed. However, all three of these files will be available for download by scientists since they are part of the aggregation (Data Package).
The second example describes an aggregation that contains one science metadata file (Dublin Core), and two science data files (CSVs). This resource map shows how the one science metadata file 'cito:documents' the two science data files, and all three are members of the aggregation (Data Package).
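The relationships in the second example can be sketched as a list of triples. This is an illustration rather than the attached resource map itself: the function name and the identifiers in the usage note are hypothetical, the CN resolve base URL is an assumption, and the `cito:isDocumentedBy` inverse statements are included on the assumption that the package should be navigable in both directions.

```python
from urllib.parse import quote

ORE = "http://www.openarchives.org/ore/terms/"
CITO = "http://purl.org/spar/cito/"
# Assumed CN resolve endpoint; replace with the base URL of the target environment.
CN_RESOLVE_BASE = "https://cn.dataone.org/cn/v1/resolve/"


def package_triples(rm_pid, metadata_pid, data_pids, base=CN_RESOLVE_BASE):
    """Build (subject, predicate, object) triples for a simple data package."""
    def uri(pid):
        return base + quote(pid, safe="")

    resource_map = uri(rm_pid)
    aggregation = resource_map + "#aggregation"
    triples = [(resource_map, ORE + "describes", aggregation)]

    # The aggregation holds the metadata and data objects, not the resource map.
    for pid in [metadata_pid, *data_pids]:
        triples.append((aggregation, ORE + "aggregates", uri(pid)))

    # The science metadata documents each science data object.
    for pid in data_pids:
        triples.append((uri(metadata_pid), CITO + "documents", uri(pid)))
        triples.append((uri(pid), CITO + "isDocumentedBy", uri(metadata_pid)))
    return triples
```

Called with one metadata identifier and two CSV identifiers (for instance the placeholder names `example.dc.xml`, `example.1.csv` and `example.2.csv`), this yields three `ore:aggregates` statements and two `cito:documents` statements, matching the structure described above. Serializing the triples to RDF/XML would need an RDF library and is left out of the sketch.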