Task #3854
MNDeployment #3118: Dryad Member Node
Troubleshoot Dryad science metadata creation on CNs
% Done: 100%
Description
After registering the dev.datadryad.org Member Node into the CN Development environment, the synchronization process began and thousands of documents synchronized. However, all science metadata documents have failed to sync, while resource maps and data files sync fine. The call to CN.create() is failing with:
knb 20130628-15:14:19: [FATAL]: DBSaxHandler.fatalError - White spaces are required between publicId and systemId. [edu.ucsb.nceas.metacat.DBSAXHandler]
knb 20130628-15:14:19: [ERROR]: DocumentImpl.write - Problem with parsing: Fatal processing error. [edu.ucsb.nceas.metacat.DocumentImpl]
This indicates that Metacat is unable to validate the science metadata documents against the Dryad Schema. Troubleshoot why this is the case.
History
#1 Updated by Chris Jones over 11 years ago
I've downloaded a single example Dryad science metadata document, and have tried to load it into a Metacat-based MN, and get the same result. This confirms that it's not a misconfiguration of the CNs. I've then attempted to validate the document against the stated schema using xmlstarlet:
$ xml val --net -e -s dryad.xsd dryad-scimeta.xml
failed to load external entity "dcterms.xsd"
dryad.xsd:6.0: Element '{http://www.w3.org/2001/XMLSchema}import': Failed to locate a schema at location 'dcterms.xsd'. Skipping the import.
failed to load external entity "bibo.xsd"
dryad.xsd:9.0: Element '{http://www.w3.org/2001/XMLSchema}import': Failed to locate a schema at location 'bibo.xsd'. Skipping the import.
dryad.xsd:29.0: Element '{http://www.w3.org/2001/XMLSchema}element', attribute 'ref': The QName value '{http://purl.org/ontology/bibo/}pmid' does not resolve to a(n) element declaration.
dryad.xsd:30.0: Element '{http://www.w3.org/2001/XMLSchema}element', attribute 'ref': The QName value '{http://purl.org/ontology/bibo/}Journal' does not resolve to a(n) element declaration.
The system identifiers for these two external schemas point to local files, which are not found. xmlstarlet also could not locate the schema files over the network at the standard schemaLocation URL. My thought is that Metacat tries the same and fails. Because this is a new science metadata format being supported by DataONE, we need to ensure that the schema files for all imports are resolvable.
To complicate this, the bibo namespace resolves to an OWL Ontology expressed in RDF, as opposed to an XML Schema document. This is also problematic - elements in the bibo: namespace will not resolve to a known element declaration.
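As a rough illustration of the import problem, here is a minimal Python sketch that lists the xs:import entries of a schema and flags those whose schemaLocation is not an absolute http(s) URL. The SAMPLE_XSD string is a hypothetical reduction that reuses only the namespaces and file names from the output above, not the real dryad.xsd, and list_imports is an illustrative helper name, not part of Metacat or xmlstarlet.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

XS = "{http://www.w3.org/2001/XMLSchema}"

def list_imports(xsd_text):
    """Return (namespace, schemaLocation, is_absolute) for each xs:import."""
    root = ET.fromstring(xsd_text)
    results = []
    for imp in root.iter(XS + "import"):
        loc = imp.get("schemaLocation")
        # A relative location such as "dcterms.xsd" only resolves if the
        # file sits next to the importing schema.
        is_abs = bool(loc) and urlparse(loc).scheme in ("http", "https")
        results.append((imp.get("namespace"), loc, is_abs))
    return results

# Hypothetical, reduced stand-in for dryad.xsd (assumption for illustration).
SAMPLE_XSD = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:import namespace="http://purl.org/dc/terms/" schemaLocation="dcterms.xsd"/>
  <xs:import namespace="http://purl.org/ontology/bibo/" schemaLocation="bibo.xsd"/>
</xs:schema>"""

for namespace, location, is_absolute in list_imports(SAMPLE_XSD):
    status = "absolute URL" if is_absolute else "relative, needs a local copy"
    print(f"{namespace} -> {location} ({status})")
```

Any import reported as relative would need either a copy of the file beside the schema or a resolvable URL before a validator can load it.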
#2 Updated by Ryan Scherle over 11 years ago
I corrected some issues with the schema and installed apache redirects for the bibo and dcterms schemas. The documents now validate in oXygen.
#3 Updated by Bruce Wilson over 11 years ago
- Target version changed from Deploy by end of Y4Q4 to Deploy by end of Y5Q2
#4 Updated by Skye Roseboom over 11 years ago
- Assignee changed from Chris Jones to Ryan Scherle
Discovered this issue on 8-7-13 while testing indexing of ORE from dryad MN.
The pids listed on the member node's object list page (https://dev.datadryad.org/mn/object/?count=10) are in the format:
http://dx.doi.org/10.5061/dryad.12?ver=2011-08-02T16:00:05.530-0400
However, requesting the system metadata record for that pid: https://dev.datadryad.org/mn/meta/http://dx.doi.org/10.5061/dryad.12?ver=2011-08-02T16:00:05.530-0400
returns a system metadata record whose identifier has the format:
http:/dx.doi.org/10.5061/dryad.12?ver=2011-08-02T16:00:05.530-0400
The object list shows http:// (two slashes), while the system metadata shows identifiers with one slash (http:/).
It appears that the same metadata record is retrieved whether the /meta REST call above uses one slash or two. This also appears to be the case with the /object REST endpoint:
https://dev.datadryad.org/mn/object/http://dx.doi.org/10.5061/dryad.12?ver=2011-08-02T16:00:05.530-0400
https://dev.datadryad.org/mn/object/http:/dx.doi.org/10.5061/dryad.12?ver=2011-08-02T16:00:05.530-0400
appear to return the same object bytes.
Dryad's ORE documents refer to resources with identifiers in the http:// form. However, the CN recorded the pids using the identifier provided by the /meta endpoint (http:/).
So the ORE documents do not appear to refer to any document on the CN, because of the mix of single and double slashes.
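To make the mismatch concrete, here is a small Python sketch. It assumes, without confirmation from the MN logs, that something in the request path handling merges runs of slashes; collapse_slashes is a hypothetical stand-in for that behavior, not a Dryad or DataONE function. It also shows that percent-encoding the pid as a single path segment leaves no slashes for such a normalizer to merge.

```python
import re
from urllib.parse import quote, unquote

PID = "http://dx.doi.org/10.5061/dryad.12?ver=2011-08-02T16:00:05.530-0400"

def collapse_slashes(path):
    """Mimic a path normalizer that merges runs of '/' (assumed behavior)."""
    return re.sub(r"/{2,}", "/", path)

def same_pid_ignoring_slashes(a, b):
    """Compare two identifiers while ignoring repeated slashes."""
    return collapse_slashes(a) == collapse_slashes(b)

# Unencoded pid appended to the endpoint path, as in the URLs above.
prefix = "/mn/meta/"
normalized = collapse_slashes(prefix + PID)
recovered = normalized[len(prefix):]
print(recovered)                                    # single-slash form
print(recovered == PID)                             # exact match fails
print(same_pid_ignoring_slashes(recovered, PID))    # lenient match succeeds

# Percent-encoding the whole pid keeps it intact through normalization.
encoded = quote(PID, safe="")
print(unquote(collapse_slashes(prefix + encoded)[len(prefix):]) == PID)
```

An exact string comparison between the ORE resource identifiers and the pids recorded on the CN fails in the same way as the second print above.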
#5 Updated by Ryan Scherle over 11 years ago
- Assignee changed from Ryan Scherle to Skye Roseboom
All /meta records have been updated to include double slashes.
#6 Updated by Skye Roseboom over 11 years ago
- Assignee changed from Skye Roseboom to Chris Jones
Chris - I scrubbed and re-synchronized the cn-dev environment, but I am still seeing creation errors during sync:
[ERROR] 2013-08-09 23:10:20,922 (TransferObjectTask:write:457) Task-urn:node:mnTestDRYAD-http://dx.doi.org/10.5061/dryad.rb7g719t?ver=2012-03-02T12:06:00.626-0500
<?xml version="1.0" encoding="UTF-8"?>
Error inserting or updating document: <?xml version="1.0"?><error>cvc-complex-type.2.4.b: The content of element 'DryadDataPackage' is not complete. One of '{"http://purl.org/dc/terms/":relation, "http://purl.org/dc/terms/":references}' is expected.</error>
#7 Updated by Chris Jones over 11 years ago
- Remaining hours set to 0.0
- Status changed from In Progress to Closed
I'm closing this issue, since I think we have the schema issues resolved and we have synced the large majority of science metadata documents. The other parsing errors on the CN relate to #3855 and are largely caused by instance documents that don't conform to the schema.
#8 Updated by Laura Moyers almost 11 years ago
- Target version changed from Deploy by end of Y5Q2 to Deploy by end of Y5Q3
#9 Updated by Laura Moyers over 10 years ago
- Target version changed from Deploy by end of Y5Q3 to Operational