Task #3854

MNDeployment #3118: Dryad Member Node

Troubleshoot Dryad science metadata creation on CNs

Added by Chris Jones over 9 years ago. Updated over 8 years ago.

Status: Closed
Priority: Normal
Assignee:
Target version:
Start date: 2013-06-28
Due date:
% Done: 100%
Story Points:
Sprint:

Description

After registering the dev.datadryad.org Member Node into the CN Development environment, the synchronization process began and thousands of documents synchronized. However, all science metadata documents have failed to sync, while resource maps and data files sync fine. The call to CN.create() is failing with:

knb 20130628-15:14:19: [FATAL]: DBSaxHandler.fatalError - White spaces are required between publicId and systemId. [edu.ucsb.nceas.metacat.DBSAXHandler]
knb 20130628-15:14:19: [ERROR]: DocumentImpl.write - Problem with parsing: Fatal processing error. [edu.ucsb.nceas.metacat.DocumentImpl]

This indicates that Metacat is unable to validate the science metadata documents against the Dryad Schema. Troubleshoot why this is the case.


Related issues

Related to Member Nodes - Task #3774: return NotFound exceptions for /meta and /object calls using "near miss" identifiers New 2013-05-22

History

#1 Updated by Chris Jones over 9 years ago

I've downloaded a single example Dryad science metadata document, and have tried to load it into a Metacat-based MN, and get the same result. This confirms that it's not a misconfiguration of the CNs. I've then attempted to validate the document against the stated schema using xmlstarlet:

$ xml val --net -e -s dryad.xsd dryad-scimeta.xml
failed to load external entity "dcterms.xsd"
dryad.xsd:6.0: Element '{http://www.w3.org/2001/XMLSchema}import': Failed to locate a schema at location 'dcterms.xsd'. Skipping the import.
failed to load external entity "bibo.xsd"
dryad.xsd:9.0: Element '{http://www.w3.org/2001/XMLSchema}import': Failed to locate a schema at location 'bibo.xsd'. Skipping the import.
dryad.xsd:29.0: Element '{http://www.w3.org/2001/XMLSchema}element', attribute 'ref': The QName value '{http://purl.org/ontology/bibo/}pmid' does not resolve to a(n) element declaration.
dryad.xsd:30.0: Element '{http://www.w3.org/2001/XMLSchema}element', attribute 'ref': The QName value '{http://purl.org/ontology/bibo/}Journal' does not resolve to a(n) element declaration.

The system identifiers for these two external schemas point to local files, which are not found. xmlstarlet also could not locate the schema files over the network at the standard schemaLocation URL. My guess is that Metacat tries the same lookups and fails. Since this is a new science metadata format being supported by DataONE, we need to ensure that the schema files for all imports are resolvable.

To complicate this, the bibo namespace resolves to an OWL Ontology expressed in RDF, as opposed to an XML Schema document. This is also problematic - elements in the bibo: namespace will not resolve to a known element declaration.
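As a rough illustration of the import problem described above, the following sketch lists the xs:import entries of a schema and flags those whose schemaLocation is a bare local file name rather than an http(s) URL. The SAMPLE_XSD string is a hypothetical stand-in, not the real dryad.xsd: only the two namespaces and the file names dcterms.xsd and bibo.xsd are taken from the output above, and the third import is invented for contrast. The function names are illustrative as well.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical stand-in for dryad.xsd; the real schema is not reproduced here.
# The third import is invented to show an absolute location for contrast.
SAMPLE_XSD = """<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:import namespace="http://purl.org/dc/terms/" schemaLocation="dcterms.xsd"/>
  <xs:import namespace="http://purl.org/ontology/bibo/" schemaLocation="bibo.xsd"/>
  <xs:import namespace="http://example.org/other" schemaLocation="http://example.org/other.xsd"/>
</xs:schema>
"""


def list_imports(xsd_text):
    """Return (namespace, schemaLocation, is_absolute_url) for each xs:import."""
    root = ET.fromstring(xsd_text)
    results = []
    for imp in root.iter(XS + "import"):
        location = imp.get("schemaLocation", "")
        scheme = urlparse(location).scheme
        results.append((imp.get("namespace"), location, scheme in ("http", "https")))
    return results


def local_only_imports(xsd_text):
    """Return schemaLocation values that only resolve next to the schema file."""
    return [loc for _, loc, absolute in list_imports(xsd_text) if not absolute]


if __name__ == "__main__":
    for location in local_only_imports(SAMPLE_XSD):
        print("import not resolvable without a local copy:", location)
```

A check like this only finds imports that depend on a local copy. It does not confirm that an absolute URL actually serves an XML Schema, which matters for the bibo namespace noted above.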

#2 Updated by Ryan Scherle over 9 years ago

I corrected some issues with the schema and installed apache redirects for the bibo and dcterms schemas. The documents now validate in oXygen.

#3 Updated by Bruce Wilson over 9 years ago

  • Target version changed from Deploy by end of Y4Q4 to Deploy by end of Y5Q2

#4 Updated by Skye Roseboom over 9 years ago

  • Assignee changed from Chris Jones to Ryan Scherle

Discovered this issue on 8-7-13 while testing indexing of ORE documents from the Dryad MN.

The Member Node's object list page (https://dev.datadryad.org/mn/object/?count=10) displays pids in the format:

http://dx.doi.org/10.5061/dryad.12?ver=2011-08-02T16:00:05.530-0400

However, asking for the metadata record for the pid: https://dev.datadryad.org/mn/meta/http://dx.doi.org/10.5061/dryad.12?ver=2011-08-02T16:00:05.530-0400

displays a system metadata record with an identifier with format:

http:/dx.doi.org/10.5061/dryad.12?ver=2011-08-02T16:00:05.530-0400

The object list shows http:// (two slashes), while the system metadata shows the identifier with one slash: http:/

It appears that the same metadata record is retrieved whether the /meta REST call above uses one or two slashes. This also appears to be the case with the /object REST endpoint:

https://dev.datadryad.org/mn/object/http://dx.doi.org/10.5061/dryad.12?ver=2011-08-02T16:00:05.530-0400
https://dev.datadryad.org/mn/object/http:/dx.doi.org/10.5061/dryad.12?ver=2011-08-02T16:00:05.530-0400

appear to return the same object bytes.

Dryad's ORE documents refer to resources with identifiers in the http:// form. However, the CN recorded the pids using the identifier provided by the /meta endpoint, which has the http:/ form.

So, because of the mix of single and double slashes, the ORE documents do not appear to refer to any document on the CN.
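The thread does not say where the second slash was lost. One plausible cause, stated here purely as an assumption, is a server or proxy that normalizes repeated slashes in the request path. The sketch below mimics that with posixpath.normpath and shows that percent-encoding the pid as a single path segment survives the same normalization. The /mn/meta/ prefix and the pid come from the examples above; the helper names are made up for illustration.

```python
import posixpath
from urllib.parse import quote, unquote

# pid taken from the object list example above
PID = "http://dx.doi.org/10.5061/dryad.12?ver=2011-08-02T16:00:05.530-0400"
PREFIX = "/mn/meta/"


def collapse_slashes(path):
    """Mimic a server or proxy that normalizes repeated slashes in a URL path.

    Assumption: this is only one possible way the identifier could end up
    with a single slash; the actual cause is not stated in this issue.
    """
    return posixpath.normpath(path)


def meta_path(pid):
    """Build a /meta path with the pid percent-encoded as one path segment."""
    return PREFIX + quote(pid, safe="")


# Raw pid in the path: the double slash after "http:" is collapsed.
raw = collapse_slashes(PREFIX + PID)

# Encoded pid in the path: no literal slashes remain, so nothing collapses.
encoded = collapse_slashes(meta_path(PID))

if __name__ == "__main__":
    print(raw[len(PREFIX):])
    print(unquote(encoded[len(PREFIX):]))
```

Decoding the encoded segment gives back the original two-slash identifier, which is the form the ORE documents use.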

#5 Updated by Ryan Scherle over 9 years ago

  • Assignee changed from Ryan Scherle to Skye Roseboom

All /meta records have been updated to include double slashes.

#6 Updated by Skye Roseboom over 9 years ago

  • Assignee changed from Skye Roseboom to Chris Jones

Chris - I scrubbed and re-synchronized the cn-dev environment, but I am still seeing creation errors during sync:

[ERROR] 2013-08-09 23:10:20,922 (TransferObjectTask:write:457) Task-urn:node:mnTestDRYAD-http://dx.doi.org/10.5061/dryad.rb7g719t?ver=2012-03-02T12:06:00.626-0500
<?xml version="1.0" encoding="UTF-8"?>

Error inserting or updating document: <?xml version="1.0"?><error>cvc-complex-type.2.4.b: The content of element 'DryadDataPackage' is not complete. One of '{"http://purl.org/dc/terms/":relation, "http://purl.org/dc/terms/":references}' is expected.</error>
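The validator message above says the DryadDataPackage element ends before a required dcterms:relation or dcterms:references child appears. A quick pre-check for that one condition could look like the sketch below. It is only an approximation: it ignores element order and every other schema constraint, so it does not replace validation against the schema. The root namespace http://example.org/dryad and both sample documents are hypothetical; only the element name and the dcterms namespace come from the error message.

```python
import xml.etree.ElementTree as ET

DCTERMS = "http://purl.org/dc/terms/"  # namespace as shown in the error message

# Hypothetical sample documents; the root namespace is a placeholder.
INCOMPLETE = (
    '<DryadDataPackage xmlns="http://example.org/dryad" '
    'xmlns:dcterms="http://purl.org/dc/terms/">'
    "<dcterms:title>Example</dcterms:title>"
    "</DryadDataPackage>"
)
COMPLETE = (
    '<DryadDataPackage xmlns="http://example.org/dryad" '
    'xmlns:dcterms="http://purl.org/dc/terms/">'
    "<dcterms:title>Example</dcterms:title>"
    "<dcterms:references>http://example.org/ref</dcterms:references>"
    "</DryadDataPackage>"
)


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tags."""
    return tag.rsplit("}", 1)[-1]


def has_relation_or_references(xml_text):
    """Return True if the root has a dcterms:relation or dcterms:references child."""
    root = ET.fromstring(xml_text)
    if local_name(root.tag) != "DryadDataPackage":
        raise ValueError("expected a DryadDataPackage root element")
    wanted = {"{%s}relation" % DCTERMS, "{%s}references" % DCTERMS}
    return any(child.tag in wanted for child in root)


if __name__ == "__main__":
    print(has_relation_or_references(INCOMPLETE))
    print(has_relation_or_references(COMPLETE))
```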

#7 Updated by Chris Jones over 9 years ago

  • Remaining hours set to 0.0
  • Status changed from In Progress to Closed

I'm closing this issue, since I think we have the schema issues resolved, and we have sync'd the large majority of science metadata documents. Other parsing errors on the CN have been related to #3855, and are largely caused by instance documents that don't conform to the schema.

#8 Updated by Laura Moyers almost 9 years ago

  • Target version changed from Deploy by end of Y5Q2 to Deploy by end of Y5Q3

#9 Updated by Laura Moyers over 8 years ago

  • Target version changed from Deploy by end of Y5Q3 to Operational
