Task #5940

MNDeployment #3708: Minnesota Population Center

Task #5921: MPC: Testing

Task #5922: MPC: Registration in environment

Task #5933: MPC: Content Review

Task #5937: MPC: Verify Science Metadata

MPC: Verify that the Science Metadata is correctly processed by CNs

Added by Laura Moyers almost 10 years ago. Updated about 9 years ago.

Status: Closed
Priority: Normal
Assignee:
Target version:
Start date: 2014-07-18
Due date:
% Done: 100%
Story Points:
Sprint:

Description

Verify that CNs are able to index any supported Science Metadata objects.
Successfully indexed objects appear in the Solr index. Not all Science Metadata formats are supported by DataONE. If the MN does not expose any supported objects, this ticket can be closed directly.
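
A minimal sketch of how one might check the Solr index for a Member Node's objects. The base URL, the node identifier, and the field names (datasource, formatId, dateUploaded) are assumptions for illustration and are not taken from this ticket; the function only builds the query URL and does not contact any server.

```python
from urllib.parse import urlencode

# Hypothetical values: substitute the real CN base URL and node identifier.
CN_BASE_URL = "https://cn.example.org/cn"
NODE_ID = "urn:node:EXAMPLE"


def build_solr_query_url(base_url: str, node_id: str, rows: int = 10) -> str:
    """Build a Solr query URL listing objects indexed for one Member Node.

    The field names used here are assumed, not confirmed by the ticket.
    """
    params = {
        "q": f'datasource:"{node_id}"',      # restrict results to one node
        "fl": "id,formatId,dateUploaded",    # fields to return per object
        "rows": str(rows),                   # cap the number of results
        "wt": "json",                        # ask for a JSON response
    }
    return f"{base_url}/v1/query/solr/?{urlencode(params)}"


if __name__ == "__main__":
    print(build_solr_query_url(CN_BASE_URL, NODE_ID))
```

If the query returns no documents for a node that exposes supported formats, that would suggest indexing did not succeed for those objects.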


Related issues

Related to Infrastructure - Bug #6477: Verify MPC QDC Science Metadata files synchronize to CN Rejected

History

#1 Updated by Laura Moyers over 9 years ago

  • Target version changed from Deploy by end of Y5Q4 to Deploy by end of Y1Q1

#2 Updated by Chris Jones over 9 years ago

  • Status changed from New to In Progress
  • Assignee set to Chris Jones

I'll be turning on synchronization in the stage environment to verify content processing. Stand by.

#3 Updated by Chris Jones over 9 years ago

In harvesting science metadata content to the stage Coordinating Node, we've run into parsing errors for the Dublin Core metadata files:

<?xml version="1.0"?>

cvc-elt.1: Cannot find the declaration of element 'qualifieddc'.

In looking at an example XML file like https://dataone-test.pop.umn.edu/mn/v1/object/ipumsi_6.3_ke_1999_DC.xml, I see two possible issues causing the parsing problem:

1) The root element isn't namespaced. While the document's schema location is provided, there's no xmlns declaration for the qualifieddc schema, and I'm thinking the parser doesn't know that qualifieddc is an element defined in qualifieddc.xsd. I think all of the documents need to be updated to declare the namespace of the root element, like:


<qdc:qualifieddc xmlns:qdc="http://dublincore.org/schemas/xmls/qdc/2008/02/11/qualifieddc.xsd">
...
</qdc:qualifieddc>

In this way, the root element should be parsed correctly.
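
To illustrate the difference a namespace declaration makes, here is a small sketch using Python's standard library. The two sample documents are simplified stand-ins, not the actual MPC files, and root_namespace is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

QDC_NS = "http://dublincore.org/schemas/xmls/qdc/2008/02/11/qualifieddc.xsd"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Simplified sample: root element without any namespace.
UNQUALIFIED = (
    f'<qualifieddc xmlns:dc="{DC_NS}">'
    "<dc:title>Example</dc:title>"
    "</qualifieddc>"
)

# Simplified sample: root element bound to the qdc prefix.
QUALIFIED = (
    f'<qdc:qualifieddc xmlns:qdc="{QDC_NS}" xmlns:dc="{DC_NS}">'
    "<dc:title>Example</dc:title>"
    "</qdc:qualifieddc>"
)


def root_namespace(xml_text):
    """Return the namespace URI of the root element, or None if it has none."""
    tag = ET.fromstring(xml_text).tag
    # ElementTree writes qualified names as "{namespace}localname".
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None


if __name__ == "__main__":
    print(root_namespace(UNQUALIFIED))
    print(root_namespace(QUALIFIED))
```

A parser sees the first root as a plain qualifieddc with no namespace, and the second as qualifieddc in the qdc namespace.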

2) The xsi:schemaLocation attribute has:

xsi:schemaLocation="qualifieddc.xsd http://dublincore.org/schemas/xmls/qdc/2008/02/11/qualifieddc.xsd"

I think the values in this field are reversed. It should be a pair of {NAMESPACE, LOCATION} strings, with the namespace first, like:

xsi:schemaLocation="http://dublincore.org/schemas/xmls/qdc/2008/02/11/qualifieddc.xsd qualifieddc.xsd"

So, the parser should know that this document adheres to the 'http://dublincore.org/schemas/xmls/qdc/2008/02/11/qualifieddc.xsd' namespace, and it will find the physical schema at the location given, or in its catalog. For the CNs, it will find it in the schema catalog.
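
As a sketch of how the attribute value is read, the following splits xsi:schemaLocation into {NAMESPACE, LOCATION} pairs. parse_schema_location is a hypothetical helper; the two strings are the original and proposed values quoted above.

```python
def parse_schema_location(value):
    """Split an xsi:schemaLocation value into (namespace, location) pairs."""
    tokens = value.split()
    if len(tokens) % 2 != 0:
        raise ValueError("xsi:schemaLocation must hold namespace/location pairs")
    # Even positions are namespaces, odd positions are schema locations.
    return list(zip(tokens[0::2], tokens[1::2]))


ORIGINAL = (
    "qualifieddc.xsd "
    "http://dublincore.org/schemas/xmls/qdc/2008/02/11/qualifieddc.xsd"
)
PROPOSED = (
    "http://dublincore.org/schemas/xmls/qdc/2008/02/11/qualifieddc.xsd "
    "qualifieddc.xsd"
)

if __name__ == "__main__":
    for label, value in (("original", ORIGINAL), ("proposed", PROPOSED)):
        for namespace, location in parse_schema_location(value):
            print(f"{label}: namespace={namespace} location={location}")
```

With the original value, the parser would treat 'qualifieddc.xsd' as the namespace, which is the reversal described above.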

So, I think that addressing these two issues should solve the parsing issue found above. I'll send this to Wendy and Fabio to have a look.

#4 Updated by Chris Jones over 9 years ago

Wendy and Fabio updated their science metadata documents with the above changes. We unfortunately now have a new error:


Error inserting or updating document: <?xml version="1.0"?><error>TargetNamespace.1:
Expecting namespace 'http://dublincore.org/schemas/xmls/qdc/2008/02/11/qualifieddc.xsd',
but the target namespace of the schema document is 'null'.</error>

My thought here is that the schema validators on my workstation (xmllint, xmlstarlet, both using libxml) are configured differently than the Xerces SAX parser on the CNs. Documents that had un-namespaced elements validated fine on my machine, whereas the Xerces SAX parser throws the error above. When I change the document locally to use qdc:qualifieddc, xmllint throws an error, saying:

ipumsi_6.3_am_2001_DC.xml:2: element qualifieddc:
Schemas validity error : Element '{http://dublincore.org/schemas/xmls/qdc/2008/02/11/qualifieddc.xsd}qualifieddc':
No matching global declaration available for the validation root.
ipumsi_6.3_am_2001_DC.xml fails to validate

This points to the fact that the qualifieddc schema downloaded from http://dublincore.org/schemas/xmls/qdc/2008/02/11/qualifieddc.xsd doesn't have a targetNamespace attribute, since, according to DCMI, this is an un-namespaced 'container' schema. This behavior matches what we see from Xerces, which says that the target namespace for this schema is 'null'.
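
The mismatch can be sketched by reading the targetNamespace attribute of a schema document and comparing it with the namespace the instance expects. The two schemas below are tiny stand-ins written for illustration, not the DCMI schema itself, and the helper names are hypothetical.

```python
import xml.etree.ElementTree as ET

QDC_NS = "http://dublincore.org/schemas/xmls/qdc/2008/02/11/qualifieddc.xsd"

# Stand-in for a 'container' schema with no targetNamespace.
XSD_WITHOUT_TNS = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="qualifieddc"/>
</xs:schema>"""

# Stand-in for a schema that does declare a targetNamespace.
XSD_WITH_TNS = f"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
    targetNamespace="{QDC_NS}">
  <xs:element name="qualifieddc"/>
</xs:schema>"""


def target_namespace(xsd_text):
    """Return the schema's targetNamespace, or None if it declares none."""
    return ET.fromstring(xsd_text).get("targetNamespace")


def namespace_matches(expected_ns, xsd_text):
    """True when the schema's targetNamespace equals the expected namespace."""
    return target_namespace(xsd_text) == expected_ns


if __name__ == "__main__":
    for name, xsd in (("without", XSD_WITHOUT_TNS), ("with", XSD_WITH_TNS)):
        print(name, target_namespace(xsd), namespace_matches(QDC_NS, xsd))
```

For the schema without a targetNamespace, the comparison fails in the same way both validators report: the instance names a namespace that the schema never claims.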

So, we're stuck between a rock and a hard place. Xerces doesn't like un-namespaced elements like <qualifieddc>, but when you namespace it with qdc:qualifieddc, it somewhat flippantly tells us that the schema used to validate it isn't typed with that namespace. Ugh, software!

Perhaps there is a property setting on the Xerces parser that allows untyped root elements, but typed child elements? Am I missing something else? I'll ping others (Ben, Matt, Dave especially) to get more eyes on this.

#5 Updated by Laura Moyers over 9 years ago

  • Target version changed from Deploy by end of Y1Q1 to Deploy by end of NCTE

#6 Updated by Laura Moyers over 9 years ago

  • Target version changed from Deploy by end of NCTE to Operational

#7 Updated by Laura Moyers about 9 years ago

  • % Done changed from 0 to 100
  • Remaining hours set to 0
  • Status changed from In Progress to Closed
