MNDeployment #7082: USGS Science Data Catalog (SDC)
Determine how to handle metadata which is well-formed but does not validate
SDC is an example of a case where metadata records are declared as FGDC but contain extensions (such as the ESRI extensions), so the files do not actually validate against the declared schema. The objective of this task is to determine which approach to take. One extreme is the current approach, where only metadata files that pass validation are accepted. The other is to allow any well-formed XML (and potentially set the sysmeta to declare the metadata type as text/xml). From Bruce's perspective, if we accept metadata that is not schema valid, it should be straightforward to determine from sysmeta (or something similar) which metadata is schema valid and which is not. There are also intermediate proposals: check whether schema-invalid metadata matches one of a list of known variants of the declared schema, and accept it only if it does.
The output from this task is a plan with tasks that will at least allow the ESRI-extended FGDC metadata from SDC to be accepted into DataONE.
* The metadata must at least be well-formed XML
* The metadata must be at least interpretable for discovery.
* Tools that want to do deeper interpretation (like the Matlab or R tools) need to be able to determine whether the metadata is schema valid.
* It is useful to be able to provide feedback to MNs for the metadata that's interpretable but not schema valid.
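As a starting point for discussion, the intermediate proposal could be sketched roughly as below. This is only an illustration, not an agreed design: the `KNOWN_VARIANTS` registry, the `"Esri"` root-level extension element, the status labels, and the `is_schema_valid` callable are all assumptions. The Python standard library has no XSD validator, so real schema validation is left to whatever the caller plugs in.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry: declared formatId -> extension element names that
# are tolerated directly under the root element. Contents are assumptions.
KNOWN_VARIANTS = {
    "FGDC-STD-001-1998": {"Esri"},
}


def classify(xml_text, format_id, is_schema_valid):
    """Return one of 'rejected', 'valid', 'known_variant', 'well_formed_only'.

    is_schema_valid is a caller-supplied function (str -> bool) that wraps
    the real schema validator for the declared format.
    """
    # Requirement 1: the metadata must at least be well-formed XML.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return "rejected"

    # Current behaviour: schema-valid documents are accepted as-is.
    if is_schema_valid(xml_text):
        return "valid"

    # Intermediate proposal: drop extension elements from the known list
    # and check whether the remainder validates against the declared schema.
    tolerated = KNOWN_VARIANTS.get(format_id, set())
    removed = False
    for child in list(root):
        if child.tag in tolerated:
            root.remove(child)
            removed = True
    if removed and is_schema_valid(ET.tostring(root, encoding="unicode")):
        return "known_variant"

    # Well-formed, but neither valid nor a recognised variant.
    return "well_formed_only"
```

The returned status is the kind of value that could be recorded in sysmeta (or something similar) so that tools can tell schema-valid records from accepted variants, and it could also drive feedback to MNs. Whether `well_formed_only` records are accepted or rejected is the policy decision this task still has to make.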