Bug #4724

EML pubDate allows large xs:gYear values

Added by Ben Leinfelder about 8 years ago. Updated over 6 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Ben Leinfelder
Category:
d1_indexer
Target version:
Start date:
2014-10-01
Due date:
2014-10-01
% Done:
100%

Milestone:
None
Product Version:
*
Story Points:
Sprint:

Description

Lauren noticed that many LTER files use a pubDate format of CCYYMMDD when it should contain dashes (CCYY-MM-DD) to conform to the xs:date format rules. Unfortunately, this EML field is defined as the union of xs:gYear and xs:date, so '20120304' is schema-valid but is interpreted as the year 20120304, far in the future, rather than as 4 March 2012.

Example pubDates:
https://cn.dataone.org/cn/v1/query/solr/?fl=pubDate,id&q=pubDate:20121*&rows=3000

We can't go back and correct these dates, but we could parse them according to their intent rather than assume that people mean gYears far in the future. I wrote a draft of how we could handle this in SolrDateConverter.convert(), but it is only used if you set assumeDate=true in your bean definition.
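To illustrate the idea, here is a minimal sketch of the kind of handling described above. It is not the actual SolrDateConverter code: the class name PubDateSketch and the method normalize() are hypothetical, and only the assumeDate flag is taken from this ticket. It assumes that an eight-digit value which parses as a real calendar date was meant as CCYYMMDD, and leaves everything else untouched.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

/** Hypothetical sketch only; not the real SolrDateConverter implementation. */
class PubDateSketch {

    /**
     * Returns the pubDate with dashes inserted when assumeDate is true and the
     * value is eight digits forming a valid calendar date; otherwise returns
     * the (trimmed) value unchanged.
     */
    static String normalize(String pubDate, boolean assumeDate) {
        String value = pubDate.trim();
        if (assumeDate && value.matches("\\d{8}")) {
            try {
                // BASIC_ISO_DATE parses the compact yyyyMMdd form strictly,
                // so impossible dates such as 20121345 are rejected.
                LocalDate date = LocalDate.parse(value, DateTimeFormatter.BASIC_ISO_DATE);
                return date.toString(); // ISO-8601 form, e.g. 2012-03-04
            } catch (DateTimeParseException e) {
                // Not a real date; fall through and keep the original value.
            }
        }
        return value;
    }
}
```

With assumeDate=true, normalize("20120304", true) yields "2012-03-04". With assumeDate=false the value passes through unchanged and would still be read as a gYear downstream.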

History

#1 Updated by Ben Leinfelder about 8 years ago

  • Assignee changed from Dave Vieglais to Ben Leinfelder

#2 Updated by Matthew Jones about 8 years ago

We may want to consider a new release of EML that would add an additional (year <= 9999) constraint to the definition of the year field for pubDate. That would catch these formats and only allow reasonable dates. That would mean the spec would be ineffective in 6986 years. Probably safe enough. And it would mean some documents would need to be converted because they would be invalid under the new release.
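As a rough illustration of what such a constraint would catch, the sketch below flags bare-year values greater than 9999. The class and method names are hypothetical, the gYear pattern is simplified (timezone suffixes are ignored), and a real fix would live in the EML schema rather than in application code.

```java
import java.math.BigInteger;
import java.util.regex.Pattern;

/** Hypothetical sketch of the proposed (year <= 9999) check; not part of EML or d1_indexer. */
class PubDateYearBound {

    // Simplified xs:gYear lexical form: optional minus sign, then four or more digits.
    private static final Pattern G_YEAR = Pattern.compile("-?\\d{4,}");

    private static final BigInteger MAX_YEAR = BigInteger.valueOf(9999);

    /** True only when the value is a bare year that exceeds 9999. */
    static boolean violatesYearBound(String value) {
        if (!G_YEAR.matcher(value).matches()) {
            return false; // not a bare year (e.g. a dashed xs:date), so the bound does not apply
        }
        return new BigInteger(value).compareTo(MAX_YEAR) > 0;
    }
}
```

Under this check '20120304' would be rejected, while '2012' and '2012-03-04' would pass.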

#3 Updated by Skye Roseboom over 7 years ago

  • Target version set to Release Backlog
  • Start date set to 2014-10-01
  • Due date set to 2014-10-01

#4 Updated by Ben Leinfelder over 6 years ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Closed

A workaround for this exists for indexing. Changing the EML spec is another matter altogether.
