Bug #4724
EML pubDate allows large xs:gYear values
100%
Description
Lauren noticed that many LTER files use a pubDate format of CCYYMMDD when it should contain dashes (CCYY-MM-DD) to conform to the xs:date format rules. Unfortunately, this EML field is defined as the union of xs:gYear and xs:date, which means that '20120304' is schema-valid but is read as the year 20120304, far far far in the future.
Example pubDates:
https://cn.dataone.org/cn/v1/query/solr/?fl=pubDate,id&q=pubDate:20121*&rows=3000
We can't go back and correct these dates, but we could parse them with their intention in mind rather than assume that people want gYears far in the future. I wrote a draft of how we could handle this in SolrDateConverter.convert(), but it is only used if you set assumeDate=true in your bean definition.
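For illustration only, here is a minimal Java sketch of the kind of handling described above. It is not the actual SolrDateConverter code: the class name PubDateSketch and the method normalize are hypothetical, and only the assumeDate flag comes from this ticket. The assumption is that an 8-digit value that forms a real calendar date was meant as CCYYMMDD, and that everything else is left untouched.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper, not the real SolrDateConverter.
public class PubDateSketch {

    /**
     * Returns the pubDate rewritten as CCYY-MM-DD when assumeDate is true and
     * the value is exactly eight digits forming a valid calendar date.
     * Any other value (a plain gYear, an xs:date that already has dashes, or
     * eight digits that are not a real date) is returned unchanged.
     */
    public static String normalize(String pubDate, boolean assumeDate) {
        String value = pubDate.trim();
        if (assumeDate && value.matches("\\d{8}")) {
            try {
                // BASIC_ISO_DATE parses the compact form, e.g. 20120304.
                // LocalDate.toString() emits the ISO form with dashes.
                return LocalDate.parse(value, DateTimeFormatter.BASIC_ISO_DATE).toString();
            } catch (DateTimeParseException e) {
                // Not a real date (e.g. month 13): fall through and keep the
                // original value, which is then still treated as a gYear.
            }
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(normalize("20120304", true));   // reinterpreted as a date
        System.out.println(normalize("20120304", false));  // left as the schema-valid gYear
        System.out.println(normalize("2012", true));       // ordinary gYear, unchanged
    }
}
```

With assumeDate left at false the value passes through unchanged, which mirrors the opt-in behaviour of the bean setting.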
History
#1 Updated by Ben Leinfelder over 10 years ago
- Assignee changed from Dave Vieglais to Ben Leinfelder
#2 Updated by Matthew Jones over 10 years ago
We may want to consider a new release of EML that would add an additional (year <= 9999) constraint to the definition of the year field for pubDate. That would catch these formats and only allow reasonable dates. That would mean the spec would stop working in a little under 8,000 years. Probably safe enough. And it would mean some documents would need to be converted because they would be invalid under the new release.
#3 Updated by Skye Roseboom about 10 years ago
- Target version set to Release Backlog
- Start date set to 2014-10-01
- Due date set to 2014-10-01
#4 Updated by Ben Leinfelder almost 9 years ago
- % Done changed from 0 to 100
- Status changed from In Progress to Closed
A workaround for this exists for indexing. Changing the EML spec is another thing altogether.