EML pubDate allows large xs:gYear values
Lauren noticed that many LTER files use a pubDate format of CCYYMMDD when it should contain dashes (CCYY-MM-DD) to conform to the xs:date format rules. Unfortunately, this EML field is defined as the union of xs:gYear and xs:date, so '20120304' is schema-valid, but it is read as the gYear 20120304, a year millions of years in the future.
We can't go back and correct these dates, but we could parse them with their intention in mind rather than assume that people want gYears far in the future. I wrote a draft of how we could handle this in SolrDateConverter.convert(), but it is only used if you set assumeDate=true in your bean definition.
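For illustration only, here is a minimal sketch of the "parse with intention" idea. It is not the draft in SolrDateConverter.convert(). The class name PubDateSketch and the method normalize are hypothetical, and the assumeDate parameter only mirrors the bean property mentioned above. It assumes that an 8-digit value which forms a real calendar date was meant as CCYYMMDD, and it leaves everything else untouched.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper for illustration; not the actual SolrDateConverter code.
class PubDateSketch {

    // Returns the pubDate in CCYY-MM-DD form when it looks like a dashless date
    // and assumeDate is set; otherwise returns the (trimmed) value unchanged.
    static String normalize(String pubDate, boolean assumeDate) {
        String value = pubDate.trim();
        if (assumeDate && value.matches("\\d{8}")) {
            try {
                // BASIC_ISO_DATE parses yyyyMMdd and rejects impossible dates.
                LocalDate date = LocalDate.parse(value, DateTimeFormatter.BASIC_ISO_DATE);
                return date.toString(); // ISO-8601, e.g. 2012-03-04
            } catch (DateTimeParseException e) {
                // Not a real calendar date (e.g. month 13): keep it as a gYear.
            }
        }
        return value;
    }
}
```

With this sketch, normalize("20120304", true) yields "2012-03-04", while the same input with assumeDate=false is passed through as-is, so existing behavior only changes when the flag is turned on.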
#2 Updated by Matthew Jones about 8 years ago
We may want to consider a new release of EML that would add an additional (year <= 9999) constraint to the definition of the year field for pubDate. That would catch these CCYYMMDD values and only allow reasonable years. That would mean the spec would be ineffective in a little under 8,000 years. Probably safe enough. And it would mean some existing documents would need to be converted, because they would be invalid under the new release.
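As a rough sketch of what that constraint would accept and reject, the check below applies (year <= 9999) to a bare-year pubDate. The names PubYearCheckSketch and yearWithinLimit are hypothetical, and this is a simplification: it ignores the negative years and timezone suffixes that xs:gYear also permits. It could be used to find the documents that would need conversion.

```java
import java.math.BigInteger;

// Hypothetical check for illustration; not part of EML or any existing validator.
class PubYearCheckSketch {

    private static final BigInteger MAX_YEAR = BigInteger.valueOf(9999);

    // Returns true if the value is an unsigned year of four or more digits
    // that satisfies the proposed (year <= 9999) constraint.
    static boolean yearWithinLimit(String gYear) {
        if (!gYear.matches("\\d{4,}")) {
            return false; // not a plain unsigned gYear; out of scope for this sketch
        }
        // BigInteger avoids overflow on arbitrarily long digit strings.
        return new BigInteger(gYear).compareTo(MAX_YEAR) <= 0;
    }
}
```

Under this check "2012" passes, while "20120304" fails and would be flagged as needing conversion to CCYY-MM-DD.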