Bug #8043

The origin field for EML documents isn't properly extracted when references are used

Added by Bryce Mecum over 4 years ago. Updated almost 3 years ago.

Status: New
Priority: Normal
Assignee:
Category: d1_indexer
Target version:
Start date: 2017-03-10
Due date:
% Done: 0%
Milestone: None
Product Version:
Story Points:
Sprint:

Description

We just ran into this with the following EML record: https://knb.ecoinformatics.org/#view/doi:10.5063/F15B00CC

The EML has six creators (Kiesecker, Fargione, Baruch-Mordo, Trainor, Ryan, Patterson), but the origin field in the Solr index has only two (Ryan, Patterson). After some digging, we realized this is likely because the indexing component responsible for EML doesn't resolve EML references: the four missing creators are exactly the ones specified via <references> rather than inline. The XML for the relevant section is:


<creator scope="document">
  <references>1484778487589</references>
</creator>
<creator scope="document">
  <references>1484778426939</references>
</creator>
<creator scope="document">
  <references>1484778028081</references>
</creator>
<creator scope="document">
  <references>1484778171131</references>
</creator>
<creator id="1485385283277" scope="document">
  <individualName>
    <salutation>Dr.</salutation>
    <givenName>Joe</givenName>
    <surName>Ryan</surName>
  </individualName>
  <organizationName>University of Colorado Boulder</organizationName>
  <positionName>Professor</positionName>
  <electronicMailAddress>joseph.ryan@colorado.edu</electronicMailAddress>
</creator>
<creator id="1484777776976" scope="document">
  <individualName>
    <salutation>Dr.</salutation>
    <givenName>Lauren</givenName>
    <surName>Patterson</surName>
  </individualName>
  <organizationName>Duke University</organizationName>
  <positionName>Water Policy Associate</positionName>
  <address scope="document">
    <deliveryPoint>Nicholas Institute for Environmental Policy Solutions, Duke University</deliveryPoint>
    <city>Durham</city>
    <administrativeArea>NC</administrativeArea>
    <postalCode>27708</postalCode>
    <country>USA</country>
  </address>
  <electronicMailAddress>lauren.patterson@duke.edu</electronicMailAddress>
</creator>

It would be really nice if the origin field got populated with all those referenced creators.
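
To make the desired behavior concrete, here is a minimal sketch of resolving references before extracting creator surnames. It is not the d1_indexer implementation (which is not shown in this ticket); it uses Python's standard library on a simplified, namespace-free fragment. The ids `p1`/`p2`, the names `Alpha`/`Beta`, and the helper names `resolve` and `creator_surnames` are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified EML-like fragment (no namespaces).
# The ids and surnames are invented and do not come from the record above.
SAMPLE = """
<dataset>
  <creator><references>p1</references></creator>
  <creator id="p2">
    <individualName><surName>Beta</surName></individualName>
  </creator>
  <contact id="p1">
    <individualName><surName>Alpha</surName></individualName>
  </contact>
</dataset>
"""

def resolve(element, by_id):
    """Return the element that `element` references, or `element` itself."""
    ref = element.find("references")
    if ref is None or not (ref.text or "").strip():
        return element
    # Fall back to the unresolved element if the id is unknown.
    return by_id.get(ref.text.strip(), element)

def creator_surnames(root):
    """Collect surnames for every creator, following <references> if present."""
    # Index every element in the document that carries an id attribute.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    names = []
    for creator in root.iter("creator"):
        target = resolve(creator, by_id)
        surname = target.findtext("individualName/surName")
        if surname:
            names.append(surname.strip())
    return names

root = ET.fromstring(SAMPLE)
print(creator_surnames(root))  # ['Alpha', 'Beta']
```

Without the `resolve` step, the first creator would contribute nothing, which matches the symptom described above where only the inline creators show up in the origin field.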

History

#1 Updated by Bryce Mecum over 4 years ago

Also, and importantly, Chris guesses that the indexing component isn't handling references at all, so other fields are likely affected too.
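
If that guess is right, one possible approach is to expand every reference once, before any field is extracted, so that all field queries see the resolved content. The following is a rough sketch of that idea using Python's standard library, again on a made-up, namespace-free fragment; `expand_references` is a hypothetical helper, not part of any DataONE code, and it does not handle chained references.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical fragment; the id and surname are invented for illustration.
SAMPLE = """
<dataset>
  <creator id="p1"><individualName><surName>Alpha</surName></individualName></creator>
  <contact><references>p1</references></contact>
</dataset>
"""

def expand_references(root):
    """Replace each <references> child with copies of the target's children.

    Modifies the tree in place and returns the number of expanded elements.
    """
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    expanded = 0
    # Iterate over a snapshot, since the tree is modified inside the loop.
    for el in list(root.iter()):
        ref = el.find("references")
        if ref is None:
            continue
        target = by_id.get((ref.text or "").strip())
        if target is None or target is el:
            continue  # leave dangling or self references untouched
        el.remove(ref)
        el.extend([copy.deepcopy(child) for child in target])
        expanded += 1
    return expanded

root = ET.fromstring(SAMPLE)
print(expand_references(root))                             # 1
print(root.findtext("contact/individualName/surName"))    # Alpha
```

Expanding up front keeps the per-field extraction logic unchanged, at the cost of duplicating the referenced subtrees in memory.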

#2 Updated by Jing Tao over 4 years ago

  • Assignee set to Jing Tao
  • Target version set to CCI-2.4.0

#3 Updated by Dave Vieglais over 4 years ago

  • Project changed from CN Index to Infrastructure
  • Category set to d1_indexer
  • Milestone set to None

#4 Updated by Matthew Jones almost 3 years ago

  • Description updated (diff)
