Bug #6870

Fix handling of identifiers with url-escaped characters

Added by Chris Jones about 9 years ago. Updated about 9 years ago.

Status: Rejected
Priority: High
Assignee:
Category: d1_mercury_ui
Start date: 2015-02-28
Due date:
% Done: 0%
Story Points:
Sprint:

Description

We've been getting new content from Dryad, and the ONEMercury handling of the identifiers looks to be broken. For example, doing a search for * and Dryad as the Member Node:

https://cn.dataone.org/onemercury/send/query?term1=*&term1attribute=fullText&term2.1=&term2.1attribute=fullText&op2.1=AND&term2.2attribute=fullText&term2.2=&op2.2=AND&term2.3attribute=fullText&term2.3=&op2.3=and&term3=%2C%2C%2C&term3attribute=overlaps&op4=during&term4=&term4attribute=beginDate&term5=&term5attribute=endDate&term6attribute=datasource&term8=either&pageSize=10&queryString=+Entire+Document+%3A+*++and+true+coordinates+%28N%2CW%2CS%2CE%29+%3D+%28%2C%2C%2C%29+and+++and++from+sources%3A+urn%3Anode%3ADRYAD&instance=pilotcatalog&filterForDataHidden=&term6=urn%3Anode%3ADRYAD

Almost all of the identifiers are rendered incorrectly. Some look truncated, and others contain XML markup:

ttp://dx.doi.org/10.5061/dryad.6gr7t/2?ver=2014-02-21T12:54:19.782-05:00
x.doi.org/10.5061/dryad.121d03jc/11?ver=2012-08-16T10:38:20.266-04:00
00oi.org/10.5061/dryad.121d03jc/8?ver=2012-08-16T10:36:22.333-04:00
080/9?ver=2013-01/10.5061/dryad.sd080/9?ver=2013-01-30T12:24:48.723-05:00

This results in broken links to metadata content, such as:

https://cn.dataone.org/onemercury/send/xsltText2?pid=%3Ettp://dx.doi.org/10.5061/dryad.6gr7t/2?ver=2014-02-21T12:54:19.782-05:00&fileURL=https://cn.dataone.org/cn/v1/resolve/http%3A%2F%2Fdx.doi.org%2F10.5061%2Fdryad.6gr7t%2F2%3Fver%3D2014-02-21T12%3A54%3A19.782-05%3A00&full_datasource=Dryad%20Digital%20Repository&full_queryString=%20*%20AND%20has%20direct%20data%20AND%20%28%20datasource%20:%28%20urn:node:DRYAD%20%20%29%20%29%20&ds_id=

When using the actual pid, the content is present:

https://cn.dataone.org/onemercury/send/xsltText2?pid=http://dx.doi.org/10.5061/dryad.6gr7t/2?ver=2014-02-21T12:54:19.782-05:00&fileURL=https://cn.dataone.org/cn/v1/resolve/http%3A%2F%2Fdx.doi.org%2F10.5061%2Fdryad.6gr7t%2F2%3Fver%3D2014-02-21T12%3A54%3A19.782-05%3A00&full_datasource=Dryad%20Digital%20Repository&full_queryString=%20*%20AND%20has%20direct%20data%20AND%20%28%20datasource%20:%28%20urn:node:DRYAD%20%20%29%20%29%20&ds_id=#top

We need to track down where the identifiers are getting mangled.
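
As a starting point for tracking this down, the result links themselves carry a cross-check: the `fileURL` parameter embeds a percent-encoded copy of the pid after `/cn/v1/resolve/`, and in the examples above that copy is intact even when the `pid` parameter is mangled. The Python sketch below compares the two. The helper names (`query_params`, `pid_mismatch`) are hypothetical and not part of ONEMercury, and the sketch assumes every result link follows the `pid=...&fileURL=...` layout shown above.

```python
from urllib.parse import urlsplit, unquote

# Path segment that precedes the encoded pid in the fileURL values above.
RESOLVE_MARKER = "/cn/v1/resolve/"


def query_params(url):
    """Split the query string on '&' and the first '=' and percent-decode values.

    A manual split is used (rather than parse_qs) so that '+' characters in
    timestamps are not turned into spaces.
    """
    params = {}
    for pair in urlsplit(url).query.split("&"):
        key, _, value = pair.partition("=")
        params[key] = unquote(value)
    return params


def pid_mismatch(url):
    """Return (displayed_pid, resolved_pid) if they differ, else None.

    Also returns None when the link has no recognizable fileURL to compare with.
    """
    params = query_params(url)
    displayed = params.get("pid", "")
    _, found, resolved = params.get("fileURL", "").partition(RESOLVE_MARKER)
    if not found or displayed == resolved:
        return None
    return displayed, resolved
```

Applied to the two links above, the broken one reports a mismatch (`>ttp://dx.doi.org/...` against `http://dx.doi.org/...`) and the working one reports none. Running a check like this over a page of results would show how many entries are affected, but not where in the pipeline the damage happens.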


Related issues

Duplicates CN Index - Bug #6800: SOLR indexes malformed strings - identifier, id Closed 2015-02-03

History

#1 Updated by Robert Waltz about 9 years ago

I performed the first search identified above and found this result:

Santos, Scott R.. 01/14/2014. Car_ign_only_1200bp+_nucleotide_contigs_from_Ray_assembly.
Identifier: >ttp://dx.doi.org/10.5061/dryad.6gr7t/2?ver=2014-02-21T12:54:19.782-05:00 Datasource: Dryad Digital Repository
FASTA file of only nucleotide contigs >=1,200 bp assembled from 139,329,276 100 bp pair end (PE) reads from an Illumina HiSeq 2000 using Ray v2.0.0 for Caranx ignobilis...

The entry above is the first one returned by the search.
The raw HTML for it contains this text:

// Register button with downloadPanel component

M3.downloadPanel.registerButton('d1-download-panel-button-1','http://dx.doi.org/10.5061/dryad.6gr7t?format=d1rem&ver=2014-02-21T13:17:41.985-05:00', '>ttp://dx.doi.org/10.5061/dryad.6gr7t/2?ver=2014-02-21T12:54:19.782-05:00');

Download
ttp://dx.doi.org/10.5061/dryad.6gr7t/2?ver=2014-02-21T12:54:19.782-05:00&fileURL=https://cn.dataone.org/cn/v1/resolve/http%3A%2F%2Fdx.doi.org%2F10.5061%2Fdryad.6gr7t%2F2%3Fver%3D2014-02-21T12%3A54%3A19.782-05%3A00&full_datasource=Dryad Digital Repository&full_queryString= * AND has direct data AND ( datasource :( urn:node:DRYAD ) ) &ds_id='">View full metadata

Identifier:

</field>x.doi.org/10.5061/dryad.121d03jc/11?ver=2012-08-16T10:38:20.266-04:00

I believe http://dx.doi.org/10.5061/dryad.6gr7t?format=d1rem&ver=2014-02-21T13:17:41.985-05:00 to be the valid identifier, but it is mangled in a variety of ways in the HTML produced above. It appears that when the identifier is combined into URLs, parts of the resulting URL are truncated.
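
One way to test the URL-construction hypothesis is to percent-encode a known-good pid and see whether the result matches the encoded value already present in the `fileURL` links. The Python sketch below does that for the pid from the description. `resolve_url` is a hypothetical helper written for this comment, not ONEMercury code, and the base URL is copied from the links above.

```python
from urllib.parse import quote

# Base of the resolve endpoint, copied from the fileURL values in this ticket.
RESOLVE_BASE = "https://cn.dataone.org/cn/v1/resolve/"


def resolve_url(pid: str) -> str:
    """Build a resolve URL, percent-encoding every reserved character in the pid.

    safe="" makes quote() encode '/', ':', '?' and '=' as well, so the pid
    cannot be mistaken for part of the surrounding URL.
    """
    return RESOLVE_BASE + quote(pid, safe="")


pid = "http://dx.doi.org/10.5061/dryad.6gr7t/2?ver=2014-02-21T12:54:19.782-05:00"
print(resolve_url(pid))
```

If the encoded output matches the `fileURL` value in the broken link, the encoding step is behaving correctly for that parameter, which would point toward the identifier being damaged before the page is rendered rather than during URL construction.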

#2 Updated by Dave Vieglais about 9 years ago

  • Tracker changed from Task to Bug

#3 Updated by Rob Nahf about 9 years ago

  • Related to Bug #6800: SOLR indexes malformed strings - identifier, id added

#4 Updated by Dave Vieglais about 9 years ago

  • Assignee changed from Mark Servilla to Dave Vieglais

#5 Updated by Skye Roseboom about 9 years ago

  • Status changed from New to Rejected

Identifiers are being mangled in the indexing process. Not a bug in one-mercury. Duplicate of #6800.

#6 Updated by Skye Roseboom about 9 years ago

  • Related to deleted (Bug #6800: SOLR indexes malformed strings - identifier, id)

#7 Updated by Skye Roseboom about 9 years ago

  • Duplicates Bug #6800: SOLR indexes malformed strings - identifier, id added
