Task #7667

Indexing of URL-like identifiers fails

Added by Chris Jones about 8 years ago. Updated about 8 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: d1_cn_index_processor
Start date: 2016-02-27
Due date:
% Done: 100%
Estimated time: 0.00 h
Story Points:
Sprint:

Description

Bryce reported that indexing PIDs containing a mix of underscores and colons causes the RdfXmlSubprocessor to fail in a Metacat MN instance that uses the d1_cn_index_processor component:

metacat 20160226-16:23:14: [DEBUG]: create() complete for object: resourceMap_urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786 [edu.ucsb.nceas.metacat.dataone.D1NodeService]
java.net.URISyntaxException: Illegal character in scheme name at index 11: resourceMap_urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786
at java.net.URI$Parser.fail(URI.java:2829)
at java.net.URI$Parser.checkChars(URI.java:3002)
at java.net.URI$Parser.parse(URI.java:3029)
at java.net.URI.<init>(URI.java:595)
at org.dataone.cn.indexer.annotation.RdfXmlSubprocessor.process(RdfXmlSubprocessor.java:172)
at org.dataone.cn.indexer.annotation.RdfXmlSubprocessor.processDocument(RdfXmlSubprocessor.java:116)
at edu.ucsb.nceas.metacat.index.SolrIndex.process(SolrIndex.java:235)
at edu.ucsb.nceas.metacat.index.SolrIndex.insert(SolrIndex.java:384)
at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:590)
at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:545)
at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryUpdated(SystemMetadataEventListener.java:146)
at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryAdded(SystemMetadataEventListener.java:119)
at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:148)
at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:130)
at com.hazelcast.client.impl.ListenerManager.customRun(ListenerManager.java:88)
at com.hazelcast.client.ClientRunnable.run(ClientRunnable.java:30)
at java.lang.Thread.run(Thread.java:745)
metacat-index 20160226-16:23:14: [ERROR]: Illegal character in scheme name at index 11: resourceMap_urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786 [edu.ucsb.nceas.metacat.index.SolrIndex]
java.net.URISyntaxException: Illegal character in scheme name at index 11: resourceMap_urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786
at java.net.URI$Parser.fail(URI.java:2829)
at java.net.URI$Parser.checkChars(URI.java:3002)
at java.net.URI$Parser.parse(URI.java:3029)
at java.net.URI.<init>(URI.java:595)
at org.dataone.cn.indexer.annotation.RdfXmlSubprocessor.process(RdfXmlSubprocessor.java:172)
at org.dataone.cn.indexer.annotation.RdfXmlSubprocessor.processDocument(RdfXmlSubprocessor.java:116)
at edu.ucsb.nceas.metacat.index.SolrIndex.process(SolrIndex.java:235)
at edu.ucsb.nceas.metacat.index.SolrIndex.insert(SolrIndex.java:384)
at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:590)
at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:545)
at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryUpdated(SystemMetadataEventListener.java:146)
at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryAdded(SystemMetadataEventListener.java:119)
at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:148)
at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:130)
at com.hazelcast.client.impl.ListenerManager.customRun(ListenerManager.java:88)
at com.hazelcast.client.ClientRunnable.run(ClientRunnable.java:30)
at java.lang.Thread.run(Thread.java:745)
metacat-index 20160226-16:23:14: [ERROR]: SolrIndex.update - could not update the solr index since Illegal character in scheme name at index 11: resourceMap_urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786 [edu.ucsb.nceas.metacat.index.SolrIndex]
org.apache.solr.client.solrj.SolrServerException: Illegal character in scheme name at index 11: resourceMap_urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786
at edu.ucsb.nceas.metacat.index.SolrIndex.process(SolrIndex.java:240)
at edu.ucsb.nceas.metacat.index.SolrIndex.insert(SolrIndex.java:384)
at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:590)
at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:545)
at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryUpdated(SystemMetadataEventListener.java:146)
at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryAdded(SystemMetadataEventListener.java:119)
at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:148)
at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:130)
at com.hazelcast.client.impl.ListenerManager.customRun(ListenerManager.java:88)
at com.hazelcast.client.ClientRunnable.run(ClientRunnable.java:30)
at java.lang.Thread.run(Thread.java:745)

In this example, the offending identifier (resourceMap_urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786) contains a colon. When it is used to construct a URI object for loading into the Jena model, everything before the first colon (resourceMap_urn) is parsed as the URI scheme, and construction fails because the underscore at index 11 is not a legal character in a URI scheme.
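The failure can be reproduced with java.net.URI alone, outside the indexer. The class below is a hypothetical standalone check (UriSchemeCheck and failureIndex are illustrative names, not part of d1_cn_index_processor); it reports the index that the parser gives in its URISyntaxException, or -1 when the string parses.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical standalone reproduction; not part of d1_cn_index_processor.
public class UriSchemeCheck {

    // Returns the error index reported by java.net.URI, or -1 if the string parses.
    static int failureIndex(String pid) {
        try {
            new URI(pid);
            return -1;
        } catch (URISyntaxException e) {
            return e.getIndex();
        }
    }

    public static void main(String[] args) {
        // "resourceMap_urn" (the text before the first ':') is read as the scheme;
        // a scheme may only contain letters, digits, '+', '-' and '.', so '_' fails.
        System.out.println(failureIndex("resourceMap_urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786"));
        // Without the "resourceMap_" prefix the identifier is a valid URN.
        System.out.println(failureIndex("urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786"));
    }
}
```

Running it prints 11 for the resource map identifier, matching the index in the log above, and -1 for the bare URN.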

Fix the RdfXmlSubprocessor to catch exceptions like these and treat such identifiers as non-URIs (i.e., convert them to http URIs).
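A minimal sketch of the catch-and-fall-back approach, assuming the fallback prefixes the identifier with a base URI. PidUris, toUri and encode are hypothetical names, not the actual RdfXmlSubprocessor change, and percent-encoding the identifier before appending it is an assumption. The base used here is the resolve endpoint chosen in comment #3 of the history, where a plain "http://" prefix was reported not to work.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

// Hypothetical helper sketching the fallback; not the actual RdfXmlSubprocessor code.
public class PidUris {

    // Assumed base prefix for identifiers that are not valid URIs.
    static final String RESOLVE_BASE = "https://cn.dataone.org/cn/v1/resolve/";

    static URI toUri(String pid) {
        try {
            // Identifiers that already parse as URIs are used unchanged.
            return new URI(pid);
        } catch (URISyntaxException e) {
            // Otherwise treat the identifier as an opaque string appended to the base.
            return URI.create(RESOLVE_BASE + encode(pid));
        }
    }

    // URLEncoder produces form encoding, where a space becomes '+'; rewrite it as %20
    // so it is not read as a literal '+' in a path segment (a real '+' is already %2B).
    static String encode(String pid) {
        try {
            return URLEncoder.encode(pid, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toUri("urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786"));
        System.out.println(toUri("resourceMap_urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786"));
    }
}
```

With this sketch the bare URN is returned as-is, while the resource map identifier becomes https://cn.dataone.org/cn/v1/resolve/resourceMap_urn%3Auuid%3A62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786. Note that an identifier without a colon parses as a relative URI and is also returned unchanged here, because only the exception triggers the fallback.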

History

#1 Updated by Chris Jones about 8 years ago

  • Status changed from In Progress to Closed
  • Remaining hours set to 0.0
  • % Done changed from 30 to 100

#2 Updated by Dave Vieglais about 8 years ago

  • Estimated time set to 0.00
  • Target version set to CCI-2.1.1

#3 Updated by Jing Tao about 8 years ago

  • Status changed from Closed to In Progress
  • % Done changed from 100 to 30

Adding the "http://" prefix doesn't work. Ben and I decided to use this base URI instead: https://cn.dataone.org/cn/v1/resolve/.

#4 Updated by Jing Tao about 8 years ago

  • % Done changed from 30 to 100
  • Status changed from In Progress to Closed

Bryce and I tested it. It works for us.

#5 Updated by Dave Vieglais about 8 years ago

  • Target version changed from CCI-2.1.1 to CCI-2.1.2
