Task #7667
Indexing of URL-like identifiers fails
% Done: 100%
Description
Bryce reported that indexing PIDs containing a mix of underscores and colons causes the RdfXmlSubprocessor to fail in a Metacat MN instance that uses the d1_cn_index_processor component:
metacat 20160226-16:23:14: [DEBUG]: create() complete for object: resourceMap_urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786 [edu.ucsb.nceas.metacat.dataone.D1NodeService]
java.net.URISyntaxException: Illegal character in scheme name at index 11: resourceMap_urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786
at java.net.URI$Parser.fail(URI.java:2829)
at java.net.URI$Parser.checkChars(URI.java:3002)
at java.net.URI$Parser.parse(URI.java:3029)
at java.net.URI.<init>(URI.java:595)
at org.dataone.cn.indexer.annotation.RdfXmlSubprocessor.process(RdfXmlSubprocessor.java:172)
at org.dataone.cn.indexer.annotation.RdfXmlSubprocessor.processDocument(RdfXmlSubprocessor.java:116)
at edu.ucsb.nceas.metacat.index.SolrIndex.process(SolrIndex.java:235)
at edu.ucsb.nceas.metacat.index.SolrIndex.insert(SolrIndex.java:384)
at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:590)
at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:545)
at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryUpdated(SystemMetadataEventListener.java:146)
at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryAdded(SystemMetadataEventListener.java:119)
at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:148)
at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:130)
at com.hazelcast.client.impl.ListenerManager.customRun(ListenerManager.java:88)
at com.hazelcast.client.ClientRunnable.run(ClientRunnable.java:30)
at java.lang.Thread.run(Thread.java:745)
metacat-index 20160226-16:23:14: [ERROR]: Illegal character in scheme name at index 11: resourceMap_urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786 [edu.ucsb.nceas.metacat.index.SolrIndex]
java.net.URISyntaxException: Illegal character in scheme name at index 11: resourceMap_urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786
at java.net.URI$Parser.fail(URI.java:2829)
at java.net.URI$Parser.checkChars(URI.java:3002)
at java.net.URI$Parser.parse(URI.java:3029)
at java.net.URI.<init>(URI.java:595)
at org.dataone.cn.indexer.annotation.RdfXmlSubprocessor.process(RdfXmlSubprocessor.java:172)
at org.dataone.cn.indexer.annotation.RdfXmlSubprocessor.processDocument(RdfXmlSubprocessor.java:116)
at edu.ucsb.nceas.metacat.index.SolrIndex.process(SolrIndex.java:235)
at edu.ucsb.nceas.metacat.index.SolrIndex.insert(SolrIndex.java:384)
at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:590)
at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:545)
at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryUpdated(SystemMetadataEventListener.java:146)
at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryAdded(SystemMetadataEventListener.java:119)
at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:148)
at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:130)
at com.hazelcast.client.impl.ListenerManager.customRun(ListenerManager.java:88)
at com.hazelcast.client.ClientRunnable.run(ClientRunnable.java:30)
at java.lang.Thread.run(Thread.java:745)
metacat-index 20160226-16:23:14: [ERROR]: SolrIndex.update - could not update the solr index since Illegal character in scheme name at index 11: resourceMap_urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786 [edu.ucsb.nceas.metacat.index.SolrIndex]
org.apache.solr.client.solrj.SolrServerException: Illegal character in scheme name at index 11: resourceMap_urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786
at edu.ucsb.nceas.metacat.index.SolrIndex.process(SolrIndex.java:240)
at edu.ucsb.nceas.metacat.index.SolrIndex.insert(SolrIndex.java:384)
at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:590)
at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:545)
at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryUpdated(SystemMetadataEventListener.java:146)
at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryAdded(SystemMetadataEventListener.java:119)
at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:148)
at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:130)
at com.hazelcast.client.impl.ListenerManager.customRun(ListenerManager.java:88)
at com.hazelcast.client.ClientRunnable.run(ClientRunnable.java:30)
at java.lang.Thread.run(Thread.java:745)
In this example of an offending identifier (resourceMap_urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786), the first colon is parsed as the delimiter that ends a URI scheme. Constructing a URI object for loading into the Jena model therefore fails, because the underscore before that colon (index 11) is not a legal character in a URI scheme name.
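The failure can be reproduced outside Metacat with nothing but java.net.URI. The following is a minimal sketch; the class and method names are hypothetical. It shows that the parser reports the underscore at index 11 as an illegal scheme character, while the bare urn:uuid: form of the same identifier parses normally with the scheme "urn".

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class: reproduces the scheme-name failure seen in the log.
public class SchemeCheckDemo {

    /** Returns the error index reported by java.net.URI, or -1 if the string parses. */
    static int failureIndex(String id) {
        try {
            new URI(id);
            return -1;
        } catch (URISyntaxException e) {
            return e.getIndex();
        }
    }

    /** Returns the reason text reported by java.net.URI, or null if the string parses. */
    static String failureReason(String id) {
        try {
            new URI(id);
            return null;
        } catch (URISyntaxException e) {
            return e.getReason();
        }
    }

    /** Parses the string and returns its scheme (null for a relative URI). */
    static String schemeOf(String id) throws URISyntaxException {
        return new URI(id).getScheme();
    }

    public static void main(String[] args) throws URISyntaxException {
        String bad = "resourceMap_urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786";
        String good = "urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786";
        // Text before the first colon is treated as the scheme; '_' is not allowed there.
        System.out.println(failureIndex(bad) + " " + failureReason(bad));
        // The plain URN has a valid scheme and parses as an opaque URI.
        System.out.println(schemeOf(good));
    }
}
```

Per RFC 3986, a scheme may contain only letters, digits, "+", "-" and ".", so any PID whose first colon is preceded by an underscore will trip this check.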
Fix the RdfXmlSubprocessor to catch exceptions like this one and treat such identifiers as non-URIs (i.e. convert them to http URIs).
History
#1 Updated by Chris Jones almost 9 years ago
- Status changed from In Progress to Closed
- Remaining hours set to 0.0
- % Done changed from 30 to 100
#2 Updated by Dave Vieglais almost 9 years ago
- Estimated time set to 0.00
- Target version set to CCI-2.1.1
#3 Updated by Jing Tao almost 9 years ago
- Status changed from Closed to In Progress
- % Done changed from 100 to 30
Adding the "http://" prefix doesn't work. Ben and I decided to use this URI base instead: https://cn.dataone.org/cn/v1/resolve/
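The catch-and-convert approach, combined with the resolve base from this comment, might look roughly like the sketch below. It is a hypothetical helper, not the committed RdfXmlSubprocessor code. Several details are assumptions: the class and method names, the choice to also convert relative (scheme-less) identifiers, and the use of the multi-argument java.net.URI constructor to percent-encode characters that are illegal in a path.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper: NOT the actual RdfXmlSubprocessor implementation.
public class IdentifierUris {

    // Host and path of the resolve base mentioned in the history comment.
    static final String RESOLVE_HOST = "cn.dataone.org";
    static final String RESOLVE_PATH = "/cn/v1/resolve/";

    /**
     * Returns the identifier unchanged if it parses as an absolute URI;
     * otherwise returns the identifier appended to the resolve base.
     */
    static String toUriString(String id) {
        try {
            if (new URI(id).isAbsolute()) {
                return id; // e.g. "urn:uuid:..." or "http://..."
            }
            // Assumption: relative identifiers are converted as well.
        } catch (URISyntaxException e) {
            // e.g. "resourceMap_urn:..." fails with an illegal scheme name.
        }
        try {
            // The multi-argument constructor quotes characters that are illegal
            // in a path (such as spaces) instead of throwing; ':' is legal there.
            return new URI("https", RESOLVE_HOST, RESOLVE_PATH + id, null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build resolve URI for " + id, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toUriString("resourceMap_urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786"));
        System.out.println(toUriString("urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786"));
        System.out.println(toUriString("my pid"));
    }
}
```

Building the fallback with the multi-argument constructor, rather than by string concatenation, means the result is itself guaranteed to be a parseable URI, so the Jena model never receives another string that java.net.URI rejects.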
#4 Updated by Jing Tao almost 9 years ago
- % Done changed from 30 to 100
- Status changed from In Progress to Closed
Bryce and I tested it. It works for us.
#5 Updated by Dave Vieglais almost 9 years ago
- Target version changed from CCI-2.1.1 to CCI-2.1.2