CN Index: Issues (DataONE Tasks)
https://redmine.dataone.org/
2016-02-27T04:57:41Z
Redmine Task #7667 (Closed): Indexing of URL-like identifiers fails
https://redmine.dataone.org/issues/7667
2016-02-27T04:57:41Z
Chris Jones (cjones@nceas.ucsb.edu)
<p>Bryce reported that indexing pids containing a mix of underscores and colons causes the RdfXmlSubprocessor to fail in a Metacat MN instance using the d1_cn_index_processor component:</p>
<p>metacat 20160226-16:23:14: [DEBUG]: create() complete for object: resourceMap_urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786 [edu.ucsb.nceas.metacat.dataone.D1NodeService]<br>
java.net.URISyntaxException: Illegal character in scheme name at index 11: resourceMap_urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786<br>
at java.net.URI$Parser.fail(URI.java:2829)<br>
at java.net.URI$Parser.checkChars(URI.java:3002)<br>
at java.net.URI$Parser.parse(URI.java:3029)<br>
at java.net.URI.&lt;init&gt;(URI.java:595)<br>
at org.dataone.cn.indexer.annotation.RdfXmlSubprocessor.process(RdfXmlSubprocessor.java:172)<br>
at org.dataone.cn.indexer.annotation.RdfXmlSubprocessor.processDocument(RdfXmlSubprocessor.java:116)<br>
at edu.ucsb.nceas.metacat.index.SolrIndex.process(SolrIndex.java:235)<br>
at edu.ucsb.nceas.metacat.index.SolrIndex.insert(SolrIndex.java:384)<br>
at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:590)<br>
at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:545)<br>
at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryUpdated(SystemMetadataEventListener.java:146)<br>
at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryAdded(SystemMetadataEventListener.java:119)<br>
at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:148)<br>
at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:130)<br>
at com.hazelcast.client.impl.ListenerManager.customRun(ListenerManager.java:88)<br>
at com.hazelcast.client.ClientRunnable.run(ClientRunnable.java:30)<br>
at java.lang.Thread.run(Thread.java:745)<br>
metacat-index 20160226-16:23:14: [ERROR]: Illegal character in scheme name at index 11: resourceMap_urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786 [edu.ucsb.nceas.metacat.index.SolrIndex]<br>
java.net.URISyntaxException: Illegal character in scheme name at index 11: resourceMap_urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786<br>
at java.net.URI$Parser.fail(URI.java:2829)<br>
at java.net.URI$Parser.checkChars(URI.java:3002)<br>
at java.net.URI$Parser.parse(URI.java:3029)<br>
at java.net.URI.&lt;init&gt;(URI.java:595)<br>
at org.dataone.cn.indexer.annotation.RdfXmlSubprocessor.process(RdfXmlSubprocessor.java:172)<br>
at org.dataone.cn.indexer.annotation.RdfXmlSubprocessor.processDocument(RdfXmlSubprocessor.java:116)<br>
at edu.ucsb.nceas.metacat.index.SolrIndex.process(SolrIndex.java:235)<br>
at edu.ucsb.nceas.metacat.index.SolrIndex.insert(SolrIndex.java:384)<br>
at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:590)<br>
at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:545)<br>
at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryUpdated(SystemMetadataEventListener.java:146)<br>
at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryAdded(SystemMetadataEventListener.java:119)<br>
at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:148)<br>
at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:130)<br>
at com.hazelcast.client.impl.ListenerManager.customRun(ListenerManager.java:88)<br>
at com.hazelcast.client.ClientRunnable.run(ClientRunnable.java:30)<br>
at java.lang.Thread.run(Thread.java:745)<br>
metacat-index 20160226-16:23:14: [ERROR]: SolrIndex.update - could not update the solr index since Illegal character in scheme name at index 11: resourceMap_urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786 [edu.ucsb.nceas.metacat.index.SolrIndex]<br>
org.apache.solr.client.solrj.SolrServerException: Illegal character in scheme name at index 11: resourceMap_urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786<br>
at edu.ucsb.nceas.metacat.index.SolrIndex.process(SolrIndex.java:240)<br>
at edu.ucsb.nceas.metacat.index.SolrIndex.insert(SolrIndex.java:384)<br>
at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:590)<br>
at edu.ucsb.nceas.metacat.index.SolrIndex.update(SolrIndex.java:545)<br>
at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryUpdated(SystemMetadataEventListener.java:146)<br>
at edu.ucsb.nceas.metacat.index.SystemMetadataEventListener.entryAdded(SystemMetadataEventListener.java:119)<br>
at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:148)<br>
at com.hazelcast.client.impl.EntryListenerManager.notifyListeners(EntryListenerManager.java:130)<br>
at com.hazelcast.client.impl.ListenerManager.customRun(ListenerManager.java:88)<br>
at com.hazelcast.client.ClientRunnable.run(ClientRunnable.java:30)<br>
at java.lang.Thread.run(Thread.java:745)</p>
<p>This example of an offending identifier (resourceMap_urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786) contains a colon delimiter. When the indexer attempts to construct a URI object for loading into the Jena model, the parser treats the text before the first colon (resourceMap_urn) as the URI scheme, and construction fails because the underscore at index 11 is not a legal character in a URI scheme component.</p>
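<p>The failure can be reproduced outside Metacat with java.net.URI alone. The following is a minimal sketch, not code from d1_cn_index_processor; the class name SchemeFailureDemo and the helper describeFailure are hypothetical and exist only for illustration.</p>

```java
import java.net.URI;
import java.net.URISyntaxException;

final class SchemeFailureDemo {

    /**
     * Returns the parser's complaint for the given pid, or null if the
     * pid parses as a URI. (Hypothetical helper, for illustration only.)
     */
    static String describeFailure(String pid) {
        try {
            new URI(pid);
            return null;
        } catch (URISyntaxException e) {
            int i = e.getIndex();
            // getIndex() may be -1 when the position is unknown, so guard charAt.
            String offending = (i >= 0 && i < pid.length())
                    ? " ('" + pid.charAt(i) + "')"
                    : "";
            return e.getReason() + " at index " + i + offending;
        }
    }

    public static void main(String[] args) {
        // Text before the first colon is taken as the scheme; '_' is not allowed there.
        System.out.println(describeFailure(
                "resourceMap_urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786"));
        // A plain urn:uuid identifier has a legal scheme ("urn") and parses.
        System.out.println(describeFailure(
                "urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786"));
    }
}
```

<p>A URI scheme may contain only letters, digits, "+", "-", and ".", which is why the underscore in resourceMap_urn is rejected while the same identifier without the resourceMap_ prefix is accepted.</p>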
<p>Fix the RdfXmlSubprocessor to catch exceptions like this one and treat such identifiers as non-URIs (i.e., convert them to HTTP URIs).</p>
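<p>One way the proposed fallback could look is sketched below. This is an illustration under stated assumptions, not the actual RdfXmlSubprocessor change: the class PidUriSketch, the method names, and the RESOLVE_BASE prefix (http://example.org/resolve/) are all hypothetical, and the sketch also assumes that identifiers parsing only as relative references (no scheme) should be wrapped in the same way.</p>

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

final class PidUriSketch {

    // Hypothetical prefix; the issue does not say which HTTP base to use.
    static final String RESOLVE_BASE = "http://example.org/resolve/";

    /**
     * Returns the pid as a URI when it parses as an absolute URI;
     * otherwise wraps the percent-encoded pid in an HTTP URI.
     */
    static URI toUri(String pid) {
        try {
            URI uri = new URI(pid);
            if (uri.isAbsolute()) {
                return uri; // already a usable URI such as urn:uuid:... or doi:...
            }
        } catch (URISyntaxException e) {
            // e.g. "Illegal character in scheme name": treat the pid as a non-URI.
        }
        return URI.create(RESOLVE_BASE + percentEncode(pid));
    }

    /** Percent-encodes the pid so it is safe as a single path segment. */
    static String percentEncode(String s) {
        try {
            // URLEncoder emits '+' for spaces (form encoding); use %20 in a path.
            return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toUri("resourceMap_urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786"));
        System.out.println(toUri("urn:uuid:62fa3c9d-6a22-4bf7-aafe-1cfc98fa4786"));
    }
}
```

<p>Catching URISyntaxException at the point of construction keeps one malformed identifier from aborting the whole index update, and percent-encoding the pid keeps its colons from being reinterpreted inside the resulting HTTP URI.</p>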