Bug #8183
CNDiagnostic.echoIndexedObject() fails on resource maps
100%
Description
Calling @CNDiagnostic.echoIndexedObject()@ with both known-good and known-bad resource maps fails with a @500 ServiceFailure@ exception. We would expect it to return the list of Solr documents for the known-good resource map, and the original exception content for known-bad resource maps.
Here are two examples:
Good (i.e. has been successfully indexed by DataONE):
https://knb.ecoinformatics.org/knb/d1/mn/v2/meta/resourceMap_knb.1389.1
Bad (failed indexing by DataONE):
https://knb.ecoinformatics.org/knb/d1/mn/v2/meta/urn:uuid:c3cb05ae-0fd7-4ebc-960a-4781521d249d
The latter ORE has a syntax error with regard to the @dc:creator@ content, and fails to be parsed by the Jena library with:
[ERROR] 2017-09-12 12:58:51,900 (RDFDefaultErrorHandler:error:40) (line 9 column 61): {E201} rdf:resource not allowed as attribute here.
[ERROR] 2017-09-12 12:58:51,910 (RDFDefaultErrorHandler:error:40) (line 10 column 70): {E201} Multiple children of property element
[ERROR] 2017-09-12 12:58:51,910 (RDFDefaultErrorHandler:error:40) (line 10 column 70): {E201} rdf:datatype not allowed as attribute here.
[ERROR] 2017-09-12 12:58:51,911 (RDFDefaultErrorHandler:error:40) (line 10 column 82): {E202} Expecting XML start or end element(s). String data "Ocean Spaces" not allowed. Maybe there should be an rdf:parseType='Literal' for embedding mixed XML content in RDF. Maybe a striping error.
So we would expect these errors to trickle up the call stack and be reported to the submitter.
Associated revisions
fixes #8183. (in 2.3 branch) Added NPE check after Hz.getSystemMetadata in clearSidChain. echoIndexObject often doesn't have predeccessor documents that are synchronized, so we cannot rely on the Hz sysmeta map to never return null.
fixes #8183. (in 2.3 branch) Added NPE check after Hz.getSystemMetadata in clearSidChain. echoIndexObject often doesn't have predeccessor documents that are synchronized, so we cannot rely on the Hz sysmeta map to never return null.
History
#1 Updated by Dave Vieglais about 7 years ago
- Assignee set to Jing Tao
- Target version changed from CCI-2.3.5 to CCI-2.3.7
#2 Updated by Dave Vieglais about 7 years ago
- Assignee changed from Jing Tao to Rob Nahf
#3 Updated by Rob Nahf almost 7 years ago
created a new functional integration test specific for resourcemaps (existing test was only scimetadata). ran on cn-sandbox-unm-1 with the cn.war log level bumped to debug. Stacktrace shows an unhandled NPE while trying to clear the SID chain:
[DEBUG] 2017-12-14 21:32:43,036 (CommonsFileUploadSupport:parseFileItems:259) Found multipart file [sysmeta] of size 829 bytes with original filename [mmp.output.2570058838097726727.tmp], stored in memory
[DEBUG] 2017-12-14 21:32:43,037 (CommonsFileUploadSupport:parseFileItems:259) Found multipart file [object] of size 3760 bytes with original filename [mmp.output.1047975705502505412.tmp], stored in memory
[ INFO] 2017-12-14 21:32:43,038 (DiagnosticController:echoIndexedObject:196) Found filepart sysmeta
[ INFO] 2017-12-14 21:32:43,041 (DiagnosticController:echoIndexedObject:196) Found filepart object
[DEBUG] 2017-12-14 21:32:43,070 (ResourceMapFactory:parseResourceMap:458) https://cn-dev.test.dataone.org/cn/v1/resolve/r_test3.package.2012120601491354790946
[DEBUG] 2017-12-14 21:32:43,072 (ResourceMapFactory:parseResourceMap:490) Agg resource: https://cn-dev.test.dataone.org/cn/v1/resolve/r_test3.scidata.2.2012120601491354790946
[DEBUG] 2017-12-14 21:32:43,073 (ResourceMapFactory:parseResourceMap:493) --- documents count: 0
[DEBUG] 2017-12-14 21:32:43,073 (ResourceMapFactory:parseResourceMap:512) +++ isDocBy count: 1
[DEBUG] 2017-12-14 21:32:43,074 (ResourceMapFactory:parseResourceMap:516) +++docBySubjectURI: https://cn-dev.test.dataone.org/cn/v1/resolve/r_test3.scidata.2.2012120601491354790946
[DEBUG] 2017-12-14 21:32:43,074 (ResourceMapFactory:parseResourceMap:519) +++metadataURI: https://cn-dev.test.dataone.org/cn/v1/resolve/r_test3.scimeta.2012120601491354790946
[DEBUG] 2017-12-14 21:32:43,074 (ResourceMapFactory:parseResourceMap:522) Creating new HashSet for: https://cn-dev.test.dataone.org/cn/v1/resolve/r_test3.scimeta.2012120601491354790946 : 0
[DEBUG] 2017-12-14 21:32:43,074 (ResourceMapFactory:parseResourceMap:490) Agg resource: https://cn-dev.test.dataone.org/cn/v1/resolve/r_test3.scidata.1.2012120601491354790946
[DEBUG] 2017-12-14 21:32:43,075 (ResourceMapFactory:parseResourceMap:493) --- documents count: 0
[DEBUG] 2017-12-14 21:32:43,075 (ResourceMapFactory:parseResourceMap:512) +++ isDocBy count: 1
[DEBUG] 2017-12-14 21:32:43,075 (ResourceMapFactory:parseResourceMap:516) +++docBySubjectURI: https://cn-dev.test.dataone.org/cn/v1/resolve/r_test3.scidata.1.2012120601491354790946
[DEBUG] 2017-12-14 21:32:43,076 (ResourceMapFactory:parseResourceMap:519) +++metadataURI: https://cn-dev.test.dataone.org/cn/v1/resolve/r_test3.scimeta.2012120601491354790946
[DEBUG] 2017-12-14 21:32:43,076 (ResourceMapFactory:parseResourceMap:490) Agg resource: https://cn-dev.test.dataone.org/cn/v1/resolve/r_test3.scimeta.2012120601491354790946
[DEBUG] 2017-12-14 21:32:43,077 (ResourceMapFactory:parseResourceMap:493) --- documents count: 2
[DEBUG] 2017-12-14 21:32:43,077 (ResourceMapFactory:parseResourceMap:498) ---metadataURI : https://cn-dev.test.dataone.org/cn/v1/resolve/r_test3.scimeta.2012120601491354790946
[DEBUG] 2017-12-14 21:32:43,077 (ResourceMapFactory:parseResourceMap:505) ---documentsObject: https://cn-dev.test.dataone.org/cn/v1/resolve/r_test3.scidata.2.2012120601491354790946
[DEBUG] 2017-12-14 21:32:43,077 (ResourceMapFactory:parseResourceMap:505) ---documentsObject: https://cn-dev.test.dataone.org/cn/v1/resolve/r_test3.scidata.1.2012120601491354790946
[DEBUG] 2017-12-14 21:32:43,078 (ResourceMapFactory:parseResourceMap:512) +++ isDocBy count: 0
[DEBUG] 2017-12-14 21:32:43,078 (ResourceMapFactory:parseResourceMap:532) ~~~~~ data count: https://cn-dev.test.dataone.org/cn/v1/resolve/r_test3.scimeta.2012120601491354790946: 2
[DEBUG] 2017-12-14 21:32:43,175 (HttpConnectionMonitorService:run:103) ...calling closeExpire/IdleConnections...
[DEBUG] 2017-12-14 21:32:43,608 (ResourceMapSubprocessor:clearSidChain:192) Removing pidToProcess===testEchoSystemMetadata_2017348133241216
[DEBUG] 2017-12-14 21:32:43,608 (ResourceMapSubprocessor:clearSidChain:193) Removing objectPath===null
[ WARN] 2017-12-14 21:32:43,609 (SolrIndexService:processObject:241) The subprocessor org.dataone.cn.indexer.parser.ResourceMapSubprocessor can't process the id testEchoSystemMetadata_2017348133241216 since null. However, the index still can be achieved without this part of information provided by the processor.
java.lang.NullPointerException
at org.dataone.cn.index.task.IndexTask.(IndexTask.java:192)
at org.dataone.cn.indexer.parser.ResourceMapSubprocessor.clearSidChain(ResourceMapSubprocessor.java:195)
at org.dataone.cn.indexer.parser.ResourceMapSubprocessor.processResourceMap(ResourceMapSubprocessor.java:152)
at org.dataone.cn.indexer.parser.ResourceMapSubprocessor.processDocument(ResourceMapSubprocessor.java:121)
at org.dataone.cn.indexer.SolrIndexService.processObject(SolrIndexService.java:230)
at org.dataone.cn.rest.v2.DiagnosticController.echoIndexedObject(DiagnosticController.java:225)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:176)
at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:436)
at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:424)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:646)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.dataone.cn.rest.PortalCertificateFilter.doFilter(PortalCertificateFilter.java:82)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:88)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.dataone.cn.rest.ServiceDisableFilter.doFilter(ServiceDisableFilter.java:78)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:501)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:193)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[ INFO] 2017-12-14 21:32:43,610 (TripleStoreService:getDataset:33) TripleStoreService.getDataset - the parent directory of the triple store location ================= is /var/dataone/index/tdb
[ INFO] 2017-12-14 21:32:43,610 (TripleStoreService:getDataset:42) TripleStoreService.getDataset - the store directory ================= is /var/dataone/index/tdb/tdb32606505037282655
[DEBUG] 2017-12-14 21:32:43,611 (CommonsFileUploadSupport:cleanupFileItems:282) Cleaning up multipart file [sysmeta] with original filename [mmp.output.2570058838097726727.tmp], stored in memory
[DEBUG] 2017-12-14 21:32:43,611 (CommonsFileUploadSupport:cleanupFileItems:282) Cleaning up multipart file [object] with original filename [mmp.output.1047975705502505412.tmp], stored in memory
[ INFO] 2017-12-14 21:32:43,614 (ExceptionController:get:67) received error of 500
#4 Updated by Rob Nahf almost 7 years ago
there's no null check on the constructor parameter in d1_index_common's IndexTask:
public IndexTask(SystemMetadata smd, String objectPath) {
this();
if (smd.getIdentifier() != null) {
this.pid = smd.getIdentifier().getValue();
}
neither is there a check in ResourceMapSubprocessor.clearSidChain on whether or not the system Metadata is in the HzMap:
while (pidToProcess != null) {
// queue a delete processing of all versions in the SID chain
SystemMetadata sysmeta = HazelcastClientFactory.getSystemMetadataMap().get(
pidToProcess);
String objectPath = HazelcastClientFactory.getObjectPathMap().get(pidToProcess);
logger.debug("Removing pidToProcess===" + pidToProcess.getValue());
logger.debug("Removing objectPath===" + objectPath);
IndexTask task = new IndexTask(sysmeta, objectPath); try { deleteProcessor.process(task); } catch (Exception e) { logger.error(e.getMessage(), e); } // go to the next one pidToProcess = sysmeta.getObsoletes(); } }
Code in trunk fixes this, so copy from there into the 2.3 branch.
#5 Updated by Rob Nahf almost 7 years ago
- Category set to d1_indexer
- Status changed from New to Closed
- % Done changed from 0 to 100
committed fix in 2.3 branch (already fixed in trunk)
#6 Updated by Dave Vieglais almost 7 years ago
- Sprint set to CCI-2.3.7