Project

General

Profile

Bug #8183

CNDiagnostic.echoIndexedObject() fails on resource maps

Added by Chris Jones over 6 years ago. Updated over 6 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Category:
d1_indexer
Target version:
Start date:
2017-09-12
Due date:
% Done:

100%

Milestone:
None
Product Version:
*
Story Points:
Sprint:

Description

Calling @CNDiagnostic.echoIndexedObject()@ with both known-good and known-bad resource maps fails with a @500 ServiceFailure@ exception. We would expect it to return the list of Solr documents for the known-good resource map, and the original exception content for known-bad resource maps.

Here are two examples:

Good (i.e. has been successfully indexed by DataONE):
https://knb.ecoinformatics.org/knb/d1/mn/v2/meta/resourceMap_knb.1389.1

Bad (failed indexing by DataONE):
https://knb.ecoinformatics.org/knb/d1/mn/v2/meta/urn:uuid:c3cb05ae-0fd7-4ebc-960a-4781521d249d

The latter ORE has a syntax error with regard to the @dc:creator@ content, and fails to be parsed by the Jena library with:

[ERROR] 2017-09-12 12:58:51,900 (RDFDefaultErrorHandler:error:40) (line 9 column 61): {E201} rdf:resource not allowed as attribute here.
[ERROR] 2017-09-12 12:58:51,910 (RDFDefaultErrorHandler:error:40) (line 10 column 70): {E201} Multiple children of property element
[ERROR] 2017-09-12 12:58:51,910 (RDFDefaultErrorHandler:error:40) (line 10 column 70): {E201} rdf:datatype not allowed as attribute here.
[ERROR] 2017-09-12 12:58:51,911 (RDFDefaultErrorHandler:error:40) (line 10 column 82): {E202} Expecting XML start or end element(s). String data "Ocean Spaces" not allowed. Maybe there should be an rdf:parseType='Literal' for embedding mixed XML content in RDF. Maybe a striping error.

So we would expect these errors to trickle up the call stack and be reported to the submitter.

Associated revisions

Revision 19049
Added by Rob Nahf over 6 years ago

fixes #8183. (in 2.3 branch) Added NPE check after Hz.getSystemMetadata in clearSidChain. echoIndexObject often doesn't have predeccessor documents that are synchronized, so we cannot rely on the Hz sysmeta map to never return null.

Revision 19049
Added by Rob Nahf over 6 years ago

fixes #8183. (in 2.3 branch) Added NPE check after Hz.getSystemMetadata in clearSidChain. echoIndexObject often doesn't have predeccessor documents that are synchronized, so we cannot rely on the Hz sysmeta map to never return null.

History

#1 Updated by Dave Vieglais over 6 years ago

  • Assignee set to Jing Tao
  • Target version changed from CCI-2.3.5 to CCI-2.3.7

#2 Updated by Dave Vieglais over 6 years ago

  • Assignee changed from Jing Tao to Rob Nahf

#3 Updated by Rob Nahf over 6 years ago

created a new functional integration test specific for resourcemaps (existing test was only scimetadata). ran on cn-sandbox-unm-1 with the cn.war log level bumped to debug. Stacktrace shows an unhandled NPE while trying to clear the SID chain:

[DEBUG] 2017-12-14 21:32:43,036 (CommonsFileUploadSupport:parseFileItems:259) Found multipart file [sysmeta] of size 829 bytes with original filename [mmp.output.2570058838097726727.tmp], stored in memory
[DEBUG] 2017-12-14 21:32:43,037 (CommonsFileUploadSupport:parseFileItems:259) Found multipart file [object] of size 3760 bytes with original filename [mmp.output.1047975705502505412.tmp], stored in memory
[ INFO] 2017-12-14 21:32:43,038 (DiagnosticController:echoIndexedObject:196) Found filepart sysmeta
[ INFO] 2017-12-14 21:32:43,041 (DiagnosticController:echoIndexedObject:196) Found filepart object
[DEBUG] 2017-12-14 21:32:43,070 (ResourceMapFactory:parseResourceMap:458) https://cn-dev.test.dataone.org/cn/v1/resolve/r_test3.package.2012120601491354790946
[DEBUG] 2017-12-14 21:32:43,072 (ResourceMapFactory:parseResourceMap:490) Agg resource: https://cn-dev.test.dataone.org/cn/v1/resolve/r_test3.scidata.2.2012120601491354790946
[DEBUG] 2017-12-14 21:32:43,073 (ResourceMapFactory:parseResourceMap:493) --- documents count: 0
[DEBUG] 2017-12-14 21:32:43,073 (ResourceMapFactory:parseResourceMap:512) +++ isDocBy count: 1
[DEBUG] 2017-12-14 21:32:43,074 (ResourceMapFactory:parseResourceMap:516) +++docBySubjectURI: https://cn-dev.test.dataone.org/cn/v1/resolve/r_test3.scidata.2.2012120601491354790946
[DEBUG] 2017-12-14 21:32:43,074 (ResourceMapFactory:parseResourceMap:519) +++metadataURI: https://cn-dev.test.dataone.org/cn/v1/resolve/r_test3.scimeta.2012120601491354790946
[DEBUG] 2017-12-14 21:32:43,074 (ResourceMapFactory:parseResourceMap:522) Creating new HashSet for: https://cn-dev.test.dataone.org/cn/v1/resolve/r_test3.scimeta.2012120601491354790946 : 0
[DEBUG] 2017-12-14 21:32:43,074 (ResourceMapFactory:parseResourceMap:490) Agg resource: https://cn-dev.test.dataone.org/cn/v1/resolve/r_test3.scidata.1.2012120601491354790946
[DEBUG] 2017-12-14 21:32:43,075 (ResourceMapFactory:parseResourceMap:493) --- documents count: 0
[DEBUG] 2017-12-14 21:32:43,075 (ResourceMapFactory:parseResourceMap:512) +++ isDocBy count: 1
[DEBUG] 2017-12-14 21:32:43,075 (ResourceMapFactory:parseResourceMap:516) +++docBySubjectURI: https://cn-dev.test.dataone.org/cn/v1/resolve/r_test3.scidata.1.2012120601491354790946
[DEBUG] 2017-12-14 21:32:43,076 (ResourceMapFactory:parseResourceMap:519) +++metadataURI: https://cn-dev.test.dataone.org/cn/v1/resolve/r_test3.scimeta.2012120601491354790946
[DEBUG] 2017-12-14 21:32:43,076 (ResourceMapFactory:parseResourceMap:490) Agg resource: https://cn-dev.test.dataone.org/cn/v1/resolve/r_test3.scimeta.2012120601491354790946
[DEBUG] 2017-12-14 21:32:43,077 (ResourceMapFactory:parseResourceMap:493) --- documents count: 2
[DEBUG] 2017-12-14 21:32:43,077 (ResourceMapFactory:parseResourceMap:498) ---metadataURI : https://cn-dev.test.dataone.org/cn/v1/resolve/r_test3.scimeta.2012120601491354790946
[DEBUG] 2017-12-14 21:32:43,077 (ResourceMapFactory:parseResourceMap:505) ---documentsObject: https://cn-dev.test.dataone.org/cn/v1/resolve/r_test3.scidata.2.2012120601491354790946
[DEBUG] 2017-12-14 21:32:43,077 (ResourceMapFactory:parseResourceMap:505) ---documentsObject: https://cn-dev.test.dataone.org/cn/v1/resolve/r_test3.scidata.1.2012120601491354790946
[DEBUG] 2017-12-14 21:32:43,078 (ResourceMapFactory:parseResourceMap:512) +++ isDocBy count: 0
[DEBUG] 2017-12-14 21:32:43,078 (ResourceMapFactory:parseResourceMap:532) ~~~~~ data count: https://cn-dev.test.dataone.org/cn/v1/resolve/r_test3.scimeta.2012120601491354790946: 2
[DEBUG] 2017-12-14 21:32:43,175 (HttpConnectionMonitorService:run:103) ...calling closeExpire/IdleConnections...
[DEBUG] 2017-12-14 21:32:43,608 (ResourceMapSubprocessor:clearSidChain:192) Removing pidToProcess===testEchoSystemMetadata_2017348133241216
[DEBUG] 2017-12-14 21:32:43,608 (ResourceMapSubprocessor:clearSidChain:193) Removing objectPath===null
[ WARN] 2017-12-14 21:32:43,609 (SolrIndexService:processObject:241) The subprocessor org.dataone.cn.indexer.parser.ResourceMapSubprocessor can't process the id testEchoSystemMetadata_2017348133241216 since null. However, the index still can be achieved without this part of information provided by the processor.
java.lang.NullPointerException
at org.dataone.cn.index.task.IndexTask.(IndexTask.java:192)
at org.dataone.cn.indexer.parser.ResourceMapSubprocessor.clearSidChain(ResourceMapSubprocessor.java:195)
at org.dataone.cn.indexer.parser.ResourceMapSubprocessor.processResourceMap(ResourceMapSubprocessor.java:152)
at org.dataone.cn.indexer.parser.ResourceMapSubprocessor.processDocument(ResourceMapSubprocessor.java:121)
at org.dataone.cn.indexer.SolrIndexService.processObject(SolrIndexService.java:230)
at org.dataone.cn.rest.v2.DiagnosticController.echoIndexedObject(DiagnosticController.java:225)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:176)
at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:436)
at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:424)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:923)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:852)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:882)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:789)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:646)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.dataone.cn.rest.PortalCertificateFilter.doFilter(PortalCertificateFilter.java:82)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:88)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.dataone.cn.rest.ServiceDisableFilter.doFilter(ServiceDisableFilter.java:78)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122)
at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:501)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:170)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
at org.apache.coyote.ajp.AjpProcessor.process(AjpProcessor.java:193)
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:607)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:313)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[ INFO] 2017-12-14 21:32:43,610 (TripleStoreService:getDataset:33) TripleStoreService.getDataset - the parent directory of the triple store location ================= is /var/dataone/index/tdb
[ INFO] 2017-12-14 21:32:43,610 (TripleStoreService:getDataset:42) TripleStoreService.getDataset - the store directory ================= is /var/dataone/index/tdb/tdb32606505037282655
[DEBUG] 2017-12-14 21:32:43,611 (CommonsFileUploadSupport:cleanupFileItems:282) Cleaning up multipart file [sysmeta] with original filename [mmp.output.2570058838097726727.tmp], stored in memory
[DEBUG] 2017-12-14 21:32:43,611 (CommonsFileUploadSupport:cleanupFileItems:282) Cleaning up multipart file [object] with original filename [mmp.output.1047975705502505412.tmp], stored in memory
[ INFO] 2017-12-14 21:32:43,614 (ExceptionController:get:67) received error of 500

#4 Updated by Rob Nahf over 6 years ago

there's no null check on the constructor parameter in d1_index_common's IndexTask:

public IndexTask(SystemMetadata smd, String objectPath) {
this();
if (smd.getIdentifier() != null) {
this.pid = smd.getIdentifier().getValue();
}

neither is there a check in ResourceMapSubprocessor.clearSidChain on whether or not the system Metadata is in the HzMap:

while (pidToProcess != null) {
// queue a delete processing of all versions in the SID chain
SystemMetadata sysmeta = HazelcastClientFactory.getSystemMetadataMap().get(
pidToProcess);
String objectPath = HazelcastClientFactory.getObjectPathMap().get(pidToProcess);
logger.debug("Removing pidToProcess===" + pidToProcess.getValue());
logger.debug("Removing objectPath===" + objectPath);

            IndexTask task = new IndexTask(sysmeta, objectPath);
            try {
                deleteProcessor.process(task);
            } catch (Exception e) {
                logger.error(e.getMessage(), e);
            }
            // go to the next one
            pidToProcess = sysmeta.getObsoletes();
        }
    }

Code in trunk fixes this, so copy from there into the 2.3 branch.

#5 Updated by Rob Nahf over 6 years ago

  • Category set to d1_indexer
  • Status changed from New to Closed
  • % Done changed from 0 to 100

committed fix in 2.3 branch (already fixed in trunk)

#6 Updated by Dave Vieglais over 6 years ago

  • Sprint set to CCI-2.3.7

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 14.8 MB)