Story #8540
interesting 15 minute Hazelcast timeout in IndexProcessor
Start date:
2018-04-12
Due date:
% Done:
0%
Story Points:
Sprint:
Description
while processing some test objects in sandbox, the indexer stalled (multiple threads) when trying to read systemMEtadata from Hazelcast. The stall resulted 15 minutes of no activity. I don't know if the connection was reset, but activity started as normal after 15 minutes.
Lines from the logs:
[ INFO] 2018-04-12 02:05:31,961 [pool-1-thread-1] (IndexTaskProcessor:processTask:288) *********************end to process update index task for 11333/1906135 in thread 18 [ INFO] 2018-04-12 02:05:31,963 [pool-1-thread-1] (IndexTaskProcessor:processTask:315) Indexing complete for pid: 11333/1906135 [ INFO] 2018-04-12 02:05:43,450 [QuartzScheduler_Worker-2] (HTTPService:doRequest:381) REQUEST URI= http://localhost:8983/solr/search_core/select/?q=seriesId%3A11333%2F1360307+ OR+seriesId%3A11333%2F1360309+OR+seriesId%3A11333%2F1360295+OR+seriesId%3A11333%2F1360315+OR+seriesId%3A11333%2F1360313+OR+seriesId%3A11333%2F1360323+OR+seriesId%3A11333%2F1360 297+OR+seriesId%3A11333%2F1360311+OR+seriesId%3A11333%2F1360327+OR+seriesId%3A11333%2F1360337+OR+seriesId%3A11333%2F1360325+OR+seriesId%3A11333%2F1360319+OR+seriesId%3A11333%2F 1360331+OR+seriesId%3A11333%2F1360303+OR+seriesId%3A11333%2F1360329+OR+seriesId%3A11333%2F1360339+OR+seriesId%3A11333%2F1360335+OR+seriesId%3A11333%2F1360321+OR+seriesId%3A1133 3%2F1360343+OR+seriesId%3A11333%2F1360305+OR+seriesId%3A11333%2F1360341+OR+seriesId%3A11333%2F1360317+OR+seriesId%3A11333%2F1360301+OR+seriesId%3A11333%2F1906114+OR+seriesId%3A 11333%2F1360333&start=0&rows=25&indent=on [ INFO] 2018-04-12 02:05:43,473 [QuartzScheduler_Worker-2] (HTTPService:doRequest:381) REQUEST URI= http://localhost:8983/solr/search_core/select/?q=id%3A11333%2F1360307+OR+id% 3A11333%2F1360309+OR+id%3A11333%2F1360295+OR+id%3A11333%2F1360315+OR+id%3A11333%2F1360313+OR+id%3A11333%2F1360323+OR+id%3A11333%2F1360297+OR+id%3A11333%2F1360311+OR+id%3A11333% 2F1360327+OR+id%3A11333%2F1360337+OR+id%3A11333%2F1360325+OR+id%3A11333%2F1360319+OR+id%3A11333%2F1360331+OR+id%3A11333%2F1360303+OR+id%3A11333%2F1360329+OR+id%3A11333%2F136033 9+OR+id%3A11333%2F1360335+OR+id%3A11333%2F1360321+OR+id%3A11333%2F1360343+OR+id%3A11333%2F1360305+OR+id%3A11333%2F1360341+OR+id%3A11333%2F1360317+OR+id%3A11333%2F1360301+OR+id% 3A11333%2F1906114+OR+id%3A11333%2F1360333&start=0&rows=25&indent=on [ INFO] 2018-04-12 02:05:43,482 [QuartzScheduler_Worker-2] (IndexTaskProcessor:getNextIndexTask:664) the original index task - IndexTask [id=8118464, pid=11333/1906115, formati d=http://www.openarchives.org/ore/terms, objectPath=/var/metacat/data/autogen.2016071422565971925.1, dateSysMetaModified=1405815300000, deleted=false, taskModifiedDate=15234987 09736, priority=3, status=IN PROCESS] [ INFO] 2018-04-12 02:05:43,482 [QuartzScheduler_Worker-2] (IndexTaskProcessor:getNextIndexTask:670) the new index task is a ResourceMapIndexTask [ INFO] 2018-04-12 02:05:43,482 [QuartzScheduler_Worker-2] (IndexTaskProcessor:getNextIndexTask:671) the new index task - IndexTask [id=8118464, pid=11333/1906115, formatid=htt p://www.openarchives.org/ore/terms, objectPath=/var/metacat/data/autogen.2016071422565971925.1, dateSysMetaModified=1405815300000, deleted=false, taskModifiedDate=1523498709736 , priority=3, status=IN PROCESS] [ INFO] 2018-04-12 02:05:43,482 [QuartzScheduler_Worker-2] (IndexTaskProcessor:processTaskOnThread:260) using multiple threads to process index and the size of the thread pool is 5 [DEBUG] 2018-04-12 02:05:43,483 [pool-1-thread-2] (IndexTaskProcessor:checkReadinessProcessResourceMap:328) $$$$$$$$$$$$$$$$$ the index task 11333/1906115 is a resource map tas k in the the thread 19 [ INFO] 2018-04-12 02:05:43,485 [QuartzScheduler_Worker-2] (IndexTaskProcessor:saveTask:865) IndexTaskProcess.saveTask save the index task 11333/1906347 [ INFO] 2018-04-12 02:05:43,485 [QuartzScheduler_Worker-2] (IndexTaskProcessor:getNextIndexTask:610) Start of indexing pid: 11333/1906347 [ERROR] 2018-04-12 02:14:31,411 [pool-1-thread-2] (IndexTaskProcessor:processTask:297) Unable to process task for pid: 11333/1906115 java.lang.RuntimeException: [CONCURRENT_MAP_GET] Operation Timeout (with no response!): 300000 at com.hazelcast.impl.ClientServiceException.readData(ClientServiceException.java:63) at com.hazelcast.nio.Serializer$DataSerializer.read(Serializer.java:104) at com.hazelcast.nio.Serializer$DataSerializer.read(Serializer.java:79) at com.hazelcast.nio.AbstractSerializer.toObject(AbstractSerializer.java:121) at com.hazelcast.nio.AbstractSerializer.toObject(AbstractSerializer.java:156) at com.hazelcast.client.ClientThreadContext.toObject(ClientThreadContext.java:72) at com.hazelcast.client.IOUtil.toObject(IOUtil.java:34) at com.hazelcast.client.ProxyHelper.getValue(ProxyHelper.java:186) at com.hazelcast.client.ProxyHelper.doOp(ProxyHelper.java:146) at com.hazelcast.client.ProxyHelper.doOp(ProxyHelper.java:140) at com.hazelcast.client.MapClientProxy.get(MapClientProxy.java:233) at org.dataone.cn.indexer.parser.utility.SeriesIdResolver.isSeriesId(SeriesIdResolver.java:53) at org.dataone.cn.index.processor.IndexTaskProcessor.checkReadinessProcessResourceMap(IndexTaskProcessor.java:335) at org.dataone.cn.index.processor.IndexTaskProcessor.processTask(IndexTaskProcessor.java:276) at org.dataone.cn.index.processor.IndexTaskProcessor.access$000(IndexTaskProcessor.java:80) at org.dataone.cn.index.processor.IndexTaskProcessor$1.run(IndexTaskProcessor.java:263) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) [ INFO] 2018-04-12 02:14:31,414 [pool-1-thread-2] (IndexTaskProcessor:newOrFailedIndexTaskExists:890) IndexTaskProcess.newOrFailedIndexTaskExists for id 11333/1906115 [ERROR] 2018-04-12 02:14:31,547 [QuartzScheduler_Worker-2] (IndexTaskProcessor:getNextIndexTask:645) Unable to parse ORE doc: 11333/1906347. Unrecoverable parse error: task wi ll not be re-tried. [ INFO] 2018-04-12 02:14:31,547 [QuartzScheduler_Worker-2] (IndexTask:markNew:454) Even tough it was masked new, it is still considered failed for id 11333/1906347 since it was tried to many times. [ INFO] 2018-04-12 02:14:31,547 [QuartzScheduler_Worker-2] (IndexTaskProcessor:newOrFailedIndexTaskExists:890) IndexTaskProcess.newOrFailedIndexTaskExists for id 11333/1906347 [ INFO] 2018-04-12 02:14:31,598 [QuartzScheduler_Worker-2] (IndexTaskProcessor:saveTask:865) IndexTaskProcess.saveTask save the index task 11333/1906347 [ INFO] 2018-04-12 02:14:31,598 [QuartzScheduler_Worker-2] (IndexTaskProcessor:getNextIndexTask:660) Task for resource map pid: 11333/1906347 not processed. [ INFO] 2018-04-12 02:14:31,600 [QuartzScheduler_Worker-2] (IndexTaskProcessor:saveTask:865) IndexTaskProcess.saveTask save the index task 11333/1906174 [ INFO] 2018-04-12 02:14:31,600 [QuartzScheduler_Worker-2] (IndexTaskProcessor:getNextIndexTask:610) Start of indexing pid: 11333/1906174 [ WARN] 2018-04-12 02:14:31,644 [pool-1-thread-5] (SolrIndexService:processObject:241) The subprocessor org.dataone.cn.indexer.parser.ResourceMapSubprocessor can't process the id 11333/1905797 since [CONCURRENT_MAP_GET] Operation Timeout (with no response!): 300000. However, the index still can be achieved without this part of information provided by the processor. java.lang.RuntimeException: [CONCURRENT_MAP_GET] Operation Timeout (with no response!): 300000 at com.hazelcast.impl.ClientServiceException.readData(ClientServiceException.java:63) at com.hazelcast.nio.Serializer$DataSerializer.read(Serializer.java:104) at com.hazelcast.nio.Serializer$DataSerializer.read(Serializer.java:79) at com.hazelcast.nio.AbstractSerializer.toObject(AbstractSerializer.java:121) at com.hazelcast.nio.AbstractSerializer.toObject(AbstractSerializer.java:156) at com.hazelcast.client.ClientThreadContext.toObject(ClientThreadContext.java:72) at com.hazelcast.client.IOUtil.toObject(IOUtil.java:34)
Not sure if there's anything actionable. 15 minutes seems like a long time to wait...