Project

General

Profile

Story #8540

interesting 15 minute Hazelcast timeout in IndexProcessor

Added by Rob Nahf over 3 years ago.

Status:
New
Priority:
Normal
Assignee:
Category:
d1_indexer
Target version:
-
Start date:
2018-04-12
Due date:
% Done:

0%

Story Points:

Description

while processing some test objects in sandbox, the indexer stalled (multiple threads) when trying to read systemMEtadata from Hazelcast. The stall resulted 15 minutes of no activity. I don't know if the connection was reset, but activity started as normal after 15 minutes.

Lines from the logs:

[ INFO] 2018-04-12 02:05:31,961 [pool-1-thread-1] (IndexTaskProcessor:processTask:288) *********************end to process update index task for 11333/1906135 in thread 18
[ INFO] 2018-04-12 02:05:31,963 [pool-1-thread-1] (IndexTaskProcessor:processTask:315) Indexing complete for pid: 11333/1906135
[ INFO] 2018-04-12 02:05:43,450 [QuartzScheduler_Worker-2] (HTTPService:doRequest:381) REQUEST URI= http://localhost:8983/solr/search_core/select/?q=seriesId%3A11333%2F1360307+
OR+seriesId%3A11333%2F1360309+OR+seriesId%3A11333%2F1360295+OR+seriesId%3A11333%2F1360315+OR+seriesId%3A11333%2F1360313+OR+seriesId%3A11333%2F1360323+OR+seriesId%3A11333%2F1360
297+OR+seriesId%3A11333%2F1360311+OR+seriesId%3A11333%2F1360327+OR+seriesId%3A11333%2F1360337+OR+seriesId%3A11333%2F1360325+OR+seriesId%3A11333%2F1360319+OR+seriesId%3A11333%2F
1360331+OR+seriesId%3A11333%2F1360303+OR+seriesId%3A11333%2F1360329+OR+seriesId%3A11333%2F1360339+OR+seriesId%3A11333%2F1360335+OR+seriesId%3A11333%2F1360321+OR+seriesId%3A1133
3%2F1360343+OR+seriesId%3A11333%2F1360305+OR+seriesId%3A11333%2F1360341+OR+seriesId%3A11333%2F1360317+OR+seriesId%3A11333%2F1360301+OR+seriesId%3A11333%2F1906114+OR+seriesId%3A
11333%2F1360333&start=0&rows=25&indent=on
[ INFO] 2018-04-12 02:05:43,473 [QuartzScheduler_Worker-2] (HTTPService:doRequest:381) REQUEST URI= http://localhost:8983/solr/search_core/select/?q=id%3A11333%2F1360307+OR+id%
3A11333%2F1360309+OR+id%3A11333%2F1360295+OR+id%3A11333%2F1360315+OR+id%3A11333%2F1360313+OR+id%3A11333%2F1360323+OR+id%3A11333%2F1360297+OR+id%3A11333%2F1360311+OR+id%3A11333%
2F1360327+OR+id%3A11333%2F1360337+OR+id%3A11333%2F1360325+OR+id%3A11333%2F1360319+OR+id%3A11333%2F1360331+OR+id%3A11333%2F1360303+OR+id%3A11333%2F1360329+OR+id%3A11333%2F136033
9+OR+id%3A11333%2F1360335+OR+id%3A11333%2F1360321+OR+id%3A11333%2F1360343+OR+id%3A11333%2F1360305+OR+id%3A11333%2F1360341+OR+id%3A11333%2F1360317+OR+id%3A11333%2F1360301+OR+id%
3A11333%2F1906114+OR+id%3A11333%2F1360333&start=0&rows=25&indent=on
[ INFO] 2018-04-12 02:05:43,482 [QuartzScheduler_Worker-2] (IndexTaskProcessor:getNextIndexTask:664) the original index task - IndexTask [id=8118464, pid=11333/1906115, formati
d=http://www.openarchives.org/ore/terms, objectPath=/var/metacat/data/autogen.2016071422565971925.1, dateSysMetaModified=1405815300000, deleted=false, taskModifiedDate=15234987
09736, priority=3, status=IN PROCESS]
[ INFO] 2018-04-12 02:05:43,482 [QuartzScheduler_Worker-2] (IndexTaskProcessor:getNextIndexTask:670) the new index task is a ResourceMapIndexTask
[ INFO] 2018-04-12 02:05:43,482 [QuartzScheduler_Worker-2] (IndexTaskProcessor:getNextIndexTask:671) the new index task - IndexTask [id=8118464, pid=11333/1906115, formatid=htt
p://www.openarchives.org/ore/terms, objectPath=/var/metacat/data/autogen.2016071422565971925.1, dateSysMetaModified=1405815300000, deleted=false, taskModifiedDate=1523498709736
, priority=3, status=IN PROCESS]
[ INFO] 2018-04-12 02:05:43,482 [QuartzScheduler_Worker-2] (IndexTaskProcessor:processTaskOnThread:260) using multiple threads to process index and the size of the thread pool 
is 5
[DEBUG] 2018-04-12 02:05:43,483 [pool-1-thread-2] (IndexTaskProcessor:checkReadinessProcessResourceMap:328) $$$$$$$$$$$$$$$$$ the index task 11333/1906115 is a resource map tas
k in the the thread 19
[ INFO] 2018-04-12 02:05:43,485 [QuartzScheduler_Worker-2] (IndexTaskProcessor:saveTask:865) IndexTaskProcess.saveTask save the index task 11333/1906347
[ INFO] 2018-04-12 02:05:43,485 [QuartzScheduler_Worker-2] (IndexTaskProcessor:getNextIndexTask:610) Start of indexing pid: 11333/1906347
[ERROR] 2018-04-12 02:14:31,411 [pool-1-thread-2] (IndexTaskProcessor:processTask:297) Unable to process task for pid: 11333/1906115
java.lang.RuntimeException: [CONCURRENT_MAP_GET] Operation Timeout (with no response!): 300000
    at com.hazelcast.impl.ClientServiceException.readData(ClientServiceException.java:63)
    at com.hazelcast.nio.Serializer$DataSerializer.read(Serializer.java:104)
    at com.hazelcast.nio.Serializer$DataSerializer.read(Serializer.java:79)
    at com.hazelcast.nio.AbstractSerializer.toObject(AbstractSerializer.java:121)
    at com.hazelcast.nio.AbstractSerializer.toObject(AbstractSerializer.java:156)
    at com.hazelcast.client.ClientThreadContext.toObject(ClientThreadContext.java:72)
    at com.hazelcast.client.IOUtil.toObject(IOUtil.java:34)
    at com.hazelcast.client.ProxyHelper.getValue(ProxyHelper.java:186)
    at com.hazelcast.client.ProxyHelper.doOp(ProxyHelper.java:146)
    at com.hazelcast.client.ProxyHelper.doOp(ProxyHelper.java:140)
    at com.hazelcast.client.MapClientProxy.get(MapClientProxy.java:233)
    at org.dataone.cn.indexer.parser.utility.SeriesIdResolver.isSeriesId(SeriesIdResolver.java:53)
    at org.dataone.cn.index.processor.IndexTaskProcessor.checkReadinessProcessResourceMap(IndexTaskProcessor.java:335)
    at org.dataone.cn.index.processor.IndexTaskProcessor.processTask(IndexTaskProcessor.java:276)
    at org.dataone.cn.index.processor.IndexTaskProcessor.access$000(IndexTaskProcessor.java:80)
    at org.dataone.cn.index.processor.IndexTaskProcessor$1.run(IndexTaskProcessor.java:263)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[ INFO] 2018-04-12 02:14:31,414 [pool-1-thread-2] (IndexTaskProcessor:newOrFailedIndexTaskExists:890) IndexTaskProcess.newOrFailedIndexTaskExists for id 11333/1906115
[ERROR] 2018-04-12 02:14:31,547 [QuartzScheduler_Worker-2] (IndexTaskProcessor:getNextIndexTask:645) Unable to parse ORE doc: 11333/1906347.  Unrecoverable parse error: task wi
ll not be re-tried.
[ INFO] 2018-04-12 02:14:31,547 [QuartzScheduler_Worker-2] (IndexTask:markNew:454) Even tough it was masked new, it is still considered failed for id 11333/1906347 since it was
 tried to many times.
[ INFO] 2018-04-12 02:14:31,547 [QuartzScheduler_Worker-2] (IndexTaskProcessor:newOrFailedIndexTaskExists:890) IndexTaskProcess.newOrFailedIndexTaskExists for id 11333/1906347
[ INFO] 2018-04-12 02:14:31,598 [QuartzScheduler_Worker-2] (IndexTaskProcessor:saveTask:865) IndexTaskProcess.saveTask save the index task 11333/1906347
[ INFO] 2018-04-12 02:14:31,598 [QuartzScheduler_Worker-2] (IndexTaskProcessor:getNextIndexTask:660) Task for resource map pid: 11333/1906347 not processed.
[ INFO] 2018-04-12 02:14:31,600 [QuartzScheduler_Worker-2] (IndexTaskProcessor:saveTask:865) IndexTaskProcess.saveTask save the index task 11333/1906174
[ INFO] 2018-04-12 02:14:31,600 [QuartzScheduler_Worker-2] (IndexTaskProcessor:getNextIndexTask:610) Start of indexing pid: 11333/1906174
[ WARN] 2018-04-12 02:14:31,644 [pool-1-thread-5] (SolrIndexService:processObject:241) The subprocessor org.dataone.cn.indexer.parser.ResourceMapSubprocessor can't process the 
id 11333/1905797 since [CONCURRENT_MAP_GET] Operation Timeout (with no response!): 300000. However, the index still can be achieved without this part of information provided by
 the processor.
java.lang.RuntimeException: [CONCURRENT_MAP_GET] Operation Timeout (with no response!): 300000
    at com.hazelcast.impl.ClientServiceException.readData(ClientServiceException.java:63)
    at com.hazelcast.nio.Serializer$DataSerializer.read(Serializer.java:104)
    at com.hazelcast.nio.Serializer$DataSerializer.read(Serializer.java:79)
    at com.hazelcast.nio.AbstractSerializer.toObject(AbstractSerializer.java:121)
    at com.hazelcast.nio.AbstractSerializer.toObject(AbstractSerializer.java:156)
    at com.hazelcast.client.ClientThreadContext.toObject(ClientThreadContext.java:72)
    at com.hazelcast.client.IOUtil.toObject(IOUtil.java:34)

Not sure if there's anything actionable. 15 minutes seems like a long time to wait...

Also available in: Atom PDF

Add picture from clipboard (Maximum size: 14.8 MB)