Task #8179
Story #8175: Solr crashing with out of memory error
Review log aggregation code for memory leaks and efficiency
100%
Description
Solr memory use on the CN running log aggregation always appears higher than on the other CNs.
This suggests that log aggregation imposes a high load on the Solr index, or that there may be inefficiencies or memory leaks in the log aggregation service.
The goal of this task is to review the log aggregation implementation and correct where necessary.
Associated revisions
refs: #8179. in trunk, fixed resource leak in LogAggregationScheduleManager.init() method.
refs: #8179. in trunk, fixed resource leak in LogHarvesterTask.harvest method that was not closing FileReaders in a finally block. The CSVParser is AutoCloseable, so in theory resources were getting closed under regular operation, and resource leaks would only happen when exceptions were thrown. Refactored to use the try-with-resources construct.
History
#1 Updated by Rob Nahf about 7 years ago
- Category changed from d1_indexer to d1_log_aggregation
#2 Updated by Rob Nahf about 7 years ago
See https://wiki.apache.org/solr/SolrPerformanceProblems for a point of reference; it gives a good overview of potential problem areas.
There's an interesting point about potential problems when the ZooKeeper database is on the same disk as the Solr database. This might be a pathway for one Solr index (the query index) to indirectly impact another (the log index).
Given the multiple possible causes, it would be good to note specific issues and symptoms.
#3 Updated by Rob Nahf about 7 years ago
Not directly related to memory, but I looked at the SolrClientManager code that submits the updates. The submitBeans call wraps the Apache HttpSolrClient.addBeans method. The logs show decent update speed:
[ INFO] 2017-11-28 18:00:45,889 (SolrClientManager:submitBeans:42) SolrClientManager-urn:node:mnSandboxUCSB2 submitting 500 records to index
[DEBUG] 2017-11-28 18:00:45,889 (SolrClientManager:submitBeans:45) SolrClientManager-urn:node:mnSandboxUCSB2 locked
[ INFO] 2017-11-28 18:00:47,406 (SolrClientManager:submitBeans:50) SolrClientManager-urn:node:mnSandboxUCSB2 Completed submission
[DEBUG] 2017-11-28 18:00:47,406 (SolrClientManager:submitBeans:53) SolrClientManager-urn:node:mnSandboxUCSB2 unlocked
[DEBUG] 2017-11-28 18:00:47,407 (LogHarvesterTask:harvest:430) LogHarvesterTask-urn:node:mnSandboxUCSB2 publishing 500 items to Solr
[ INFO] 2017-11-28 18:00:47,407 (SolrClientManager:submitBeans:42) SolrClientManager-urn:node:mnSandboxUCSB2 submitting 500 records to index
[DEBUG] 2017-11-28 18:00:47,407 (SolrClientManager:submitBeans:45) SolrClientManager-urn:node:mnSandboxUCSB2 locked
[ INFO] 2017-11-28 18:00:48,563 (SolrClientManager:submitBeans:50) SolrClientManager-urn:node:mnSandboxUCSB2 Completed submission
[DEBUG] 2017-11-28 18:00:48,563 (SolrClientManager:submitBeans:53) SolrClientManager-urn:node:mnSandboxUCSB2 unlocked
[DEBUG] 2017-11-28 18:00:48,563 (LogHarvesterTask:harvest:430) LogHarvesterTask-urn:node:mnSandboxUCSB2 publishing 500 items to Solr
[ INFO] 2017-11-28 18:00:48,564 (SolrClientManager:submitBeans:42) SolrClientManager-urn:node:mnSandboxUCSB2 submitting 500 records to index
[DEBUG] 2017-11-28 18:00:48,564 (SolrClientManager:submitBeans:45) SolrClientManager-urn:node:mnSandboxUCSB2 locked
[ INFO] 2017-11-28 18:00:49,523 (SolrClientManager:submitBeans:50) SolrClientManager-urn:node:mnSandboxUCSB2 Completed submission
[DEBUG] 2017-11-28 18:00:49,523 (SolrClientManager:submitBeans:53) SolrClientManager-urn:node:mnSandboxUCSB2 unlocked
[DEBUG] 2017-11-28 18:00:49,523 (LogHarvesterTask:harvest:430) LogHarvesterTask-urn:node:mnSandboxUCSB2 publishing 133 items to Solr
[ INFO] 2017-11-28 18:00:49,524 (SolrClientManager:submitBeans:42) SolrClientManager-urn:node:mnSandboxUCSB2 submitting 133 records to index
[DEBUG] 2017-11-28 18:00:49,524 (SolrClientManager:submitBeans:45) SolrClientManager-urn:node:mnSandboxUCSB2 locked
[ INFO] 2017-11-28 18:00:49,876 (SolrClientManager:submitBeans:50) SolrClientManager-urn:node:mnSandboxUCSB2 Completed submission
[DEBUG] 2017-11-28 18:00:49,876 (SolrClientManager:submitBeans:53) SolrClientManager-urn:node:mnSandboxUCSB2 unlocked
[ INFO] 2017-11-28 18:00:49,881 (LogHarvesterTask:harvest:470) LogHarvesterTask-urn:node:mnSandboxUCSB2 Latested Harvested Log Entry Date 2017-11-27T23:48:51.023+0000
[ INFO] 2017-11-28 18:00:49,881 (LogHarvesterTask:harvest:474) LogHarvesterTask-urn:node:mnSandboxUCSB2 Complete
[ INFO] 2017-11-28 18:02:33,169 (SolrClientManager:submitBeans:42) SolrClientManager-urn:node:mnSandboxORC1 submitting 17 records to index
[DEBUG] 2017-11-28 18:02:33,169 (SolrClientManager:submitBeans:45) SolrClientManager-urn:node:mnSandboxORC1 locked
[ INFO] 2017-11-28 18:02:33,437 (SolrClientManager:submitBeans:50) SolrClientManager-urn:node:mnSandboxORC1 Completed submission
[DEBUG] 2017-11-28 18:02:33,437 (SolrClientManager:submitBeans:53) SolrClientManager-urn:node:mnSandboxORC1 unlocked
[ INFO] 2017-11-28 18:02:33,443 (LogHarvesterTask:harvest:470) LogHarvesterTask-urn:node:mnSandboxORC1 Latested Harvested Log Entry Date 2017-11-27T23:46:19.679+0000
[ INFO] 2017-11-28 18:02:33,443 (LogHarvesterTask:harvest:474) LogHarvesterTask-urn:node:mnSandboxORC1 Complete
Depending on the number of beans submitted per update, it averages 2-3 ms per bean. Smaller batches are less efficient per bean, so there is likely some fixed per-request overhead in each commit.
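For reference, a minimal sketch of the lock / addBeans / unlock sequence visible in the log lines above. Only the HttpSolrClient.addBeans call is confirmed by this ticket; the class shape, the ReentrantLock, and the explicit commit per batch are assumptions for illustration. From the timestamps, the 500-record batches average about 2-3 ms per bean while the 17-record batch took roughly 268 ms (about 16 ms per bean), which is consistent with a fixed per-request cost being amortized over larger batches.

import java.io.IOException;
import java.util.Collection;
import java.util.concurrent.locks.ReentrantLock;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

// Sketch of the submitBeans wrapper pattern suggested by the log output.
// Field names, the ReentrantLock, and the explicit commit are assumptions.
public class SolrClientManagerSketch {

    private final HttpSolrClient solrClient;
    private final ReentrantLock lock = new ReentrantLock();

    public SolrClientManagerSketch(String baseSolrUrl) {
        this.solrClient = new HttpSolrClient.Builder(baseSolrUrl).build();
    }

    // Submit one batch of index beans. Larger batches amortize the fixed
    // per-request overhead, matching the per-bean timings in the log above.
    public void submitBeans(Collection<?> beans) throws SolrServerException, IOException {
        lock.lock();                        // "locked" DEBUG line
        try {
            solrClient.addBeans(beans);     // the SolrJ call wrapped by submitBeans
            solrClient.commit();            // assumed: one explicit commit per batch
        } finally {
            lock.unlock();                  // "unlocked" DEBUG line
        }
    }
}

If the fixed overhead turns out to be the explicit commit, relying on Solr's autoCommit/autoSoftCommit settings in solrconfig.xml rather than committing per batch would be the usual way to reduce it.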
#4 Updated by Rob Nahf about 7 years ago
- Status changed from New to In Progress
- % Done changed from 0 to 30
Found one resource leak: an unclosed InputStream. It is probably inconsequential, since it likely occurs only once, at LogAggregationScheduleManager init.
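A minimal sketch of the fix pattern for a stream opened once at init. The actual resource in LogAggregationScheduleManager.init() is not shown in this ticket, so the properties-file name and its use here are hypothetical; the point is only that try-with-resources closes the stream even when loading throws.

import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

// Hypothetical init(): the ticket only says an InputStream opened at init
// was never closed. try-with-resources guarantees the close.
public class LogAggregationScheduleManagerSketch {

    private final Properties scheduleProperties = new Properties();

    public void init() throws IOException {
        // "/logAggregation.properties" is a placeholder resource name
        try (InputStream in = getClass().getResourceAsStream("/logAggregation.properties")) {
            if (in == null) {
                throw new IOException("configuration resource not found on classpath");
            }
            scheduleProperties.load(in);
        }
    }
}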
#5 Updated by Rob Nahf about 7 years ago
Found some more potential resource leaks in the LogHarvesterTask.harvest method: the FileReaders were not being closed within a finally block. Switched to the try-with-resources construct, which is also more legible.
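The shape of that refactor, as a minimal sketch. Only the FileReader/CSVParser try-with-resources change is stated above; the method name, return type, and CSV column names here are placeholders. Declaring both resources in the try header closes them in reverse order whether or not parsing throws, which is exactly the exception path the old code missed.

import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

import org.apache.commons.csv.CSVFormat;
import org.apache.commons.csv.CSVParser;
import org.apache.commons.csv.CSVRecord;

public class LogHarvesterSketch {

    // Parse one harvested CSV log file; both the Reader and the CSVParser are
    // closed automatically, even if an exception is thrown mid-parse.
    public List<String> readLogEntries(String csvFilePath) throws IOException {
        List<String> entries = new ArrayList<>();
        try (Reader reader = new FileReader(csvFilePath);
             CSVParser parser = new CSVParser(reader, CSVFormat.DEFAULT.withFirstRecordAsHeader())) {
            for (CSVRecord record : parser) {
                // "event" and "dateLogged" are placeholder column names
                entries.add(record.get("event") + " " + record.get("dateLogged"));
            }
        }
        return entries;
    }
}

Under normal operation the CSVParser's own close also closes the wrapped reader, so the leak only showed up when an exception escaped before that close, matching the description in the associated revision.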
#6 Updated by Dave Vieglais almost 6 years ago
- % Done changed from 30 to 100
- Status changed from In Progress to Closed