Task #3407
Fix indexing conflicts on CNs that cause Metacat systemmetadata tables to get out of sync
100%
Description
During testing of d1_replication, we've seen the smreplicationstatus tables get out of sync across CNs. For instance, two CNs may have a status of COMPLETED for a replica while the third CN still says REQUESTED. This appears to coincide with SQL errors showing a foreign key violation in the xml_documents table during calls to CNodeService.archive(). After chatting with Ben, the current thought is that the asynchronous nature of the Metacat indexing process that populates xml_index is causing the issue. When an object gets created, the index process is queued, but the object may quickly get archived before the indexing is complete. Or, the indexing may complete, causing the archive() method to throw a SQL exception because the reference to the docid is still in the xml_index table as a foreign key.
These issues may be alleviated if we don't use Metacat's indexing feature on the CNs. Determine if we can set a flag to not push docids into the IndexingQueue. Test this new Metacat code in the sandbox environment with high transaction rates that frequently call the archive() method.
Ben, Matt - thoughts on the consequences of turning off Metacat indexing on the CNs?
History
#1 Updated by Chris Jones about 12 years ago
- Status changed from New to In Progress
Ben has changed the IndexingQueue handling in Metacat to remove indexing tasks when delete() and archive() have been called, and replication runs so far show no foreign key violation SQLExceptions.
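As a rough illustration of the idea only (not Ben's actual change), the sketch below shows a queue that drops a pending indexing task before the archive step runs. The class PendingIndexQueue and its methods are hypothetical names and do not reflect Metacat's real IndexingQueue API.

```java
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical stand-in for an indexing queue; not Metacat's actual class. */
class PendingIndexQueue {
    // Insertion-ordered set of docids still waiting to be indexed.
    private final Set<String> pending = new LinkedHashSet<>();

    /** Queue a docid for asynchronous indexing. */
    public synchronized void enqueue(String docid) {
        pending.add(docid);
    }

    /** Remove a queued task; returns true if one was waiting. */
    public synchronized boolean cancel(String docid) {
        return pending.remove(docid);
    }

    public synchronized boolean isPending(String docid) {
        return pending.contains(docid);
    }

    /** Hand the oldest waiting docid to the indexer, or null if none. */
    public synchronized String poll() {
        Iterator<String> it = pending.iterator();
        if (!it.hasNext()) {
            return null;
        }
        String docid = it.next();
        it.remove();
        return docid;
    }

    /**
     * Assumed ordering for archive/delete: cancel the pending indexing
     * task first, then run the caller-supplied archive step.
     */
    public void archive(String docid, Runnable archiveStep) {
        cancel(docid);
        archiveStep.run();
    }
}
```

This sketch only covers tasks still waiting in the queue; an indexing task that is already running would need separate handling.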
#2 Updated by Chris Jones about 12 years ago
- Status changed from In Progress to Testing
#3 Updated by Chris Jones almost 12 years ago
- Status changed from Testing to Closed
- Remaining hours set to 0.0
We've also changed calls to update (not just delete) systemmetadata to ensure that there are no outstanding indexing tasks in the queue that would cause a subsequent SQL exception. This has been tested in sandbox and stage, and we are not seeing the exceptions anymore.