Bug #3264

Production data problems after 1.0.4 release

Added by Robert Waltz over 11 years ago. Updated about 7 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Environment.Production
Target version: -
Start date: 2012-09-26
Due date:
% Done: 100%
Milestone: None
Product Version:
Story Points:
Sprint:

Description

There are documents in xml_documents on UCSB that only show up in xml_revisions on ORC and UNM.

There are documents in xml_documents on ORC and UNM that only show up in xml_revisions on UCSB.

See the attached file.

ErrorsOnProd20120926.txt (4.06 KB) Robert Waltz, 2012-09-26 16:49
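
A minimal sketch of how the mismatch could be checked, assuming the docid lists from each node's xml_documents and xml_revisions tables have been dumped to plain-text files, one docid per line; the file names below are hypothetical and are not taken from the attached report:

    # Compare docid inventories dumped from two Coordinating Nodes.
    # File names are hypothetical placeholders for per-node, per-table dumps.

    def load_docids(path):
        """Read a set of docids from a plain-text dump, one per line."""
        with open(path) as f:
            return {line.strip() for line in f if line.strip()}

    ucsb_documents = load_docids("ucsb_xml_documents.txt")
    ucsb_revisions = load_docids("ucsb_xml_revisions.txt")
    orc_documents = load_docids("orc_xml_documents.txt")
    orc_revisions = load_docids("orc_xml_revisions.txt")

    # Docids current (xml_documents) on UCSB that appear only as revisions on ORC.
    only_revised_on_orc = (ucsb_documents & orc_revisions) - orc_documents

    # Docids current on ORC that appear only as revisions on UCSB.
    only_revised_on_ucsb = (orc_documents & ucsb_revisions) - ucsb_documents

    print(len(only_revised_on_orc), "docids current on UCSB but only in xml_revisions on ORC")
    print(len(only_revised_on_ucsb), "docids current on ORC but only in xml_revisions on UCSB")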

History

#1 Updated by Robert Waltz over 11 years ago

  • Target version changed from Sprint-2012.37-Block.5.3 to Sprint-2012.39-Block.5.4

#2 Updated by Chris Jones over 11 years ago

  • Status changed from New to In Progress
  • Target version deleted (Sprint-2012.39-Block.5.4)
  • Milestone changed from CCI-1.1 to None

The symptoms described here are not the expected behavior across Metacats that perform two-way replication; one would expect the xml_documents and xml_revisions tables on each Coordinating Node to stay synchronized over time. However, this issue is minor in that, for CNs, all operations take place on a pid rather than a docid, so inconsistencies between documents found in xml_documents and those found in xml_revisions are not a functional problem.

That said, we still need to investigate why this is happening. The only Metacat code paths that move documents from xml_documents into xml_revisions are calls like update() and delete(). These methods fire off ForceReplicationHandler events to update remote Metacats, and those replication events may be failing due to timeouts or some other issue. I'm seeing many types of ERROR messages in /var/metacat/logs/replicate.log* having to do with calls in ForceReplicationHandler. Each class of error probably needs to be tracked down to understand why we're seeing inconsistencies across tables. Many may be temporary, so getting a grasp on this will take a while.

Overall, the differences are limited to a few dozen docids. Given that, and the fact that there's no functional problem on the CN, I'm setting this to low priority for now and putting it on the back burner. It will be good to track down the cause, though.
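
A rough sketch of one way to group those replication errors by message so each class can be chased down separately, assuming the log lines contain the literal token "ERROR"; the prefix-based grouping is a heuristic, not Metacat's actual log format:

    # Tally ERROR lines from the Metacat replication logs so each class of
    # failure can be investigated separately.
    import glob
    from collections import Counter

    error_classes = Counter()
    for path in sorted(glob.glob("/var/metacat/logs/replicate.log*")):
        with open(path, errors="replace") as log:
            for line in log:
                if "ERROR" in line:
                    # Group by a short prefix of the message that follows "ERROR".
                    message = line.split("ERROR", 1)[1].strip()
                    error_classes[message[:80]] += 1

    for message, count in error_classes.most_common(20):
        print(f"{count:6d}  {message}")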

#3 Updated by Dave Vieglais about 7 years ago

  • Status changed from In Progress to Closed
  • % Done changed from 0 to 100
