Bug #8106

Difficulty Distinguishing Between Different Kinds of Failed Sync "Not Found" Error Messages

Added by Monica Ihli almost 7 years ago. Updated about 6 years ago.

Status: Testing
Priority: Normal
Assignee:
Category: d1_synchronization
Target version:
Start date: 2017-06-07
Due date:
% Done: 50%
Milestone: None
Product Version: *
Story Points:
Sprint:

Description

Attached are two examples of parsed log files describing PIDs that failed to sync. The log files contain excerpts from different areas of the logs, including Metacat and d1-synchronization. Of particular interest are the synchronization logs, which in both cases report "Not Found", but in response to different conditions. In the first case, the sync failed because the CN was given a manual synchronization request with a PID that it could not locate on the MN (the node operator had escaped characters in the PID, which was unnecessary in this case). In the second case, the CN recognized a legitimate PID, but the bytes of the object were missing.

More differentiated error messages would make it easier to identify what went wrong during a sync.

improperly_escaped_PID_2017-06-01t.txt (179 KB) Monica Ihli, 2017-06-06 17:18

bytes_missing_22017-04-26.txt (36.1 KB) Monica Ihli, 2017-06-06 17:18

improperly.html - reparsed log summary derived from improperly_escaped... file. (176 KB) Rob Nahf, 2017-06-07 02:36


Subtasks

Task #8110: metacat.EventLog.getReport should downgrade a log.WARN to log.DEBUG (Closed, Jing Tao)

Task #8111: D1ResourceHandler.serializeException should not log all exceptions returned as errors. (Closed, Jing Tao)

Associated revisions

Revision 18856
Added by Rob Nahf almost 7 years ago

refs #8106: (copied from trunk): added cause to the UnrecoverableException thrown when ServiceFailure from cn.create. This is important (we believe) to get the real cause in the SynchronizationFailure sent back to the MN. Also adding the description of cause for sync failed to description of UnrecoverableException.

History

#1 Updated by Rob Nahf almost 7 years ago

  • File improperly.html added
  • Status changed from New to In Progress
  • % Done changed from 0 to 30

The log snippets were difficult to work through, so I scripted a reparser that sorts log entries chronologically, interleaves them, and filters by log level. (See the attachment; it's HTML, so open it in a browser.)

In the first file, something is wrong with the timestamps from Metacat: those entries are dated June 17 instead of May 17. (Today is June 6.)
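For readers who want to reproduce this kind of view, the following is a minimal sketch of a reparser along those lines. It is not the script that produced the attachment. The line layout (`2017-06-01 12:00:00,123 [WARN] message`), the level ordering, and the names `parse` and `reparse` are all assumptions made for illustration; the real Metacat and d1_synchronization log layouts may differ and would need their own pattern.

```python
import re
from datetime import datetime

# Assumed (hypothetical) line layout: "2017-06-01 12:00:00,123 [WARN] message"
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \[\s*(\w+)\s*\] (.*)$"
)
# Assumed ordering of log levels, lowest to highest severity
LEVELS = {"TRACE": 0, "DEBUG": 1, "INFO": 2, "WARN": 3, "ERROR": 4, "FATAL": 5}


def parse(source, lines):
    """Yield (timestamp, source, level, message) for each parseable line."""
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match is None:
            continue  # lines without a timestamp (e.g. stack traces) are skipped
        stamp, level, message = match.groups()
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
        yield when, source, level.upper(), message


def reparse(logs, min_level="INFO"):
    """Interleave entries from several logs, oldest first, filtered by level.

    `logs` maps a source name (e.g. "metacat") to an iterable of raw lines.
    """
    threshold = LEVELS[min_level]
    entries = [
        entry
        for source, lines in logs.items()
        for entry in parse(source, lines)
        if LEVELS.get(entry[2], 0) >= threshold  # unknown levels count as lowest
    ]
    # sorted() is stable, so entries with equal timestamps keep their input order
    return sorted(entries, key=lambda entry: entry[0])
```

Calling `reparse({"metacat": metacat_lines, "d1_synchronization": sync_lines}, "WARN")` would return one chronologically ordered list of warnings and errors from both sources. Sorting on the parsed timestamp rather than the raw string is a deliberate choice: it also makes clock or date anomalies between sources easy to spot.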

#2 Updated by Dave Vieglais almost 7 years ago

  • Target version set to CCI-2.3.6

#3 Updated by Rob Nahf almost 7 years ago

Found a mistake in one of the logging statements: it didn't include the exception as a cause when wrapping it in UnrecoverableException. This could have prevented the message in the sync-failed notification from being populated. Also added the exception name to another logging statement for a generic exception catch.

Deployed in CCI 2.3.5 (d1_synchronization 2.3.4)
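The effect of the fix can be illustrated with a small sketch. It is written in Python for brevity and is not the d1_synchronization code, which is Java; in Java the same thing is typically done by passing the caught exception to the wrapper's constructor or to `initCause`. The classes below are stand-ins that only borrow the names from this ticket, and `create_on_cn`, `sync`, `describe`, and the failure text are hypothetical.

```python
class ServiceFailure(Exception):
    """Stand-in for the failure raised by the create call."""


class UnrecoverableException(Exception):
    """Stand-in for the wrapper raised by the sync task."""


def create_on_cn(pid):
    # Hypothetical stand-in for cn.create; it always fails for this demo
    raise ServiceFailure(f"simulated failure while creating {pid}")


def sync(pid):
    try:
        create_on_cn(pid)
    except ServiceFailure as failure:
        # Copy the cause's description into the wrapper's own message and
        # chain the original exception, so neither is lost downstream
        raise UnrecoverableException(f"{pid}: sync failed: {failure}") from failure


def describe(exc):
    """Render an exception and its chain of explicit causes as one line."""
    parts = []
    while exc is not None:
        parts.append(f"{type(exc).__name__}: {exc}")
        exc = exc.__cause__
    return " <- caused by ".join(parts)
```

Without the chaining step, a caller that reports only the wrapper has no way to tell which underlying condition triggered the failure, which is the ambiguity this ticket describes.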

#4 Updated by Jing Tao over 6 years ago

  • Target version changed from CCI-2.3.6 to CCI-2.3.7

#5 Updated by Dave Vieglais over 6 years ago

  • Sprint set to CCI-2.3.7

#6 Updated by Rob Nahf about 6 years ago

  • Status changed from In Progress to Testing
  • % Done changed from 30 to 50
