Bug #8106
Difficulty Distinguishing Between Different Kinds of Failed Sync "Not Found" Error Messages
50%
Description
Attached are two examples of parsed log files describing PIDs that failed to sync. The log files contain excerpts from different areas of the logs, including Metacat and d1-synchronization. Of particular interest are the synchronization logs, which in both cases report "Not Found", but in response to different conditions. In the first case, the sync failed because the CN was given a manual synchronization request with a PID that it could not locate on the MN (the node operator had escaped characters in the PID, which was unnecessary in this case). In the second case, the CN recognized a legitimate PID, but the bytes of the object were missing.
Differentiating the error messages for these two conditions would make it easier to identify what went wrong during a sync.
Subtasks
Associated revisions
refs #8106: (copied from trunk): added cause to the UnrecoverableException thrown when ServiceFailure from cn.create. This is important (we believe) to get the real cause in the SynchronizationFailure sent back to the MN. Also adding the description of cause for sync failed to description of UnrecoverableException.
History
#1 Updated by Rob Nahf over 7 years ago
- File improperly.html added
- Status changed from New to In Progress
- % Done changed from 0 to 30
The log snippets were difficult to work through, so I scripted a reparser that sorts log entries chronologically, interleaves them, and filters by log level. (See attached. It's HTML, so open it with a browser.)
In the first file, there's something wrong with the timestamps from Metacat: those entries are dated June 17th instead of May 17th (today is June 6th).
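For readers who want to reproduce this kind of reparsing, here is a minimal sketch of the idea, not the attached script itself. It assumes each log line starts with a `yyyy-MM-dd HH:mm:ss` timestamp followed by a level and a message; the class name `LogInterleaveSketch`, the `LogEntry` type, and the line format are all hypothetical, and real Metacat and d1-synchronization logs will likely need their own parsers.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LogInterleaveSketch {

    // Levels in ascending severity, so compareTo can be used for filtering.
    enum Level { DEBUG, INFO, WARN, ERROR }

    // Assumed timestamp layout; adjust to match the actual log files.
    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** One parsed log line, tagged with the log it came from. */
    static final class LogEntry {
        final LocalDateTime time;
        final Level level;
        final String source;
        final String message;

        LogEntry(LocalDateTime time, Level level, String source, String message) {
            this.time = time;
            this.level = level;
            this.source = source;
            this.message = message;
        }
    }

    /** Parses "date time LEVEL message" (hypothetical format). */
    static LogEntry parse(String source, String line) {
        String[] parts = line.split(" ", 4);
        LocalDateTime time = LocalDateTime.parse(parts[0] + " " + parts[1], FORMAT);
        return new LogEntry(time, Level.valueOf(parts[2]), source, parts[3]);
    }

    /** Merges all logs, keeps entries at or above minLevel, sorts by time. */
    static List<LogEntry> interleave(Map<String, List<String>> logs, Level minLevel) {
        List<LogEntry> merged = new ArrayList<>();
        for (Map.Entry<String, List<String>> log : logs.entrySet()) {
            for (String line : log.getValue()) {
                LogEntry entry = parse(log.getKey(), line);
                if (entry.level.compareTo(minLevel) >= 0) {
                    merged.add(entry);
                }
            }
        }
        // List.sort is stable, so entries with equal timestamps keep input order.
        merged.sort((a, b) -> a.time.compareTo(b.time));
        return merged;
    }

    public static void main(String[] args) {
        Map<String, List<String>> logs = new LinkedHashMap<>();
        logs.put("metacat", Arrays.asList(
                "2018-05-17 10:00:03 INFO read request",
                "2018-05-17 10:00:01 ERROR object missing"));
        logs.put("sync", Arrays.asList(
                "2018-05-17 10:00:02 WARN Not Found"));
        for (LogEntry e : interleave(logs, Level.WARN)) {
            System.out.println(e.time + " [" + e.source + "] " + e.level + " " + e.message);
        }
    }
}
```

Sorting only helps if the timestamps from every source are trustworthy; a source with skewed dates, like the Metacat entries noted above, would end up in the wrong place in the merged view.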
#2 Updated by Dave Vieglais over 7 years ago
- Target version set to CCI-2.3.6
#3 Updated by Rob Nahf over 7 years ago
Found a mistake in one of the logging statements: it didn't include the exception as a cause when wrapping it in UnrecoverableException. This could have prevented the sync failed message from being populated. Also added the exception name to another logging statement for a generic exception catch.
Deployed in CCI 2.3.5 (d1_synchronization 2.3.4)
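The kind of fix described in this comment can be illustrated with a small, self-contained sketch. The classes below are hypothetical stand-ins that only borrow the names `ServiceFailure` and `UnrecoverableException`; they are not the real d1_synchronization types, and the constructor shapes, the `create` method, and the message wording are assumptions made for illustration.

```java
public class SyncCauseSketch {

    /** Stand-in for a service-level failure, such as one raised by a create call. */
    static class ServiceFailure extends Exception {
        ServiceFailure(String message) {
            super(message);
        }
    }

    /** Stand-in wrapper exception; the (message, cause) constructor is assumed. */
    static class UnrecoverableException extends Exception {
        UnrecoverableException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical create call that always fails, to exercise the wrapping. */
    static void create(String pid) throws ServiceFailure {
        throw new ServiceFailure("object bytes missing for " + pid);
    }

    static void sync(String pid) throws UnrecoverableException {
        try {
            create(pid);
        } catch (ServiceFailure e) {
            // Pass e as the cause so it is not lost, and fold its name and
            // message into the description so the reason is visible downstream.
            throw new UnrecoverableException(
                    pid + " - sync failed: "
                            + e.getClass().getSimpleName() + ": " + e.getMessage(),
                    e);
        }
    }

    public static void main(String[] args) {
        try {
            sync("example-pid");
        } catch (UnrecoverableException e) {
            System.out.println(e.getMessage());
            System.out.println("cause: " + e.getCause());
        }
    }
}
```

Without the second constructor argument, `getCause()` would return null and anything built from the wrapper would carry only the generic description, which is how two different failures can end up looking alike.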
#4 Updated by Jing Tao about 7 years ago
- Target version changed from CCI-2.3.6 to CCI-2.3.7
#5 Updated by Dave Vieglais almost 7 years ago
- Sprint set to CCI-2.3.7
#6 Updated by Rob Nahf almost 7 years ago
- Status changed from In Progress to Testing
- % Done changed from 30 to 50