Bug #3874

Inconsistent object counts across CNs

Added by Dave Vieglais over 11 years ago. Updated over 10 years ago.

Status: Closed
Priority: Normal
Assignee: Robert Waltz
Category: Environment.Production
Target version:
Start date: 2013-07-21
Due date: 2014-04-12
% Done: 100%
Milestone: CCI-1.2
Product Version: *
Story Points:
Sprint:

Description

As of this writing, CN-UNM-1 reports 345897 objects, whereas CN-ORC-1 and CN-UCSB-1 report 345922 objects.

The goal of this ticket is to identify how the object counts came to differ and what corrective measures should be taken.

May be related to #3352?


Related issues

Related to Infrastructure - Story #3736: CN Consistency Check and CN Recovery (Rejected)

History

#1 Updated by Dave Vieglais over 11 years ago

Solr counts are off as well, and differ on each server:

$ for node in unm ucsb orc; do curl -s "https://cn-$node-1.dataone.org/cn/v1/query/solr/?q=*:*&rows=0" | xml sel -t -m "//result" -o "$node solr count = " -v "@numFound"; done

unm solr count = 209500
ucsb solr count = 209452
orc solr count = 209505

#2 Updated by Dave Vieglais over 11 years ago

List objects counts:

$ export NAMESPACE="d=http://ns.dataone.org/service/types/v1"
$ for node in unm ucsb orc; do curl -s "https://cn-$node-1.dataone.org/cn/v1/object?count=0" | xml sel -N "${NAMESPACE}" -t -m "//d:objectList" -o "$node object list total = " -v "@total"; done

unm object list total = 345897
ucsb object list total = 345922
orc object list total = 345922
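
The per-node totals above can be compared mechanically rather than by eye. The following is a minimal sketch, assuming input lines in the `<node> <label> = <count>` form printed by the commands in this ticket; the function name `count_gaps` is hypothetical and not part of any DataONE tooling.

```shell
# count_gaps: read lines of the form "<node> <label...> = <count>" on stdin
# and print how far each node is behind the highest count seen.
count_gaps() {
  awk -F' = ' '
    NF == 2 {
      n++
      split($1, parts, " ")      # first word of the left-hand side is the node
      node[n] = parts[1]
      count[n] = $2 + 0          # force numeric interpretation
      if (count[n] > max) max = count[n]
    }
    END {
      for (i = 1; i <= n; i++)
        printf "%s behind by %d\n", node[i], max - count[i]
    }
  '
}

# Example using the object list totals reported above.
printf '%s\n' \
  'unm object list total = 345897' \
  'ucsb object list total = 345922' \
  'orc object list total = 345922' | count_gaps
```

With the totals above, this reports unm as 25 objects behind and the other two nodes as 0 behind. The same function should work on the Solr count lines, since they use the same `=` layout.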

#3 Updated by Dave Vieglais over 11 years ago

  • Due date set to 2013-08-03
  • Target version set to 2013.30-Block.4.3
  • Start date set to 2013-07-21

#4 Updated by Robert Waltz over 11 years ago

  • Status changed from New to In Progress
  • Milestone changed from None to CCI-1.2

The situation is not corrected by running Hazelcast Synchronization on Metacat. It appears to be getting worse as more objects are replicated from other servers:

knb 20130802-19:23:09: [WARN]: Local SystemMetadata pid count: 345900 [edu.ucsb.nceas.metacat.dataone.hazelcast.HazelcastService]
knb 20130802-19:23:33: [WARN]: processedCount (identifiers from iterator): 345927 [edu.ucsb.nceas.metacat.dataone.hazelcast.HazelcastService]
knb 20130802-19:23:33: [WARN]: local pid count not yet shared: 0, shared pid count: 345927 [edu.ucsb.nceas.metacat.dataone.hazelcast.HazelcastService]
knb 20130802-19:23:33: [WARN]: Loading missing local keys into hzIdentifiers [edu.ucsb.nceas.metacat.dataone.hazelcast.HazelcastService]
knb 20130802-19:23:33: [WARN]: Initialized identifiers with missing local keys [edu.ucsb.nceas.metacat.dataone.hazelcast.HazelcastService]
knb 20130802-19:23:33: [WARN]: Processing missing SystemMetadata for missing pid count: 27 [edu.ucsb.nceas.metacat.dataone.hazelcast.HazelcastService]

...replication of new content occurs here...

knb 20130802-19:40:50: [WARN]: The user name from session is: cn=dataone_cn_metacat,dc=dataone,dc=org [edu.ucsb.nceas.metacat.admin.ReplicationAdmin]
knb 20130802-19:40:50: [WARN]: Local SystemMetadata pid count: 345902 [edu.ucsb.nceas.metacat.dataone.hazelcast.HazelcastService]
knb 20130802-19:41:14: [WARN]: processedCount (identifiers from iterator): 345932 [edu.ucsb.nceas.metacat.dataone.hazelcast.HazelcastService]
knb 20130802-19:41:15: [WARN]: local pid count not yet shared: 0, shared pid count: 345932 [edu.ucsb.nceas.metacat.dataone.hazelcast.HazelcastService]
knb 20130802-19:41:15: [WARN]: Loading missing local keys into hzIdentifiers [edu.ucsb.nceas.metacat.dataone.hazelcast.HazelcastService]
knb 20130802-19:41:15: [WARN]: Initialized identifiers with missing local keys [edu.ucsb.nceas.metacat.dataone.hazelcast.HazelcastService]
knb 20130802-19:41:15: [WARN]: Processing missing SystemMetadata for missing pid count: 30 [edu.ucsb.nceas.metacat.dataone.hazelcast.HazelcastService]
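
To track whether the gap grows between synchronization runs, the "missing pid count" values can be pulled out of the log. This is a minimal sketch that assumes the log line format shown above; `missing_pid_counts` is a hypothetical helper name, and it reads from stdin so that no log file location is assumed.

```shell
# missing_pid_counts: read Metacat log lines on stdin and print
# "<timestamp> <count>" for each "missing pid count" report.
missing_pid_counts() {
  awk '/missing pid count:/ {
    ts = $2; sub(/:$/, "", ts)            # drop the colon after the timestamp
    n = $0
    sub(/.*missing pid count: /, "", n)   # keep what follows the label
    sub(/ .*/, "", n)                     # keep only the number
    print ts, n
  }'
}

# Example using the two relevant lines from the log excerpt above.
missing_pid_counts <<'EOF'
knb 20130802-19:23:33: [WARN]: Processing missing SystemMetadata for missing pid count: 27 [edu.ucsb.nceas.metacat.dataone.hazelcast.HazelcastService]
knb 20130802-19:41:15: [WARN]: Processing missing SystemMetadata for missing pid count: 30 [edu.ucsb.nceas.metacat.dataone.hazelcast.HazelcastService]
EOF
```

For the excerpt above this prints `20130802-19:23:33 27` and `20130802-19:41:15 30`. These match the differences between the shared and local pid counts in each run (345927 - 345900 and 345932 - 345902).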

#5 Updated by Robert Waltz over 11 years ago

  • Due date changed from 2013-08-03 to 2013-08-10

Restarted TC on cn-unm-1. I set the log level to INFO. After the restart, the counts on the CNs were equivalent. I then manually started replication on cn-unm-1, which produced a lot of activity in the Metacat replication log. Afterwards, the objectList count and the Solr index count are the same across all production CNs.

We'll re-evaluate after a couple of days. There is a network outage at UNM on Wednesday, August 14, after which we may need to restart TC and the daemons again.

#6 Updated by Dave Vieglais over 11 years ago

  • Due date changed from 2013-08-10 to 2013-08-24
  • Target version changed from 2013.30-Block.4.3 to 2013.33-Block.4.4

#7 Updated by Dave Vieglais over 11 years ago

Looks like UNM is slipping behind again - 346092 on UNM versus 346098 on ORC and UCSB.

#8 Updated by Robert Waltz about 11 years ago

  • Due date changed from 2013-08-24 to 2013-11-09
  • Target version changed from 2013.33-Block.4.4 to 2013.44-Block.6.1
  • Product Version set to *

#9 Updated by Dave Vieglais almost 11 years ago

  • Due date changed from 2013-11-09 to 2014-01-18
  • Target version changed from 2013.44-Block.6.1 to 2014.2-Block.1.1

#10 Updated by Robert Waltz almost 11 years ago

  • Target version changed from 2014.2-Block.1.1 to 2014.14-Block.2.3
  • Due date changed from 2014-01-18 to 2014-04-12

#11 Updated by Robert Waltz over 10 years ago

  • Status changed from In Progress to Closed
