Bug #3874
Inconsistent object counts across CNs
100%
Description
As of this writing, CN-UNM-1 reports 345897 objects, whereas CN-ORC-1 and CN-UCSB-1 report 345922 objects.
The goal of this ticket is to identify how the object counts came to differ and what corrective measures should be taken.
May be related to #3352.
History
#1 Updated by Dave Vieglais over 11 years ago
Solr counts are off as well, and they differ on each server:
$ for node in unm ucsb orc; do curl -s "https://cn-$node-1.dataone.org/cn/v1/query/solr/?q=*:*&rows=0" | xml sel -t -m "//result" -o "$node solr count = " -v "@numFound"; done
unm solr count = 209500
ucsb solr count = 209452
orc solr count = 209505
#2 Updated by Dave Vieglais over 11 years ago
Object list counts:
$ export NAMESPACE="d=http://ns.dataone.org/service/types/v1"
for node in unm ucsb orc; do curl -s "https://cn-$node-1.dataone.org/cn/v1/object?count=0" | xml sel -N "${NAMESPACE}" -t -m "//d:objectList" -o "$node object list total = " -v "@total"; done
unm object list total = 345897
ucsb object list total = 345922
orc object list total = 345922
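The comparison above can be automated. The following is a minimal sketch, assuming the per-node totals have already been collected as "node count" pairs; the `check_counts` helper is hypothetical and not part of any DataONE tooling. It treats the largest count as the reference and lists each node that falls short of it.

```shell
# Hypothetical helper: reads "node count" pairs on stdin and reports
# whether all nodes agree. Returns non-zero when the counts differ.
check_counts() {
  awk '
    { count[$1] = $2; seen[$2]++ }
    END {
      n = 0
      for (c in seen) n++
      if (n <= 1) { print "consistent"; exit 0 }
      # Treat the largest count as the reference and list nodes below it.
      max = 0
      for (node in count) if (count[node] + 0 > max) max = count[node] + 0
      for (node in count)
        if (count[node] + 0 < max)
          printf "%s behind by %d\n", node, max - count[node]
      exit 1
    }'
}

# Totals taken from the output above.
printf '%s\n' "unm 345897" "ucsb 345922" "orc 345922" | check_counts || echo "counts differ"
```

With the totals listed above, this reports that unm is behind by 25 objects.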
#3 Updated by Dave Vieglais over 11 years ago
- Due date set to 2013-08-03
- Target version set to 2013.30-Block.4.3
- Start date set to 2013-07-21
#4 Updated by Robert Waltz over 11 years ago
- Status changed from New to In Progress
- Milestone changed from None to CCI-1.2
The situation is not corrected by running Hazelcast Synchronization on Metacat. It appears to be getting worse as more objects are replicated from other servers:
knb 20130802-19:23:09: [WARN]: Local SystemMetadata pid count: 345900 [edu.ucsb.nceas.metacat.dataone.hazelcast.HazelcastService]
knb 20130802-19:23:33: [WARN]: processedCount (identifiers from iterator): 345927 [edu.ucsb.nceas.metacat.dataone.hazelcast.HazelcastService]
knb 20130802-19:23:33: [WARN]: local pid count not yet shared: 0, shared pid count: 345927 [edu.ucsb.nceas.metacat.dataone.hazelcast.HazelcastService]
knb 20130802-19:23:33: [WARN]: Loading missing local keys into hzIdentifiers [edu.ucsb.nceas.metacat.dataone.hazelcast.HazelcastService]
knb 20130802-19:23:33: [WARN]: Initialized identifiers with missing local keys [edu.ucsb.nceas.metacat.dataone.hazelcast.HazelcastService]
knb 20130802-19:23:33: [WARN]: Processing missing SystemMetadata for missing pid count: 27 [edu.ucsb.nceas.metacat.dataone.hazelcast.HazelcastService]
...replication of new content occurs here...
knb 20130802-19:40:50: [WARN]: The user name from session is: cn=dataone_cn_metacat,dc=dataone,dc=org [edu.ucsb.nceas.metacat.admin.ReplicationAdmin]
knb 20130802-19:40:50: [WARN]: Local SystemMetadata pid count: 345902 [edu.ucsb.nceas.metacat.dataone.hazelcast.HazelcastService]
knb 20130802-19:41:14: [WARN]: processedCount (identifiers from iterator): 345932 [edu.ucsb.nceas.metacat.dataone.hazelcast.HazelcastService]
knb 20130802-19:41:15: [WARN]: local pid count not yet shared: 0, shared pid count: 345932 [edu.ucsb.nceas.metacat.dataone.hazelcast.HazelcastService]
knb 20130802-19:41:15: [WARN]: Loading missing local keys into hzIdentifiers [edu.ucsb.nceas.metacat.dataone.hazelcast.HazelcastService]
knb 20130802-19:41:15: [WARN]: Initialized identifiers with missing local keys [edu.ucsb.nceas.metacat.dataone.hazelcast.HazelcastService]
knb 20130802-19:41:15: [WARN]: Processing missing SystemMetadata for missing pid count: 30 [edu.ucsb.nceas.metacat.dataone.hazelcast.HazelcastService]
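In both runs, the "missing pid count" in the log matches the difference between the shared pid count and the local SystemMetadata pid count. That reading is inferred from the numbers alone, not from the HazelcastService source. A small sketch of the arithmetic, using a hypothetical `missing_pids` helper and values copied from the log lines above:

```shell
# Hypothetical helper; not part of Metacat.
# $1 = shared pid count, $2 = local SystemMetadata pid count
missing_pids() {
  echo $(( $1 - $2 ))
}

missing_pids 345927 345900   # run at 19:23 -> 27
missing_pids 345932 345902   # run at 19:41 -> 30
```

The gap grew from 27 to 30 between the two runs, which is consistent with the observation that the situation worsens as more objects are replicated.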
#5 Updated by Robert Waltz over 11 years ago
- Due date changed from 2013-08-03 to 2013-08-10
Restarted TC on cn-unm-1 with the log level set to INFO. After the restart, the counts on the CNs were equivalent. I then manually started replication on cn-unm-1, which produced a lot of activity in the Metacat replication log. Afterwards, the objectList count and the Solr index count were the same across all production CNs.
We'll re-evaluate after a couple of days. There is a network outage at UNM on Wednesday, August 14, after which we may need to restart TC and the daemons again.
#6 Updated by Dave Vieglais over 11 years ago
- Due date changed from 2013-08-10 to 2013-08-24
- Target version changed from 2013.30-Block.4.3 to 2013.33-Block.4.4
#7 Updated by Dave Vieglais over 11 years ago
Looks like UNM is slipping behind again: 346092 on UNM versus 346098 on ORC and UCSB.
#8 Updated by Robert Waltz about 11 years ago
- Due date changed from 2013-08-24 to 2013-11-09
- Target version changed from 2013.33-Block.4.4 to 2013.44-Block.6.1
- Product Version set to *
#9 Updated by Dave Vieglais almost 11 years ago
- Due date changed from 2013-11-09 to 2014-01-18
- Target version changed from 2013.44-Block.6.1 to 2014.2-Block.1.1
#10 Updated by Robert Waltz almost 11 years ago
- Target version changed from 2014.2-Block.1.1 to 2014.14-Block.2.3
- Due date changed from 2014-01-18 to 2014-04-12
#11 Updated by Robert Waltz over 10 years ago
- Status changed from In Progress to Closed