Task #3659

Production CNs aren't replicating the updated object format list

Added by Chris Jones about 11 years ago. Updated about 11 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: Support Operations
Target version:
Start date: 2013-03-13
Due date:
% Done: 100%
Milestone: None
Product Version: *
Story Points:
Sprint:

Description

When I updated the object format list on cn-ucsb-1.dataone.org, we expected it to be replicated to the other two CNs via Metacat replication. However, /var/log/metacat/replicate.log shows an SSL handshake error. Determine why this is failing.

History

#1 Updated by Chris Jones about 11 years ago

To test the SSL connection, I used:

sudo curl -v -o - --capath /etc/ssl/certs \
--cert /etc/dataone/client/certs/cn-unm-1.dataone.org.pem \
--key /etc/dataone/client/private/cn-unm-1.dataone.org.key \
"https://cn-ucsb-1.dataone.org/knb/servlet/replication?server=cn-unm-1.dataone.org/knb/servlet/replication&action=test"

which completes the SSL handshake and returns an HTTP 200 response:

< HTTP/1.1 200 OK
< Date: Wed, 13 Mar 2013 23:14:36 GMT
< Server: Apache/2.2.14 (Ubuntu)
< Content-Length: 45
< Vary: Accept-Encoding
< Content-Type: text/html
<
Test successfully

This also works using:

openssl s_client \
-connect cn-ucsb-1.dataone.org:443 \
-showcerts -CApath /etc/ssl/certs \
-cert /etc/dataone/client/certs/cn-unm-1.dataone.org.pem \
-key /etc/dataone/client/private/cn-unm-1.dataone.org.key
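To repeat the curl check above from each CN without retyping it, the request could be wrapped in two small functions. This is only a sketch: the function names are made up for illustration, and it assumes every CN keeps its client certificate and key under the same FQDN-based paths used in the command above.

```shell
# Build the replication test URL used in the curl command above.
# $1 = CN being contacted, $2 = CN identifying itself as the replication peer.
# (Hypothetical helper name; the URL layout is copied from the command above.)
replication_test_url() {
  printf 'https://%s/knb/servlet/replication?server=%s/knb/servlet/replication&action=test\n' "$1" "$2"
}

# Send the test request with the client certificate and print only the
# HTTP status code. Assumes the cert/key paths follow the FQDN-based
# naming shown above; adjust if a host differs.
replication_test_status() {
  curl -s -o /dev/null -w '%{http_code}\n' --capath /etc/ssl/certs \
    --cert "/etc/dataone/client/certs/$2.pem" \
    --key "/etc/dataone/client/private/$2.key" \
    "$(replication_test_url "$1" "$2")"
}

# Example, run on cn-unm-1 (may need sudo to read the private key):
#   replication_test_status cn-ucsb-1.dataone.org cn-unm-1.dataone.org
```

A status of 200 here only shows that the handshake and the test action succeed from the command line; it does not exercise the Java SSL stack that Metacat itself uses.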

I'm now looking at the Java cacerts angle.
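One way to check the cacerts angle is to list the truststore verbosely and filter the certificate owners. The sketch below assumes the truststore path is held in a CACERTS variable (a placeholder, since the actual path is not recorded here), that the store still uses the default "changeit" password, and that the search string matches part of the CA subject; the function name is hypothetical.

```shell
# Filter `keytool -list -v` output (read on stdin) down to the "Owner:"
# lines whose subject contains the given string, case-insensitively.
# $1 = text expected in the CA subject (an assumption; use the real CA name).
truststore_owners_matching() {
  grep '^Owner:' | grep -i -- "$1"
}

# Example (CACERTS is a placeholder for the truststore Metacat is configured to use):
#   keytool -list -v -keystore "$CACERTS" -storepass changeit \
#     | truststore_owners_matching 'DataONE'
```

No output means no certificate with a matching subject was found in that truststore.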

#2 Updated by Chris Jones about 11 years ago

  • Status changed from New to In Progress

#3 Updated by Chris Jones about 11 years ago

Although I haven't directly connected via Java SSL, the cacerts file contains both the D1 Root and Production CA certificates, and we are configured to use the correct cacerts file. The cert and key properties in metacat.properties are set correctly, pointing to the FQDN-based certs in /etc/dataone/client/{certs|private} on all three CNs. I'll connect via Java SSL directly now, but am also seeing the following error in the replication.log file:

knb 2013-03-02T04:10:32: [ERROR]: ReplicationService.handleGetLockRequest - error requesting file lock from MetacatReplication.handleGetLockRequest: the requested docid 'OBJECT_FORMAT_LIST' does not exist

I'm not sure why the docid isn't 'OBJECT_FORMAT_LIST.1', and will also look into this next.
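To see how often this happens, the lock errors can be pulled out of the log and each docid checked for a trailing revision number. This is a rough sketch: the function name is made up, and it assumes that a trailing ".<digits>" is the revision suffix, as in the 'OBJECT_FORMAT_LIST.1' form expected above.

```shell
# Read replication log lines on stdin. For every "does not exist" lock
# error, print the docid and whether it ends in a numeric ".<rev>" suffix.
docid_revision_report() {
  sed -n "s/.*the requested docid '\([^']*\)' does not exist.*/\1/p" |
  while IFS= read -r docid; do
    if printf '%s\n' "$docid" | grep -Eq '\.[0-9]+$'; then
      printf '%s has-revision\n' "$docid"
    else
      printf '%s no-revision\n' "$docid"
    fi
  done
}

# Example (use whichever replication log path applies on the host):
#   docid_revision_report < /var/log/metacat/replicate.log
```

Lines that are not lock errors of this form are ignored, so the whole log can be fed in unfiltered.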

#4 Updated by Chris Jones about 11 years ago

  • Status changed from In Progress to Closed
  • Remaining hours set to 0.0

After looking at this with Ben, we realized that the underlying problem was a Metacat replication issue, not an SSL issue. Updating the format list on a CN other than its home server caused it to not replicate. We changed the home server for the document and forced the replication, which worked fine. Closed.
