Task #3659
Production CNs aren't replicating the updated object format list
100%
Description
When I updated the object format list on cn-ucsb-1.dataone.org, we expected it to be replicated to the other two CNs via Metacat replication. However, /var/log/metacat/replicate.log shows an SSL handshake error. Determine why replication is failing.
History
#1 Updated by Chris Jones over 11 years ago
To test the SSL connection, I used:
sudo curl -v -o - --capath /etc/ssl/certs \
--cert /etc/dataone/client/certs/cn-unm-1.dataone.org.pem \
--key /etc/dataone/client/private/cn-unm-1.dataone.org.key \
"https://cn-ucsb-1.dataone.org/knb/servlet/replication?server=cn-unm-1.dataone.org/knb/servlet/replication&action=test"
which completes the SSL handshake and returns an HTTP 200 response:
< HTTP/1.1 200 OK
< Date: Wed, 13 Mar 2013 23:14:36 GMT
< Server: Apache/2.2.14 (Ubuntu)
< Content-Length: 45
< Vary: Accept-Encoding
< Content-Type: text/html
<
Test successfully
This also works using:
openssl s_client \
-connect cn-ucsb-1.dataone.org:443 \
-showcerts -CApath /etc/ssl/certs \
-cert /etc/dataone/client/certs/cn-unm-1.dataone.org.pem \
-key /etc/dataone/client/private/cn-unm-1.dataone.org.key
I'm now looking at the Java cacerts angle.
#2 Updated by Chris Jones over 11 years ago
- Status changed from New to In Progress
#3 Updated by Chris Jones over 11 years ago
Although I haven't yet connected directly via Java SSL, the cacerts file contains both the D1 Root and Production CA certificates, and we are configured to use the correct cacerts file. The cert and key properties in metacat.properties are set correctly, pointing to the FQDN-based certs in /etc/dataone/client/{certs|private} on all three CNs. I'll connect via Java SSL directly now, but I'm also seeing the following error in the replication.log file:
knb 2013-03-02T04:10:32: [ERROR]: ReplicationService.handleGetLockRequest - error requesting file lock from MetacatReplication.handleGetLockRequest: the requested docid 'OBJECT_FORMAT_LIST' does not exist
I'm not sure why the docid isn't 'OBJECT_FORMAT_LIST.1', and will also look into this next.
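For reference, the cacerts check described above could be repeated from Java itself with something like the following. This is only a rough sketch, not what was actually run: the class name TrustStoreCheck, the default password "changeit", the fallback path under java.home, and the search string "DataONE" are all assumptions, and the real CA subject names may differ.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: lists trusted certificates whose subject contains a given string.
public class TrustStoreCheck {

    // Returns "alias -> subject DN" for every X.509 entry whose subject
    // contains `needle` (case-insensitive).
    static List<String> matchingSubjects(KeyStore trustStore, String needle) throws Exception {
        List<String> matches = new ArrayList<>();
        String wanted = needle.toLowerCase();
        for (String alias : Collections.list(trustStore.aliases())) {
            Certificate cert = trustStore.getCertificate(alias);
            if (cert instanceof X509Certificate) {
                String subject = ((X509Certificate) cert).getSubjectX500Principal().getName();
                if (subject.toLowerCase().contains(wanted)) {
                    matches.add(alias + " -> " + subject);
                }
            }
        }
        return matches;
    }

    // Loads a keystore file from disk using the JVM's default keystore type.
    static KeyStore load(String path, char[] password) throws Exception {
        KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            ks.load(in, password);
        }
        return ks;
    }

    public static void main(String[] args) throws Exception {
        // Assumption: fall back to the JVM's bundled cacerts and its stock password.
        String path = args.length > 0
                ? args[0]
                : System.getProperty("java.home") + "/lib/security/cacerts";
        char[] password = (args.length > 1 ? args[1] : "changeit").toCharArray();
        // Assumption: the D1 CA subjects contain the string "DataONE".
        String needle = args.length > 2 ? args[2] : "DataONE";

        List<String> found = matchingSubjects(load(path, password), needle);
        if (found.isEmpty()) {
            System.out.println("No trusted certificates matching '" + needle + "' in " + path);
        } else {
            found.forEach(System.out::println);
        }
    }
}
```

Run it as `java TrustStoreCheck /path/to/cacerts` against the cacerts file that Metacat's JVM actually uses. An empty result would suggest the Java trust store is missing the CA certificates even though curl and openssl succeed with /etc/ssl/certs.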
#4 Updated by Chris Jones over 11 years ago
- Status changed from In Progress to Closed
- Remaining hours set to 0.0
After looking at this with Ben, we realized that the underlying problem was a Metacat replication issue, not an SSL issue. Updating the format list on a CN other than its home server prevented it from replicating. We changed the home server for the document and forced the replication, which worked fine. Closed.