DataONE Tasks: Issues
https://redmine.dataone.org/
CN REST - Task #8778 (New): Ensure SystemMetadata replica auditing updates are saved and broadcast
https://redmine.dataone.org/issues/8778
2019-03-12T16:54:08Z
Chris Jones (cjones@nceas.ucsb.edu)
<p>In <code>MNAuditTask.call()</code>, we process a batch of pids in need of auditing per Member Node. For each <code>pid</code> in the <code>auditPIDs</code> list, we call <code>MN.getChecksum()</code>. Regardless of success or failure, we set the <code>Replica.replicaVerified</code> date in the <code>SystemMetadata</code>. However, the task has a copy of the system metadata from <code>hzSystemMetadata.get()</code>, and the task doesn't subsequently call <code>hzSystemMetadata.put(pid, sysmeta)</code>. This means that while we are auditing content, we may simply not be recording the results! I need to look at the code more to see if we make an API call to <code>CN.updateSystemMetadata()</code> elsewhere, but I would expect the <code>MNAuditTask</code> to do this. Also, if this happens in the task, we also need to broadcast the system metadata change to the authoritative MN and all replica MNs. Lastly, we need to update the <code>serialVersion</code> field to show the other CNs what the most recent replica list is.</p>
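<p>A minimal sketch of the intended flow, in Python rather than the Java of the actual <code>MNAuditTask</code>; the class and field names here are simplified stand-ins for the real types, and the dict stands in for the Hazelcast map:</p>

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

@dataclass
class Replica:
    member_node: str
    replica_verified: Optional[datetime] = None

@dataclass
class SystemMetadata:
    pid: str
    serial_version: int = 1
    replicas: List[Replica] = field(default_factory=list)

def record_audit_result(hz_system_metadata, pid, node_id):
    """Set replicaVerified for the audited node, bump serialVersion, and
    write the updated copy back -- the hzSystemMetadata.put() the task
    currently omits."""
    sysmeta = hz_system_metadata[pid]            # hzSystemMetadata.get(pid)
    for replica in sysmeta.replicas:
        if replica.member_node == node_id:
            replica.replica_verified = datetime.now(timezone.utc)
    sysmeta.serial_version += 1                  # let other CNs see the newest replica list
    hz_system_metadata[pid] = sysmeta            # hzSystemMetadata.put(pid, sysmeta)
    return sysmeta
```

<p>In the real cluster, the put back into the map is also what would let entry listeners propagate the change; the broadcast to the authoritative MN and replica MNs would still need to happen separately.</p>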
CN REST - Task #8777 (New): Configure CN to audit objects greater than 1GB
https://redmine.dataone.org/issues/8777
2019-03-12T16:47:42Z
Chris Jones (cjones@nceas.ucsb.edu)
<p>The replication auditor currently limits auditing to objects under 1GB. There are currently 4 objects greater than 1TB in size and 3,588 objects greater than 1GB, both very small counts compared to the 2,769,111 objects under 1GB in the network. Nonetheless, they should still be audited if feasible. The limiting factor is likely the HTTP timeout during the call to <code>MN.getChecksum()</code>. For reference, I'm seeing the following general times for calculating MD5 and SHA-1 checksums:</p>
<pre>Size    MD5        SHA-1
-----   --------   --------
1GB     00m02.5s   00m02.6s
10GB    00m25.9s   00m30.0s
100GB   03m28.0s   04m01.8s
1TB     50m14.2s   67m38.6s
</pre>
<p>10GB and 100GB objects seem pretty feasible if we set the HTTP client timeout to more than 5 minutes, whereas the few files over 1TB may be challenging just due to the timeouts. The other factor is that the <code>AbstractReplicationAuditor</code> sets a default timeout of 60 seconds, and if the task future doesn't return in that time frame, the future gets cancelled. So the HTTP timeout and this task timeout need to be increased and coordinated in order to handle larger object auditing.</p>
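<p>As a rough illustration, a timeout could be derived from object size and the checksum throughput implied by the timings above (roughly 0.33 GB/s for MD5 on the 1TB case). This function and its defaults are assumptions for sizing purposes, not existing configuration:</p>

```python
def required_timeout_seconds(object_size_gb, throughput_gb_per_s=0.33,
                             safety_factor=2.0):
    """Suggested HTTP/task timeout for MN.getChecksum() on a large object.

    throughput_gb_per_s ~0.33 is derived from the 1TB MD5 timing above
    (1024 GB in ~50m14s); safety_factor is headroom for disk and network
    contention.  Never go below the current 60-second default.
    """
    return max(60, int(object_size_gb / throughput_gb_per_s * safety_factor))
```

<p>By this estimate a 100GB object needs a timeout on the order of 10 minutes, and a 1TB object closer to two hours, which matches the intuition that only the largest objects are problematic.</p>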
CN REST - Story #8756 (New): Ensure replica auditor is effective
https://redmine.dataone.org/issues/8756
2019-01-12T20:25:18Z
Chris Jones (cjones@nceas.ucsb.edu)
<p>The replication auditor service is currently configured to audit all objects every 90 days. As documented in <a class="issue tracker-4 status-1 priority-4 priority-default child" title="Story: Replica Auditing service is throwing errors (New)" href="https://redmine.dataone.org/issues/8582">#8582</a>, the auditor is not working correctly. While the errors being thrown that are described in that ticket seem to be limited to <code>pid</code>s with certain characters in them, I think the whole auditor process is not keeping up with our content.</p>
<p>Looking at the number of objects on each member node that haven't been audited in the last 90 days, auditing is well behind (if we consider it working at all):</p>
<pre>SELECT sm.authoritive_member_node, count(smr.guid) AS count
FROM systemmetadata sm INNER JOIN smreplicationstatus smr
ON sm.guid = smr.guid
WHERE
smr.member_node != 'urn:node:CN' AND
sm.date_uploaded < (SELECT CURRENT_DATE - interval '90 days') AND
smr.date_verified < (SELECT CURRENT_DATE - interval '90 days')
GROUP BY sm.authoritive_member_node
ORDER BY count DESC;
authoritive_member_node | count
-------------------------+--------
urn:node:ARCTIC | 771872
urn:node:PANGAEA | 507456
urn:node:LTER | 416339
urn:node:DRYAD | 374439
urn:node:CDL | 242115
urn:node:PISCO | 235791
urn:node:KNB | 86075
urn:node:TDAR | 75639
urn:node:NCEI | 50974
urn:node:USGS_SDC | 40290
urn:node:TERN | 31671
urn:node:ESS_DIVE | 28830
urn:node:NMEPSCOR | 16042
urn:node:GOA | 9266
urn:node:IARC | 7677
urn:node:NRDC | 6673
urn:node:TFRI | 6478
urn:node:PPBIO | 3464
urn:node:ORNLDAAC | 3328
urn:node:FEMC | 2430
urn:node:EDI | 2098
urn:node:GRIIDC | 2065
urn:node:mnTestKNB | 2010
urn:node:SANPARKS | 2008
urn:node:ONEShare | 1874
urn:node:R2R | 1787
urn:node:USGSCSAS | 1151
urn:node:EDACGSTORE | 1075
urn:node:US_MPC | 1032
urn:node:RW | 970
urn:node:KUBI | 516
urn:node:NEON | 487
urn:node:LTER_EUROPE | 343
urn:node:IOE | 279
urn:node:RGD | 273
urn:node:ESA | 272
urn:node:NKN | 218
urn:node:OTS_NDC | 126
urn:node:BCODMO | 115
urn:node:SEAD | 90
urn:node:mnTestNKN | 50
urn:node:EDORA | 28
urn:node:ONEShare.pem | 22
urn:node:CLOEBIRD | 17
urn:node:mnTestBCODMO | 11
urn:node:USANPN | 10
urn:node:mnTestTDAR | 10
urn:node:MyMemberNode | 1
</pre>
<p>The table above shows the number of objects not audited in the last 90 days, but I get the feeling that the auditor isn't able to audit any of the content it is charged with auditing, given 1) the frequency, 2) the number of threads allotted, and 3) the configured batch count (which seems far too low). <del>Note that this query excludes replicated content - this is just the original objects</del> (After looking at my query again, I think the join is including all replicas - the total is 2,935,787, which is greater than the total number of objects in the system (2,751,136), so this query needs to be refined.)</p>
<p>We need to evaluate the true effectiveness of the auditor. Some strategies may include: 1) checking whether we may be stuck in an infinite loop processing a few <code>pid</code>s due to the issues in <a class="issue tracker-4 status-1 priority-4 priority-default child" title="Story: Replica Auditing service is throwing errors (New)" href="https://redmine.dataone.org/issues/8582">#8582</a>, 2) seeing if we can increase the batch size by increasing the total threads allocated in the executor, and 3) deciding whether we need to offload the process from the CNs and distribute the workload across a cluster of workers that can do the auditing faster. Needs some thought and discussion.</p>
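<p>As a back-of-the-envelope check on the sizing question, the sustained rate the auditor must achieve follows directly from the object counts above; this is illustrative arithmetic, not existing code:</p>

```python
def required_audit_rate(total_objects, window_days=90):
    """Objects per second the auditor must sustain to touch every object
    once per audit window."""
    return total_objects / (window_days * 24 * 3600)

# With ~2.77M objects under 1GB and a 90-day window, the auditor needs to
# sustain roughly a third of an object per second, continuously, to keep up.
rate = required_audit_rate(2_769_111)
```

<p>Any combination of batch size, thread count, and per-object latency that falls below that rate guarantees the backlog grows.</p>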
Member Nodes - Task #8697 (New): ESSDIVE: anonymous download issue
https://redmine.dataone.org/issues/8697
2018-09-13T19:40:48Z
Amy Forrester (aforres4@utk.edu)
<p>We do have one small issue that we will want to discuss with you: figuring out whether we can deal with anonymous download of our data. We had promised our users that they would be notified about downloads and who downloaded. We had not considered that replication of the data itself into DataONE would violate that promise. I don't think it changes our join schedule or our enthusiasm for joining; it just means we have an issue to work out soon.</p>
Infrastructure - Bug #8641 (New): Any change to SystemMetadata causes a new replication task to b...
https://redmine.dataone.org/issues/8641
2018-07-04T12:28:08Z
Dave Vieglais (dave.vieglais@gmail.com)
<p>The hazelcast event listener implemented by ReplicationEventListener basically does:</p>
<pre>ReplicationEventListener.entryUpdated()
if isAuthoritativeReplicaValid()
createReplicationTask()
</pre>
<p><code>isAuthoritativeReplicaValid()</code> checks whether the replication status for the Authoritative MN is <code>complete</code>.</p>
<p>Hence, any update or add event on the systemmetadata map in Hazelcast will trigger addition of a replication task if the authoritative MN has a completed replica, even if replication is not allowed for the object. This causes a significant number of entries to be added to the replication task queue even though those tasks will never do anything as they will be later rejected.</p>
<p>It would be appropriate in <code>entryUpdated()</code> to also check whether replication of the object is allowed. The overhead would be minimal since a copy of the system metadata is already available in <code>entryUpdated()</code>. The same logic should also be added to <code>entryAdded()</code>.</p>
<p><code>ReplicationManager</code> implements <code>boolean isAllowed(SystemMetadata sysmeta)</code> which should do the job.</p>
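<p>A Python sketch of the proposed guard; the real fix would live in the Java <code>ReplicationEventListener</code> and delegate to <code>ReplicationManager.isAllowed()</code>, so the names and fields below are simplified stand-ins:</p>

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReplicationPolicy:
    replication_allowed: bool = False
    number_replicas: int = 0

@dataclass
class SystemMetadata:
    pid: str
    replication_policy: Optional[ReplicationPolicy] = None

def should_queue_replication(sysmeta, authoritative_replica_valid):
    """Guard for entryUpdated()/entryAdded(): queue a task only when the
    authoritative replica is complete AND the policy allows replication,
    so disallowed objects never reach the task queue."""
    if not authoritative_replica_valid:
        return False
    policy = sysmeta.replication_policy
    return (policy is not None
            and policy.replication_allowed
            and policy.number_replicas > 0)
```

<p>The extra check is cheap because, as noted above, the system metadata copy is already in hand inside the listener.</p>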
Infrastructure - Task #6250 (New): CN methods return 500 ServiceFailure when called with bogus te...
https://redmine.dataone.org/issues/6250
2014-08-29T16:58:54Z
Roger Dahl (dahl@unm.edu)
<p>A number of the tests for the Python CN Client currently fail due to 500 ServiceFailure responses from the server. The CN calls are not meant to complete successfully because they're performed with random values and without certificates, but the tests expect specific exceptions, such as 401 NotAuthorized when calling an API without a certificate.</p>
<p>While 500 ServiceFailure is not exactly a bug, it would be great to get an exception that relates to the actual issue with the request.</p>
<p>For instance, a call to CNIdentity.registerAccount() gives:</p>
<pre>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
Could not counstruct partial tree: Invalid name: Sidalcea
</pre>
<p>I'm not sure how many of the APIs this applies to, but it's at least several of these:</p>
<p>CNCore.setObsoletedBy()</p>
<p>CNAuthorization.setRightsHolder()<br>
CNAuthorization.isAuthorized()</p>
<p>CNIdentity.registerAccount()<br>
CNIdentity.updateAccount()<br>
CNIdentity.verifyAccount()<br>
CNIdentity.getSubjectInfo()<br>
CNIdentity.listSubjects()<br>
CNIdentity.mapIdentity()<br>
CNIdentity.removeMapIdentity()<br>
CNIdentity.requestMapIdentity()<br>
CNIdentity.confirmMapIdentity()<br>
CNIdentity.denyMapIdentity()<br>
CNIdentity.createGroup()<br>
CNIdentity.addGroupMembers()<br>
CNIdentity.removeGroupMembers()</p>
<p>CNReplication.setReplicationStatus()<br>
CNReplication.updateReplicationMetadata()<br>
CNReplication.setReplicationPolicy()<br>
CNReplication.isNodeAuthorized()</p>
<p>CNRegister.updateNodeCapabilities()<br>
CNRegister.register()</p>
Infrastructure - Task #6168 (New): CNAuthorization.setRightsHolder() returns 500 ServiceFailure
https://redmine.dataone.org/issues/6168
2014-08-29T16:01:00Z
Roger Dahl (dahl@unm.edu)
<p>CNAuthorization.setRightsHolder() returns 500 ServiceFailure when called with an invalid pid.</p>
Infrastructure - Task #6166 (New): CNRead.getChecksum() returns Content-Type of text/csv while it...
https://redmine.dataone.org/issues/6166
2014-08-29T15:03:54Z
Roger Dahl (dahl@unm.edu)

Infrastructure - Feature #5145 (New): Consider including cert subject(s) in NotAuthorized exceptions
https://redmine.dataone.org/issues/5145
2014-04-30T13:21:36Z
Roger Dahl (dahl@unm.edu)
<p>When a call fails with a NotAuthorized, including the cert subject(s) in the description makes it easy for the client to determine whether they were using the right cert.</p>
Infrastructure - Task #5137 (New): Fix DataONE CA chain file location in cn-buildout
https://redmine.dataone.org/issues/5137
2014-04-25T15:06:34Z
Chris Jones (cjones@nceas.ucsb.edu)
<p>We had been using /var/local/dataone as the location for trusted CA certificates on the CN, but in 2012 we changed to using a single chain file rather than a directory. I made this change in the cn-ssl config, but (inadvertently?) used /etc/ssl/certs for the DataONECAChain.crt file location.</p>
<p>When this file <em>isn't</em> hashed during c_rehash, no duplicate hashes are created for the DataONERootCA certificate, but when it is hashed, curl operations that use /etc/ssl/certs fail.</p>
<p>The easiest fix is to move the DataONECAChain.crt file back to /var/local/dataone in cn-ssl, so there is no potential for a conflict.</p>
Infrastructure - Task #5136 (New): Change DNS settings on all DataONE VMs
https://redmine.dataone.org/issues/5136
2014-04-24T21:58:54Z
Chris Jones (cjones@nceas.ucsb.edu)
<p>We've transitioned to using the Amazon Route 53 service as the authoritative name servers for the dataone.org domain, and need to configure each development and production VM to point to the appropriate servers (no longer the nceas name servers), and to the Google servers as secondaries:</p>
<p>On UCSB VMs, modify /etc/network/interfaces to point to the following name servers:<br>
<br>
dns-nameservers 128.111.1.2 128.111.1.1 8.8.8.8 8.8.4.4<br>
<br>
(ns2.ucsb.edu, ns1.ucsb.edu, google-public-dns-a.google.com, google-public-dns-b.google.com)</p>
<p>On ORC VMs, modify /etc/network/interfaces to point to the following name servers:<br>
<br>
dns-nameservers 160.36.196.66 160.36.128.66 8.8.8.8 8.8.4.4<br>
<br>
(ns2.utk.edu, ns1.utk.edu, google-public-dns-a.google.com, google-public-dns-b.google.com)</p>
<p>On UNM VMs, modify /etc/network/interfaces to point to the following name servers:<br>
<br>
dns-nameservers 64.106.44.200 64.106.44.210 8.8.8.8 8.8.4.4<br>
<br>
(ns2.unm.edu, ns1.unm.edu, google-public-dns-a.google.com, google-public-dns-b.google.com)</p>
<p>Per Nick Brand, putting the per-campus name servers first should give us the lowest latency.</p>
DataONE API - Bug #3658 (New): Deleting objects breaks obsoletes chain traversal
https://redmine.dataone.org/issues/3658
2013-03-13T23:02:19Z
Rob Nahf (rnahf@epscor.unm.edu)
<p>A deleted object can be at the tail, head, or in the middle of an obsoletes chain. Once removed, assuming the sysmeta is also removed, the obsoletes chain is not fully traversable unless the obsoletes and obsoletedBy fields of its direct neighbors in the chain are repointed. Additionally, if the deletion was from the head of an obsoletes chain, the chain cannot be added to, because the latest object in the chain already has its obsoletedBy field populated.</p>
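<p>A toy Python model of the repointing that would keep the chain traversable after a delete; the dict-based chain is an assumption for illustration, not the CN's actual storage layer:</p>

```python
def delete_from_chain(chain, pid):
    """Remove pid from an obsoletes chain and repoint its neighbours so the
    chain stays traversable.  chain maps
    pid -> {"obsoletes": pid_or_None, "obsoletedBy": pid_or_None}."""
    entry = chain.pop(pid)
    older, newer = entry["obsoletes"], entry["obsoletedBy"]
    if older in chain:
        chain[older]["obsoletedBy"] = newer   # skip over the deleted pid
    if newer in chain:
        chain[newer]["obsoletes"] = older
```

<p>Deleting from the head (where <code>obsoletedBy</code> is None) also clears the old head's neighbour pointer, which is what would let the chain be extended again.</p>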
Infrastructure - Task #3616 (New): Enable ServiceMethodRestriction support in Metacat
https://redmine.dataone.org/issues/3616
2013-02-27T16:28:29Z
Chris Jones (cjones@nceas.ucsb.edu)
<p>Metacat currently assembles a Node document from a number of properties within metacat.properties. The document is sent to the DataONE CN upon registration. There's also the concept of a submitter list in Metacat to restrict the user DNs that can submit content via the Metacat API. Port this over to the DataONE API calls by incorporating that list into the ServiceMethodRestrictions in the Node document for MNStorage methods of 'create', 'update', 'delete', and 'archive'. Reconcile the differences between LDAP DNs and CILogon DNs.</p>
Infrastructure - Task #3488 (New): Create custom merge policy to support managing data/informatio...
https://redmine.dataone.org/issues/3488
2013-01-16T18:57:47Z
Skye Roseboom (sroseboo@dataone.unm.edu)
<p>The current merge policy in our Hazelcast clusters adds all entries to the cluster, creating a superset of the entries found across members. This may cause conflicts, especially when the same object has different content (like a system metadata record with differing replica entries and statuses). Create a new merge policy class that compares content for entries in hzSystemMetadata and re-syncs entries across nodes, prioritizing entries where the dateSysMetadataModified date is newest and the serialVersion is highest.</p>
<p>This could be an extension of the existing merge policy, which currently ensures that the number of records in each backing datastore in the cluster is consistent. Extend it to ensure that the information contained in each datastore is also consistent.</p>
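<p>The selection rule could look like this minimal Python sketch; the real policy would implement Hazelcast's merge-policy interface in Java, and <code>Entry</code> here is a hypothetical stand-in for a system metadata record:</p>

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class Entry:
    pid: str
    serial_version: int
    date_modified: datetime    # stands in for dateSysMetadataModified

def merge(local, remote):
    """Prefer the entry with the newest dateSysMetadataModified; break
    ties on the higher serialVersion."""
    return max(local, remote,
               key=lambda e: (e.date_modified, e.serial_version))
```

<p>Comparing on the (timestamp, serialVersion) tuple keeps the rule deterministic on both sides of a split, so each member converges on the same winner.</p>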
Infrastructure - Task #3328 (New): Enable a data preview from the search results page
https://redmine.dataone.org/issues/3328
2012-10-10T15:54:40Z
Chris Jones (cjones@nceas.ucsb.edu)
<p>Visualization and mapping tools for the data sets:<br>
Before downloading the data, several users expressed that they wanted to “view” it first:<br>
- Browsing graphics<br>
- Mapping graphics</p>
<p>This capability will need some planning and discussion, since the browsable view is entirely dependent on the format of the data file. We may be able to support some common types at first (CSV, PNG, JPEG), and later enable others (mapping, charting, etc.).</p>