DataONE Tasks: Issues https://redmine.dataone.org/ 2019-03-12T16:47:42Z
Redmine CN REST - Task #8777 (New): Configure CN to audit objects greater than 1GB https://redmine.dataone.org/issues/8777 2019-03-12T16:47:42Z Chris Jones cjones@nceas.ucsb.edu
<p>The replication auditor currently limits auditing of objects at 1GB. There are currently 4 objects greater than 1TB in size, and 3,588 objects greater than 1GB in size, both being very small counts compared to the 2,769,111 objects less than 1GB in size in the network. Nonetheless, they should still be audited if feasible. The limiting factor is likely HTTP timeout limits during the call to <code>MN.getChecksum()</code>. For reference, I'm seeing the following general times for calculating MD5 and SHA-1 checksums:</p>
<pre>Size   MD5       SHA-1
-----  --------  --------
1GB    00m02.5s  00m02.6s
10GB   00m25.9s  00m30.0s
100GB  03m28.0s  04m01.8s
1TB    50m14.2s  67m38.6s
</pre>
<p>Auditing 10GB and 100GB objects seems feasible if we set the HTTP client timeout to more than 5 minutes, whereas the few files larger than 1TB may be challenging because of the timeouts alone. The other factor is that the <code>AbstractReplicationAuditor</code> sets a default timeout of 60 seconds, and if the task future doesn't return within that time frame, the future gets cancelled. So the HTTP timeout and this timeout both need to be increased, and coordinated with each other, in order to audit larger objects.</p>
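<p>As a rough illustration of coordinating the two timeouts, the following Python sketch derives both from the object size. It is not the auditor's actual code: the rate constant, safety factor, margin, and the function names <code>audit_timeouts</code> and <code>run_audit</code> are all assumptions made for this example, with the rate loosely based on the slowest figure in the table above (SHA-1 at 1TB, about 4 seconds per GB).</p>

```python
import concurrent.futures

# Assumed values for illustration only.
SECONDS_PER_GB = 4.0      # roughly the slowest observed rate (SHA-1 at 1TB)
SAFETY_FACTOR = 2.0       # headroom for slower Member Nodes
MIN_TIMEOUT_S = 60.0      # the current default future timeout
FUTURE_MARGIN_S = 30.0    # the future must outlive the HTTP call


def audit_timeouts(size_bytes):
    """Return (http_timeout_s, future_timeout_s) for an object of this size."""
    size_gb = size_bytes / 1e9
    http_timeout = max(MIN_TIMEOUT_S, size_gb * SECONDS_PER_GB * SAFETY_FACTOR)
    return http_timeout, http_timeout + FUTURE_MARGIN_S


def run_audit(checksum_call, size_bytes):
    """Run checksum_call(http_timeout) in a worker, bounded by the future timeout.

    checksum_call is a hypothetical stand-in for the MN.getChecksum() request.
    Returns None if the future does not complete in time.
    """
    http_timeout, future_timeout = audit_timeouts(size_bytes)
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(checksum_call, http_timeout)
        try:
            return future.result(timeout=future_timeout)
        except concurrent.futures.TimeoutError:
            future.cancel()
            return None
```

<p>The point of the sketch is only that the future timeout is always derived from, and larger than, the HTTP timeout, so the task is never cancelled while the HTTP call is still allowed to run.</p>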
Infrastructure - Bug #8641 (New): Any change to SystemMetadata causes a new replication task to b... https://redmine.dataone.org/issues/8641 2018-07-04T12:28:08Z Dave Vieglais dave.vieglais@gmail.com
<p>The hazelcast event listener implemented by ReplicationEventListener basically does:</p>
<pre>ReplicationEventListener.entryUpdated()
  if isAuthoritativeReplicaValid()
    createReplicationTask()
</pre>
<p><code>isAuthoritativeReplicaValid()</code> checks whether the replication status for the Authoritative MN is <code>complete</code>.</p>
<p>Hence, any update or add event on the systemmetadata map in Hazelcast will trigger addition of a replication task if the authoritative MN has a completed replica, even if replication is not allowed for the object. This causes a significant number of entries to be added to the replication task queue even though those tasks will never do anything as they will be later rejected.</p>
<p>It would be appropriate in <code>entryUpdated()</code> to also check whether replication of the object is allowed. The overhead would be minimal since a copy of the system metadata is already available in <code>entryUpdated()</code>. The same logic should also be added to <code>entryAdded()</code>.</p>
<p><code>ReplicationManager</code> implements <code>boolean isAllowed(SystemMetadata sysmeta)</code> which should do the job.</p>
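<p>The proposed check can be sketched in Python as follows. This is a simplified model rather than the Java listener itself: the <code>SysMeta</code> record, its fields, the status string, and the in-memory task queue are hypothetical stand-ins for the real system metadata, <code>isAllowed()</code>, and the replication task queue.</p>

```python
from dataclasses import dataclass

# Hypothetical status value standing in for a finished authoritative replica.
AUTHORITATIVE_DONE = "completed"


@dataclass
class SysMeta:
    pid: str
    authoritative_status: str   # replica status on the authoritative MN
    replication_allowed: bool   # outcome of the replication policy


def is_authoritative_replica_valid(sysmeta):
    return sysmeta.authoritative_status == AUTHORITATIVE_DONE


def is_allowed(sysmeta):
    # Stand-in for ReplicationManager.isAllowed(SystemMetadata).
    return sysmeta.replication_allowed


class ReplicationEventListener:
    def __init__(self):
        self.task_queue = []

    def entry_added(self, sysmeta):
        self._maybe_queue(sysmeta)

    def entry_updated(self, sysmeta):
        self._maybe_queue(sysmeta)

    def _maybe_queue(self, sysmeta):
        # Queue a task only when the replica is valid AND replication is allowed.
        if is_authoritative_replica_valid(sysmeta) and is_allowed(sysmeta):
            self.task_queue.append(sysmeta.pid)
```

<p>Routing both events through one helper keeps <code>entryAdded()</code> and <code>entryUpdated()</code> from drifting apart.</p>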
Infrastructure - Bug #8051 (In Progress): CORS-based CN calls fail using Internet Explorer on Win... https://redmine.dataone.org/issues/8051 2017-03-22T20:02:54Z Chris Jones cjones@nceas.ucsb.edu
<p>As noted in <a class="issue tracker-1 status-5 priority-4 priority-default closed" title="Bug: Error -1205 &quot;Client Certificate Rejected&quot; by Safari (Closed)" href="https://redmine.dataone.org/issues/2693">#2693</a>, <a class="issue tracker-1 status-5 priority-4 priority-default closed" title="Bug: completely unable to access cn.dataone.org from Safari 7.1 if user has any certificates installed (Closed)" href="https://redmine.dataone.org/issues/6539">#6539</a>, and <a class="issue tracker-1 status-6 priority-4 priority-default closed" title="Bug: Safari 6.0 fails to connect to Metacat MN with SSLVerifyClient (Rejected)" href="https://redmine.dataone.org/issues/3255">#3255</a>, Safari does not handle TLS handshakes correctly when asked for a client X509 certificate. Similarly, IE 11 (and 10) on Windows 7 (and maybe others) does not handle TLS handshakes correctly.</p>
<p>The MetacatUI application used on certain Member Nodes (KNB, ARCTIC, ...) makes calls to the CN Identity API to get account information for users and their associated groups. This is done via an XHR, using the CORS pre-flight mechanism of calling <code>OPTIONS</code> on the REST endpoint. During this call, the CN is returning an <code>HTTP 403</code> Not Authorized response to IE11 on Windows, but succeeds on Firefox and Chrome on Windows.</p>
<p>It seems (but we're not sure) that IE is responding to the request for a client certificate and whatever is sent is being rejected by the CN web server. However, the behavior is not entirely straightforward. When Apache is configured with:<br>
<br>
SSLVerifyClient optional<br>
SSLVerifyDepth 10</p>
<p>IE11 succeeds on the <code>OPTIONS</code> request. However, when the CN is configured to conditionally set <code>SSLVerifyClient</code> within a @@ block:<br>
<br>
SSLVerifyClient none<br>
<br>
<br>
SSLVerifyClient optional<br>
<br>
<br>
SSLVerifyDepth 10<br>
<br>
the request fails (which is currently the production configuration).</p>
<p>However, after testing in STAGE, IE11 works fine when not asked for a client certificate (<code>SSLVerifyClient none</code>). It seems that the interaction with Apache changes based on the conditional logic in a specific @@ block, and IE11 responds incorrectly in some way.</p>
<p>To alleviate issues with browser-based client certificate requests, I suggest that we adopt the following configuration:<br>
<br>
SSLVerifyClient none<br>
<br>
<br>
SSLVerifyClient optional<br>
<br>
<br>
SSLVerifyDepth 10<br>
<br>
This configuration excludes most desktop/handheld browser clients from being asked for an X509 certificate. However, it still allows for Java, Python, curl, R, etc. clients to connect using client-side certificates. Since we've migrated to JWT token-based browser authentication, this seems reasonable to me.</p>
<p>This is currently a blocker in production, so we should consider a manual change to the production CNs before this gets rolled into a CCI release, if agreed upon. Thoughts welcome.</p>
Java Client - Bug #7322 (Testing): D1Object stores data in memory, causes out of memory errors https://redmine.dataone.org/issues/7322 2015-08-27T23:56:06Z Chris Jones cjones@nceas.ucsb.edu
<p>When assembling DataPackage instances and populating them, the DataPackage class relies on the underlying D1Object.download() method to store members of the DataPackage locally. The current implementation calls IOUtils.toByteArray(inputstream), which stores all bytes in memory. With large data files, this effectively renders DataPackage useless because of OutOfMemory exceptions. The move towards using the javax.activation.DataSource interface helps with this, since it provides both in-memory and on-disk implementations.</p>
<p>Change download() to default to the on-disk DataSource, and make the storage location configurable in d1client.properties.</p>
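<p>The difference between buffering and streaming can be shown with a short Python sketch. It is an analogy for the proposed on-disk behavior, not the Java client's implementation: the function name <code>download_to_disk</code>, the <code>storage_dir</code> parameter (standing in for a configurable storage location), and the file prefix are assumptions for this example.</p>

```python
import os
import shutil
import tempfile


def download_to_disk(stream, storage_dir=None, chunk_size=1024 * 1024):
    """Copy a readable binary stream to a temporary file and return its path.

    Only chunk_size bytes are held in memory at a time, so memory use does
    not grow with the size of the object. storage_dir=None uses the system
    default temporary directory.
    """
    fd, path = tempfile.mkstemp(dir=storage_dir, prefix="d1object-")
    with os.fdopen(fd, "wb") as out:
        shutil.copyfileobj(stream, out, chunk_size)
    return path
```

<p>The caller is responsible for removing the file once the package member is no longer needed.</p>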
Infrastructure - Task #6250 (New): CN methods return 500 ServiceFailure when called with bogus te... https://redmine.dataone.org/issues/6250 2014-08-29T16:58:54Z Roger Dahl dahl@unm.edu
<p>A number of the tests for the Python CN Client currently fail due to 500 ServiceFailure responses from the server. The CN calls are not meant to complete successfully because they're performed with random values and without certificates, but the tests expect specific exceptions, such as 401 NotAuthorized when calling an API without a certificate.</p>
<p>While 500 ServiceFailure is not exactly a bug, it would be great to get an exception that relates to the actual issue with the request.</p>
<p>For instance, a call to CNIdentity.registerAccount() gives:</p>
<p>&lt;?xml version="1.0" encoding="UTF-8"?&gt;<br>
<br>
Could not counstruct partial tree: Invalid name: Sidalcea</p>
<p>I'm not sure how many of the APIs this applies to, but it's at least several of these:</p>
<p>CNCore.setObsoletedBy()</p>
<p>CNAuthorization.setRightsHolder()<br>
CNAuthorization.isAuthorized()</p>
<p>CNIdentity.registerAccount()<br>
CNIdentity.updateAccount()<br>
CNIdentity.verifyAccount()<br>
CNIdentity.getSubjectInfo()<br>
CNIdentity.listSubjects()<br>
CNIdentity.mapIdentity()<br>
CNIdentity.removeMapIdentity()<br>
CNIdentity.requestMapIdentity()<br>
CNIdentity.confirmMapIdentity()<br>
CNIdentity.denyMapIdentity()<br>
CNIdentity.createGroup()<br>
CNIdentity.addGroupMembers()<br>
CNIdentity.removeGroupMembers()</p>
<p>CNReplication.setReplicationStatus()<br>
CNReplication.updateReplicationMetadata()<br>
CNReplication.setReplicationPolicy()<br>
CNReplication.isNodeAuthorized()</p>
<p>CNRegister.updateNodeCapabilities()<br>
CNRegister.register()</p>
Infrastructure - Task #6168 (New): CNAuthorization.setRightsHolder() returns 500 ServiceFailure https://redmine.dataone.org/issues/6168 2014-08-29T16:01:00Z Roger Dahl dahl@unm.edu
<p>CNAuthorization.setRightsHolder() returns 500 ServiceFailure when called with invalid pid.</p>
Infrastructure - Task #6166 (New): CNRead.getChecksum() returns Content-Type of text/csv while it... https://redmine.dataone.org/issues/6166 2014-08-29T15:03:54Z Roger Dahl dahl@unm.edu
Infrastructure - Feature #5145 (New): Consider including cert subject(s) in NotAuthorized exceptions https://redmine.dataone.org/issues/5145 2014-04-30T13:21:36Z Roger Dahl dahl@unm.edu
<p>When a call fails with a NotAuthorized, including the cert subject(s) in the description makes it easy for the client to determine if they were using the right cert.</p>
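<p>A minimal Python sketch of the idea, assuming the server already knows the subjects extracted from the client certificate. The exception class below is a hypothetical model for illustration, not the DataONE exception type, and the description format is an assumption.</p>

```python
class NotAuthorized(Exception):
    """Hypothetical model of a NotAuthorized error carrying cert subjects."""

    def __init__(self, description, subjects=()):
        self.subjects = tuple(subjects)
        if self.subjects:
            # Subjects are joined with "; " because DNs themselves contain commas.
            joined = "; ".join(self.subjects)
            description = f"{description} (certificate subjects: {joined})"
        super().__init__(description)
```

<p>A client that sees its own subject missing from the description can tell immediately that the wrong certificate, or no certificate, was presented.</p>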
Infrastructure - Task #5136 (New): Change DNS settings on all DataONE VMs https://redmine.dataone.org/issues/5136 2014-04-24T21:58:54Z Chris Jones cjones@nceas.ucsb.edu
<p>We've transitioned to using the Amazon Route 53 service as the authoritative name servers for the dataone.org domain, and need to configure each development and production VM to point to the appropriate servers (no longer the NCEAS name servers), and to the Google servers as secondaries:</p>
<p>On UCSB VMs, modify /etc/network/interfaces to point to the following name servers:<br>
<br>
dns-nameservers 128.111.1.2 128.111.1.1 8.8.8.8 8.8.4.4<br>
<br>
(ns2.ucsb.edu, ns1.ucsb.edu, google-public-dns-a.google.com, google-public-dns-b.google.com)</p>
<p>On ORC VMs, modify /etc/network/interfaces to point to the following name servers:<br>
<br>
dns-nameservers 160.36.196.66 160.36.128.66 8.8.8.8 8.8.4.4<br>
<br>
(ns2.utk.edu, ns1.utk.edu, google-public-dns-a.google.com, google-public-dns-b.google.com)</p>
<p>On UNM VMs, modify /etc/network/interfaces to point to the following name servers:<br>
<br>
dns-nameservers 64.106.44.200 64.106.44.210 8.8.8.8 8.8.4.4<br>
<br>
(ns2.unm.edu, ns1.unm.edu, google-public-dns-a.google.com, google-public-dns-b.google.com)</p>
<p>Per Nick Brand, putting the per-campus name servers first should give us the lowest latency.</p>
Infrastructure - Task #3978 (In Progress): Add a CN reporting script that summarizes spatial data... https://redmine.dataone.org/issues/3978 2013-09-13T16:12:08Z Chris Jones cjones@nceas.ucsb.edu
<p>Spatial data in the CN Solr search index includes per-object bounding box data. For client side mapping purposes, these data are too numerous to add to a vector map. Create a spatial summarization script that reduces the total points to summarized counts at a given cell resolution. Allow for the resolution to be configurable. Export the result as a JSON object, compatible with mapping libraries like heatmapjs and D3js.</p>
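<p>One way such a summarization could work is simple grid binning, sketched below in Python. This is an assumption about the approach, not the script itself: it takes one (latitude, longitude) point per object (for example, a bounding-box center), and the function name <code>summarize</code> and the JSON field names are hypothetical and would need to match whatever the chosen mapping library expects.</p>

```python
import json
import math
from collections import Counter


def summarize(points, cell_deg=1.0):
    """Bin (lat, lon) points into square cells of cell_deg degrees.

    Returns a JSON string with one entry per non-empty cell, giving the
    cell center and the number of points that fell inside it.
    """
    counts = Counter()
    for lat, lon in points:
        row = math.floor((lat + 90.0) / cell_deg)
        col = math.floor((lon + 180.0) / cell_deg)
        counts[(row, col)] += 1

    cells = [
        {
            "lat": -90.0 + (row + 0.5) * cell_deg,
            "lon": -180.0 + (col + 0.5) * cell_deg,
            "count": n,
        }
        for (row, col), n in sorted(counts.items())
    ]
    return json.dumps({"cell_deg": cell_deg, "cells": cells})
```

<p>A larger <code>cell_deg</code> yields fewer, coarser cells, which is the configurable resolution the task asks for.</p>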
DataONE API - Bug #3658 (New): Deleting objects breaks obsoletes chain traversal https://redmine.dataone.org/issues/3658 2013-03-13T23:02:19Z Rob Nahf rnahf@epscor.unm.edu
<p>A deleted object can be at the tail, head, or in the middle of an obsoletes chain. Once removed, assuming the sysmeta is also removed, the obsoletes chain is not fully traversable unless the obsoletes and obsoletedBy fields of its direct neighbors in the obsoletes chain are repointed. Additionally, if the deletion was from the head of an obsoletes chain, the obsoletes chain cannot be added to, because the latest in the chain has its obsoletedBy field already populated.</p>
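<p>The repointing described above behaves like removing a node from a doubly linked list. The Python sketch below models it with plain dictionaries keyed by pid; the function name and data layout are assumptions for illustration, and whether the API should permit rewriting these fields at all is a separate policy question that the sketch does not settle.</p>

```python
def delete_and_repoint(sysmeta_by_pid, pid):
    """Remove pid's record and reconnect its direct neighbors in the chain.

    Each record is a dict with optional "obsoletes" (older pid) and
    "obsoletedBy" (newer pid) entries.
    """
    record = sysmeta_by_pid.pop(pid)
    older = record.get("obsoletes")
    newer = record.get("obsoletedBy")

    # The older neighbor now points past the deleted object. If the head was
    # deleted, newer is None and the chain can be extended again.
    if older in sysmeta_by_pid:
        sysmeta_by_pid[older]["obsoletedBy"] = newer
    # The newer neighbor now points back past the deleted object.
    if newer in sysmeta_by_pid:
        sysmeta_by_pid[newer]["obsoletes"] = older
```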
Infrastructure - Task #3616 (New): Enable ServiceMethodRestriction support in Metacat https://redmine.dataone.org/issues/3616 2013-02-27T16:28:29Z Chris Jones cjones@nceas.ucsb.edu
<p>Metacat currently assembles a Node document from a number of properties within metacat.properties. The document is sent to the DataONE CN upon registration. There's also the concept of a submitter list in Metacat to restrict the user DNs that can submit content via the Metacat API. Port this over to the DataONE API calls by incorporating that list into the ServiceMethodRestrictions in the Node document for MNStorage methods of 'create', 'update', 'delete', and 'archive'. Reconcile the differences between LDAP DNs and CILogon DNs.</p>
Infrastructure - Task #3488 (New): Create custom merge policy to support managing data/informatio... https://redmine.dataone.org/issues/3488 2013-01-16T18:57:47Z Skye Roseboom sroseboo@dataone.unm.edu
<p>The current merge policy in our Hazelcast clusters adds all entries to the cluster, creating a superset of the entries found across members. This may cause conflicts, especially when the same object has different content (like a system metadata record with differing replica entries and statuses). Create a new merge policy class that compares content for entries in hzSystemMetadata and re-syncs entries across nodes, prioritizing entries where the dateSysMetadataModified date is the newest and serialVersion is the highest.</p>
<p>This could be an extension of the existing merge policy, which currently ensures that the number of records in each backing datastore in the cluster is consistent. Extend it to ensure that the information contained in each datastore is consistent as well.</p>
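<p>The comparison at the heart of such a policy can be sketched in Python. This shows only the selection rule, not a Hazelcast merge policy class. The task does not say how the two criteria rank against each other, so the sketch assumes the modification date is compared first and serialVersion breaks ties; the function name and the dictionary layout are likewise assumptions.</p>

```python
from datetime import datetime


def pick_winner(existing, merging):
    """Return whichever system metadata entry should survive a merge.

    Entries are dicts with "dateSysMetadataModified" (datetime) and
    "serialVersion" (int). The existing entry is kept on an exact tie.
    """
    def rank(entry):
        return (entry["dateSysMetadataModified"], entry["serialVersion"])

    return merging if rank(merging) > rank(existing) else existing
```

<p>Keeping the existing entry on a tie avoids needless writes when both members already agree.</p>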
Infrastructure - Task #3419 (In Progress): CNRead.describe() does not return Content-Length header https://redmine.dataone.org/issues/3419 2012-12-12T16:33:41Z Roger Dahl dahl@unm.edu
<p>dahl@vm-dataone:~/d1/d1_python/d1_client_onedrive/src$ curl -v -X HEAD <a href="https://cn-stage.test.dataone.org/cn/v1/object/SEF001_024MTBD004R00_20060719.50.5">https://cn-stage.test.dataone.org/cn/v1/object/SEF001_024MTBD004R00_20060719.50.5</a><br>
...<br>
< HTTP/1.1 200 OK<br>
< Date: Wed, 12 Dec 2012 16:19:10 GMT<br>
< Server: Apache/2.2.14 (Ubuntu)<br>
< DataONE-Checksum: MD5,501f259043b9cbfdfaa4ed3944f93698<br>
< Last-Modified: Thu, 01 Jan 1970 00:00:00 GMT<br>
< DataONE-ObjectFormat: eml://ecoinformatics.org/eml-2.0.1<br>
< DataONE-SerialVersion: 1<br>
< Vary: Accept-Encoding<br>
< Content-Type: text/xml;charset=UTF-8<br>
* no chunk, no close, no size. Assume close to signal end</p>
Infrastructure - Task #3328 (New): Enable a data preview from the search results page https://redmine.dataone.org/issues/3328 2012-10-10T15:54:40Z Chris Jones cjones@nceas.ucsb.edu
<p>Visualization and mapping tools for the data sets:<br>
Before downloading the data, several users said that they wanted to “view” the data:<br>
-Browsing graphics<br>
-Mapping graphics</p>
<p>This capability will need some planning and discussion, since the browsable view is entirely dependent on the format of the data file. We may be able to support some common types at first (CSV, PNG, JPEG), and later enable others (mapping, charting, etc.).</p>