DataONE Tasks: Issueshttps://redmine.dataone.org/https://redmine.dataone.org/favicon.ico2019-11-22T18:29:39ZDataONE Tasks
Redmine Infrastructure - Story #8855 (New): Put the system metadata part ahead of the object part when d1...https://redmine.dataone.org/issues/88552019-11-22T18:29:39ZJing Taotao@nceas.ucsb.edu
<p>When a client calls the mn(cn).create/update methods, it usually constructs a multipart which contains the sys part (containing the system metadata information), object part (containing the object itself) and other parts. There is no requirement about the order of those parts.<br>
Metacat will use a new streaming multipart handler that calculates the checksum while it stores the object part in a file. This requires knowing the checksum algorithm before the object is serialized, so for best performance Metacat has to process the system metadata first.<br>
To take advantage of this, we recommend that clients put the system metadata part ahead of the object part when constructing the multipart to be sent to the server.<br>
Note: even if the client doesn't use the recommended order, the process still works, but performance will be poor.</p>
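<p>A minimal sketch of why the order matters, assuming a hashlib-style algorithm name taken from the system metadata and the part names <code>sysmeta</code> and <code>object</code>. Both function names are hypothetical and this is not Metacat's actual handler. If the algorithm is known before the object bytes arrive, the checksum can be computed in the same pass that writes the file:</p>

```python
import hashlib


def store_with_checksum(object_stream, dest_path, algorithm, chunk_size=64 * 1024):
    """Write object_stream to dest_path, hashing each chunk as it is written.

    `algorithm` is a hashlib-style name (e.g. "sha256", "md5"); it must be
    known before the first object byte is read, which is why the system
    metadata part is needed first.
    """
    digest = hashlib.new(algorithm)
    with open(dest_path, "wb") as out:
        while True:
            chunk = object_stream.read(chunk_size)
            if not chunk:
                break
            digest.update(chunk)
            out.write(chunk)
    return digest.hexdigest()


def order_parts(parts):
    """Client side: return (name, payload) parts with 'sysmeta' first and 'object' last.

    The part names are assumptions for illustration; other parts keep their
    relative order because sorted() is stable.
    """
    priority = {"sysmeta": 0, "object": 2}
    return sorted(parts, key=lambda part: priority.get(part[0], 1))
```

<p>If the object part arrives first, a handler like this would have to buffer the object or re-read the stored file once the algorithm is known, which is the performance cost described above.</p>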
CN REST - Story #8771 (New): Issue with LDAP when updating `nodeReplicationPolicy`https://redmine.dataone.org/issues/87712019-03-05T19:42:17ZRoger Dahldahl@unm.edu
<p>When submitting a Node document update that includes a nodeReplicationPolicy, this section is accepted:</p>
<pre><nodeReplicationPolicy>
<maxObjectSize>21474836480</maxObjectSize>
<spaceAllocated>1099511627776</spaceAllocated>
</nodeReplicationPolicy>
</pre>
<p>while the same section without <code>maxObjectSize</code> returns an error:</p>
<pre> <error detailCode="4822" errorCode="500" name="ServiceFailure">
<description>updateNodeCapabilities failed due to LDAP communication failure:: InvalidAttributeValueException:[LDAP: error code 21 - d1ReplicationPolicyMaxObjectSize: value #0 invalid per syntax]:[LDAP: error code 21 - d1ReplicationPolicyMaxObjectSize: value #0 invalid per syntax]</description>
</error>
</pre>
<p>The schema allows leaving <code>maxObjectSize</code> out, which means that the MN accepts replicas of unlimited size.</p>
<p>Both GMN and Metacat leave <code>maxObjectSize</code> out if the setting is configured to unlimited with <code>-1</code>.</p>
<p>I think it used to work.</p>
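<p>For reference, a sketch of how a client might produce the failing document, assuming <code>-1</code> is the "unlimited" setting as described above. The function name is hypothetical and this is not GMN's or Metacat's actual code:</p>

```python
import xml.etree.ElementTree as ET

UNLIMITED = -1  # setting value meaning "no size limit", per the description above


def build_replication_policy(max_object_size, space_allocated):
    """Serialize a nodeReplicationPolicy, omitting maxObjectSize when unlimited."""
    policy = ET.Element("nodeReplicationPolicy")
    if max_object_size != UNLIMITED:
        ET.SubElement(policy, "maxObjectSize").text = str(max_object_size)
    ET.SubElement(policy, "spaceAllocated").text = str(space_allocated)
    return ET.tostring(policy, encoding="unicode")


# The second call produces the variant that triggers the LDAP error.
with_limit = build_replication_policy(21474836480, 1099511627776)
without_limit = build_replication_policy(UNLIMITED, 1099511627776)
```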
CN REST - Story #8770 (New): Issue with CN handling of encoded identifiers in object/ meta/ node/...https://redmine.dataone.org/issues/87702019-03-05T19:37:13ZRoger Dahldahl@unm.edu
<p>Works:<br>
<a href="http://cn.dataone.org/cn/v2/object/doi:10.6073/AA/knb-lter-bes.298.37">http://cn.dataone.org/cn/v2/object/doi:10.6073/AA/knb-lter-bes.298.37</a><br>
<a href="https://cn.dataone.org/cn/v2/node/urn:node:LTER">https://cn.dataone.org/cn/v2/node/urn:node:LTER</a></p>
<p>Does not work:<br>
<a href="http://cn.dataone.org/cn/v2/object/doi%3A10.6073%2FAA%2Fknb-lter-bes.298.37">http://cn.dataone.org/cn/v2/object/doi%3A10.6073%2FAA%2Fknb-lter-bes.298.37</a><br>
<a href="https://cn.dataone.org/cn/v2/node/urn%3Anode%3ALTER">https://cn.dataone.org/cn/v2/node/urn%3Anode%3ALTER</a></p>
<p>Note: Behavior differs between HTTP / HTTPS.</p>
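<p>The failing URLs are what a client gets when it percent-encodes the identifier as a single path segment. A short sketch using the Python standard library (the helper name is hypothetical):</p>

```python
from urllib.parse import quote, unquote


def encode_identifier(identifier):
    """Percent-encode an identifier for use as one URL path segment.

    safe="" makes quote() also encode "/" and ":", which it would
    otherwise leave (for "/") untouched.
    """
    return quote(identifier, safe="")


encoded_pid = encode_identifier("doi:10.6073/AA/knb-lter-bes.298.37")
encoded_node = encode_identifier("urn:node:LTER")
```

<p>Decoding with <code>unquote</code> restores the original identifier, so both forms name the same resource and would be expected to resolve identically.</p>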
Infrastructure - Story #8307 (New): Check node subject on node registration and subsequent callshttps://redmine.dataone.org/issues/83072018-02-06T20:04:39ZDave Vieglaisdave.vieglais@gmail.com
<p>The <code>/node/subject</code> entry of the node document should match the subject of the certificate used to register the node (unless the call is being made by a CN certificate).</p>
Infrastructure - Story #8234 (New): Use University of Kansas ORCID membership to support authenti...https://redmine.dataone.org/issues/82342018-01-09T02:00:28ZDave Vieglaisdave.vieglais@gmail.com
<p><a href="https://orcid.org/members/001G000001CAkZgIAL-university-of-kansas" class="external">KU is a premium ORCID member</a> as a member of the Greater Western Library Alliance (GWLA). As a result, KU has access to five ORCID API keys. One is currently in use for the KU DSpace instance.</p>
<p>The goal of this story is to leverage one of the remaining API keys to support ORCID authentication in the DataONE production environment.</p>
Infrastructure - Story #8109 (New): Does authentication token need to include group information?https://redmine.dataone.org/issues/81092017-06-06T18:34:32ZJing Taotao@nceas.ucsb.edu
<p>Currently, when the portal generates the authentication token, it omits the group information to keep the token short. So when an entity tries to authorize the token, it has to look up the group information to make the group authorization mechanism work. The lookup has to access multiple places, e.g., the DataONE CN, an LDAP server, and others. This seems like unnecessary overhead.</p>
Infrastructure - Story #8061 (New): develop queue-based processing system for the CNhttps://redmine.dataone.org/issues/80612017-04-05T22:40:24ZRob Nahfrnahf@epscor.unm.edu
<p>The event-based mechanism for generating indexing tasks is not robust to network segregation, and it is inefficient because it triggers indexing tasks when system metadata are loaded into the Hazelcast map, which is not a "real" event, just data hydration from persistent storage.</p>
<p>Investigate using reliable queues instead. The design should be abstracted so that different implementations can be swapped in at a later date, so use standard messaging patterns.</p>
<p>RabbitMQ and ActiveMQ are potential implementations to use.<br>
ZeroMQ is a lower-level implementation, probably a bit more complicated, but very performant.</p>
Infrastructure - Story #7940 (New): Retrieval of system metadata is too slowhttps://redmine.dataone.org/issues/79402016-11-25T20:48:28ZDave Vieglaisdave.vieglais@gmail.com
<p>Retrieving a system metadata document takes 1-2 seconds in the production environment. Response time improves on subsequent calls, but a request still takes longer than a second to complete. Since system metadata is critical for many operations, its retrieval should not be an impediment to users. At two seconds per request, a simple single-threaded client can download information about 30 or so objects per minute, or about 1800 per hour. Since some data packages contain on the order of hundreds to thousands of entries, it would take an hour or so simply to iterate over the system metadata for a moderately sized data package.</p>
<p>The retrieval process should be profiled to identify which portions are inefficient, then those portions addressed where possible.</p>
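<p>The throughput estimate can be reproduced with a few lines of arithmetic, assuming sequential requests and the observed 1-2 second response time (the function names are illustrative):</p>

```python
def objects_per_hour(seconds_per_request):
    """Sequential throughput of a single-threaded client."""
    return 3600 / seconds_per_request


def hours_to_iterate(entry_count, seconds_per_request):
    """Time to fetch system metadata for every entry of a data package."""
    return entry_count * seconds_per_request / 3600


slow_rate = objects_per_hour(2)           # 2 s per request
package_hours = hours_to_iterate(1800, 2)  # package with 1800 entries
```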
Infrastructure - Story #7807 (New): cn.synchronize should support synchronization failure correct...https://redmine.dataone.org/issues/78072016-05-13T16:56:25ZRob Nahfrnahf@epscor.unm.edu
<p>cn.synchronize(session, identifier) works well for its original purpose (supporting MN-driven system metadata updates, and MN-driven push synchronization), but doesn't seem to work for manual synchronization failure workflows. The main problem is that the request can only be made by the MN itself (using the MN client certificate). </p>
<p>As we envision a centralized dashboard for monitoring failed synchronization items, how do we address this situation? </p>
<p>The synchronization processing queue needs both the pid and the nodeId of the node from which to retrieve the object. The nodeId is not specified directly in the method call, but gleaned from the session by a reverse lookup from the certificate. (It uses the first node found in the NodeList where the Node.subject field matches the certificate subject.)</p>
<p>Should we allow node.contactSubjects into the algorithm?<br>
Should we add nodeId as a parameter?</p>
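<p>A sketch of the reverse lookup described above, with the first proposal (also matching contact subjects) as an optional flag. The <code>Node</code> class here is a simplified stand-in for the real type, and the subjects and function name are hypothetical:</p>

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    # Simplified stand-in for a Node entry in the NodeList.
    identifier: str
    subjects: List[str]
    contact_subjects: List[str] = field(default_factory=list)


def find_node_id(node_list, cert_subject, include_contacts=False) -> Optional[str]:
    """Return the identifier of the first node whose subject matches the certificate.

    With include_contacts=True, a node's contact subjects also count as a match.
    Subjects are compared as plain strings here; real DN comparison may need
    normalization.
    """
    for node in node_list:
        candidates = list(node.subjects)
        if include_contacts:
            candidates += node.contact_subjects
        if cert_subject in candidates:
            return node.identifier
    return None


nodes = [
    Node("urn:node:EXAMPLE_A", ["CN=urn:node:EXAMPLE_A"], ["CN=Admin A"]),
    Node("urn:node:EXAMPLE_B", ["CN=urn:node:EXAMPLE_B"], ["CN=Admin B"]),
]
```

<p>With the current behavior an administrator's certificate finds no node, which is why a dashboard user cannot issue the request. Adding nodeId as an explicit parameter would avoid the lookup entirely but would need a separate authorization check.</p>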
Infrastructure - Story #7713 (New): d1DebConfig.xml should be versioned in dataone-cn-os-corehttps://redmine.dataone.org/issues/77132016-04-11T15:04:45ZRobert Waltz
<p>A version attribute should be added to the type debConfType in dataoneDebuPkgConfigTypes.xsd.</p>
<p>This will allow the debConfig root element to be versioned so that when dataone-cn-os-core is updated or installed, there is a better way to differentiate between the file located on the releases website:</p>
<p><a href="https://releases.dataone.org/debian/conf/d1DebConfig.xml">https://releases.dataone.org/debian/conf/d1DebConfig.xml</a></p>
<p>and the one stored on the filesystem:</p>
<p>/etc/dataone/d1DebConfig.xml</p>
<p>The intention of having an up-to-date version on releases.dataone.org is to be able to reconfigure DataONE without having to re-install dataone-cn-os-core.</p>
<p>The XML schema file will also need to be updated and versioned.</p>
Infrastructure - Story #7650 (New): DAO for SystemMetadata changes the SystemMetadata.replication...https://redmine.dataone.org/issues/76502016-02-17T20:02:24ZRob Nahfrnahf@epscor.unm.edu
<p>SystemMetadataDaoMetacatImpl (in d1_cn_common) sets replication_allowed to false in situations of a null ReplicationPolicy, even though the CN is not supposed to alter the ReplicationPolicy.</p>
<p>DataONE has historically taken the approach of opting in to replicating content, instead of opting out, so at least the default of false is in keeping with that. However, the DAO layer seems to be the wrong place to apply business rules. It would be better to put this logic in d1_synchronization (and updateSystemMetadata).</p>
<p>Two issues are at play here:<br>
1. Should the semantics of a null replicationPolicy be "use the CN default behavior at the time of submission" or "I don't care" (allowing the CN to add or remove replicas at will)?</p>
<p>If the former, we need to persist a default ReplicationPolicy. If the latter, we need to remove the addition of a ReplicationPolicy, and potentially remove default policies based on MN versions of system metadata.</p>
<ol start="2">
<li>Should the DAO implementation apply the business rules about default values, or should that logic be refactored into d1_synchronization and updateSystemMetadata?</li>
</ol>
Infrastructure - Story #7559 (New): Develop plan for securing application passwords in the CN stackhttps://redmine.dataone.org/issues/75592015-12-15T22:58:06ZBen Leinfelderleinfelder@nceas.ucsb.edu
<p>There are many components that use passwords in configuration files. While we do restrict who can access our servers and what they can view when on the server, it's still not entirely secure to have property files with cleartext passwords.</p>
<p>Here are the components that are known to be configured with cleartext passwords:<br>
* d1_identity_manager (LDAP)<br>
* d1_noderegistry (LDAP)<br>
* d1_replication (postgres)<br>
* d1_portal_servlet (postgres)<br>
* Metacat (postgres)<br>
* all hazelcast connections</p>
Infrastructure - Story #7224 (New): push synchronization request status indicator: synchronizeSta...https://redmine.dataone.org/issues/72242015-06-18T08:30:42ZRob Nahfrnahf@epscor.unm.edu
<p>Push synchronization (cn.synchronize, mn.updateSystemMetadata) involves an end user who might want an idea of how long the queued action will take to complete. Something as simple as returning the sync request's place in line might suffice as the indicator, or a more complete data packet, including the place in line and the queue velocity, could be attempted.</p>
<p>The real-world analogy for this kind of indicator is taking a number at the deli counter: you don't know when you will be served, but you know how many people are in front of you. </p>
<p>This option is a separate call to the CN to check the status of the sync request, so that the current place in line is returned. The advantage of this is that if the velocity of synchronization changes, the interested party can call again and get an updated value - it has more diagnostic and monitoring power. This could lead to over-use, however.</p>
Infrastructure - Story #4650 (New): Allow MN to bias resolve to the authoritative MNhttps://redmine.dataone.org/issues/46502014-03-28T18:17:40ZBruce Wilsonbwilso27@utk.edu
<p>In the MN workshop (IDCC, Feb 2014) and in an MN forum discussion, several MNs asked if resolve could be biased toward the authoritative MN, so that users would be more likely to retrieve the data from the authoritative MN. A suggestion that met with a very positive response was to ensure that CN.resolve() returns the authoritative MN as the first item in the list. My understanding is that the order of the CN.resolve() results is currently indeterminate. </p>
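<p>A minimal sketch of the suggested ordering, assuming the resolve result can be treated as a list of node identifiers and the authoritative MN is known from system metadata. The function name and node identifiers are hypothetical:</p>

```python
def authoritative_first(locations, authoritative_mn):
    """Return locations with the authoritative MN moved to the front.

    sorted() is stable, so the remaining nodes keep their original
    relative order (False sorts before True).
    """
    return sorted(locations, key=lambda node_id: node_id != authoritative_mn)


ordered = authoritative_first(
    ["urn:node:REPLICA_1", "urn:node:AUTH_MN", "urn:node:REPLICA_2"],
    "urn:node:AUTH_MN",
)
```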
Infrastructure - Story #2548 (New): recasting untrusted certs to public poses accessibility incon...https://redmine.dataone.org/issues/25482012-03-27T21:55:59ZRob Nahfrnahf@epscor.unm.edu
<p>KNB recasts a connection with an untrusted certificate to public, so that a client does not get "less than public" privileges.<br>
GMN throws an InvalidToken in this situation.<br>
Both refuse connections from clients with expired certificates from trusted CAs.</p>
<p>The recasting approach can cause confusion when a user unwittingly uses an untrusted certificate and doesn't get what they expected. If these connections were instead refused, the user would be alerted and could reconnect as a public user if they chose.</p>
<p>A brief discussion can be found at line 97 of: <a href="http://epad.dataone.org/20120131-authn-authz-questions">http://epad.dataone.org/20120131-authn-authz-questions</a></p>
<ul>
<li>when would honest users be in this situation?</li>
<li>elicit advantages of recasting approach</li>
<li>either way, DataONE should implement uniform behavior across the CN and MNs.</li>
</ul>