DataONE Tasks: Issueshttps://redmine.dataone.org/https://redmine.dataone.org/favicon.ico2019-11-22T18:29:39ZDataONE Tasks
Infrastructure - Story #8855 (New): Put the system metadata part ahead of the object part when d1...https://redmine.dataone.org/issues/88552019-11-22T18:29:39ZJing Taotao@nceas.ucsb.edu
<p>When a client calls the mn(cn).create/update methods, it usually constructs a multipart request that contains the sys part (the system metadata), the object part (the object itself), and other parts. There is no requirement on the order of those parts.<br>
Metacat will use a new streaming multipart handler that calculates the checksum while it stores the object part in a file. This requires knowing the checksum algorithm before the object is serialized, so for good performance Metacat has to process the system metadata first.<br>
To take advantage of this, we recommend that clients put the system metadata part ahead of the object part when constructing the multipart request sent to the server.<br>
Note: even if the client doesn't use the recommended order, the process still works, but the performance will be poor.</p>
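<p>A minimal client-side sketch of the recommended ordering, using only the Python standard library. The part names <code>sysmeta</code> and <code>object</code>, the helper names, and the sample payloads are assumptions made for illustration; this is not the DataONE client library.</p>

```python
import uuid


def order_parts_for_create(parts):
    """Move the system metadata part to the front.

    sorted() is stable, so all other parts keep their original relative order.
    """
    return sorted(parts, key=lambda part: 0 if part[0] == "sysmeta" else 1)


def build_multipart(parts, boundary=None):
    """Serialize (name, filename, content_type, payload) tuples in the given order."""
    boundary = boundary or uuid.uuid4().hex
    chunks = []
    for name, filename, content_type, payload in parts:
        chunks.append(f"--{boundary}\r\n".encode("ascii"))
        chunks.append(
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\n'.encode("utf-8")
        )
        chunks.append(f"Content-Type: {content_type}\r\n\r\n".encode("ascii"))
        chunks.append(payload)
        chunks.append(b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("ascii"))
    return boundary, b"".join(chunks)


# Hypothetical parts, deliberately listed with the object first.
parts = [
    ("object", "data.csv", "application/octet-stream", b"a,b\n1,2\n"),
    ("sysmeta", "sysmeta.xml", "text/xml", b"<systemMetadata/>"),
]
boundary, body = build_multipart(
    order_parts_for_create(parts), boundary="example-boundary"
)
```

<p>With this ordering the server sees the system metadata, and therefore the checksum algorithm, before the first byte of the object part arrives.</p>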
Infrastructure - Story #8307 (New): Check node subject on node registration and subsequent callshttps://redmine.dataone.org/issues/83072018-02-06T20:04:39ZDave Vieglaisdave.vieglais@gmail.com
<p>The <code>/node/subject</code> entry of the node document should match the subject of the certificate used to register the node (unless the call is being made by a CN certificate).</p>
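<p>The rule could be sketched as follows. The function names and the simplified DN normalization are assumptions for illustration; a real implementation would need full RFC 4514 distinguished-name comparison (escaped commas, attribute aliases) rather than this string handling.</p>

```python
def normalize_dn(dn):
    """Crude DN normalization: trim each RDN and compare case-insensitively.

    Assumption: no RDN value contains an escaped comma.
    """
    return ",".join(part.strip().lower() for part in dn.split(","))


def may_register(node_subject, cert_subject, cn_subjects):
    """Return True if the calling certificate may register or update the node document."""
    caller = normalize_dn(cert_subject)
    if caller in {normalize_dn(subject) for subject in cn_subjects}:
        return True  # calls made with a CN certificate are exempt from the match
    return caller == normalize_dn(node_subject)
```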
Infrastructure - Story #8234 (New): Use University of Kansas ORCID membership to support authenti...https://redmine.dataone.org/issues/82342018-01-09T02:00:28ZDave Vieglaisdave.vieglais@gmail.com
<p><a href="https://orcid.org/members/001G000001CAkZgIAL-university-of-kansas" class="external">KU is a premium ORCID member</a> as a member of the Greater Western Library Alliance (GWLA). As a result, KU has access to five ORCID API keys. One is currently in use for the KU DSpace instance.</p>
<p>The goal of this story is to leverage one of the remaining API keys to support ORCID authentication in the DataONE production environment.</p>
Infrastructure - Story #8109 (New): Does authentication token need to include group information?https://redmine.dataone.org/issues/81092017-06-06T18:34:32ZJing Taotao@nceas.ucsb.edu
<p>Currently, when the portal generates the authentication token, it omits the group information in order to keep the token short. So when an entity tries to authorize the token, it has to look up the group information to make the group authorization mechanism work. The lookup has to access multiple places, e.g., the DataONE CN, an LDAP server, and others. This seems like unnecessary overhead.</p>
Infrastructure - Story #8061 (New): develop queue-based processing system for the CNhttps://redmine.dataone.org/issues/80612017-04-05T22:40:24ZRob Nahfrnahf@epscor.unm.edu
<p>The event-based mechanism for generating indexing tasks is not robust to network segregation, and it is inefficient because it triggers indexing tasks when system metadata are loaded into the Hazelcast map. Those loads are not "real" events, just data hydration from persistent storage.</p>
<p>Investigate using reliable queues instead. The design should be abstracted so that different implementations can be swapped in at a later date, so use standard messaging patterns.</p>
<p>RabbitMQ, ActiveMQ are potential implementations to use.<br>
ZeroMQ is a lower-level implementation, probably a bit more complicated, but very performant.</p>
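<p>One way to picture the abstraction is a small queue interface with a swappable backend. The sketch below is a single-threaded, in-memory stand-in; the class and method names are hypothetical, and a RabbitMQ- or ActiveMQ-backed class would implement the same interface.</p>

```python
import queue
from abc import ABC, abstractmethod


class TaskQueue(ABC):
    """Hypothetical interface that the indexing code would depend on."""

    @abstractmethod
    def publish(self, task):
        """Add a task to the queue."""

    @abstractmethod
    def consume(self, timeout=None):
        """Return (tag, task), or None if nothing arrives within the timeout."""

    @abstractmethod
    def ack(self, tag):
        """Confirm that the task delivered under this tag was processed."""


class InMemoryTaskQueue(TaskQueue):
    """In-memory stand-in, useful for tests; not durable across restarts."""

    def __init__(self):
        self._queue = queue.Queue()
        self._in_flight = {}  # tag -> task delivered but not yet acknowledged
        self._next_tag = 0

    def publish(self, task):
        self._queue.put(task)

    def consume(self, timeout=None):
        try:
            task = self._queue.get(timeout=timeout)
        except queue.Empty:
            return None
        self._next_tag += 1
        self._in_flight[self._next_tag] = task
        return self._next_tag, task

    def ack(self, tag):
        self._in_flight.pop(tag, None)

    def requeue_unacked(self):
        """Put delivered-but-unacknowledged tasks back, e.g. after a worker crash."""
        for tag in list(self._in_flight):
            self._queue.put(self._in_flight.pop(tag))
```

<p>The acknowledge-or-requeue step is what distinguishes this from the current event-driven approach: a task is only dropped once a worker confirms it was handled.</p>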
Infrastructure - Story #8049 (In Progress): Support synchronization of system metadata for unhost...https://redmine.dataone.org/issues/80492017-03-21T05:34:38ZRob Nahfrnahf@epscor.unm.edu
<p>As part of mutable-content MN support, allow the MemberNode to keep the system metadata records for all resultant versions of its changeable entities. This allows the MN to keep accurate system metadata for every version even though it no longer has the object bytes for that version.</p>
<p>Benefits:<br>
1. The MN does not orphan any objects.<br>
2. The MN can administer objects from past versions on its own MN, for example by adjusting the access policy of all versions.<br>
3. The MN doesn't need to call cn.setObsoletedBy or leave that field empty.</p>
<p>Costs:<br>
1. requires new logic for indexing (possibly)<br>
2. requires new logic for registerSystemMetadata (possibly)<br>
3. requires new logic for synchronization </p>
<p>This is very similar to how we synchronize DATA objects, but without triggering MN replication.</p>
Infrastructure - Story #7920 (In Progress): migrate apache2 authorization rules from 2.2 conformi...https://redmine.dataone.org/issues/79202016-10-26T18:15:04ZRob Nahfrnahf@epscor.unm.edu
<p>Currently, our apache configs are using the 2.2 style, but we will have to upgrade at some point. </p>
<p>The access_compat module (under mods-enabled) is in place to allow us to use the old 2.2 conventions.</p>
Infrastructure - Story #7807 (New): cn.synchronize should support synchronization failure correct...https://redmine.dataone.org/issues/78072016-05-13T16:56:25ZRob Nahfrnahf@epscor.unm.edu
<p>cn.synchronize(session, identifier) works well for its original purpose (supporting MN-driven system metadata updates, and MN-driven push synchronization), but doesn't seem to work for manual synchronization failure workflows. The main problem is that the request can only be made by the MN itself (using the MN client certificate). </p>
<p>As we envision a centralized dashboard for monitoring failed synchronization items, how do we address this situation? </p>
<p>The synchronization processing queue needs both the pid and the nodeId of the node from which to retrieve the object. The nodeId is not specified directly in the method call, but gleaned from the session by a reverse lookup from the certificate. (It uses the first node found in the NodeList where the Node.subject field matches the certificate subject.)</p>
<p>Should we allow node.contactSubjects into the algorithm?<br>
Should we add nodeId as a parameter?</p>
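<p>The reverse lookup and the two proposed changes can be sketched together. The <code>Node</code> dataclass, its field names, and the function signature are hypothetical; in particular, requiring the caller to still match the explicitly named node is one possible reading, not a decided behavior.</p>

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    identifier: str
    subjects: list
    contact_subjects: list = field(default_factory=list)


def resolve_node_id(node_list, cert_subject, node_id=None, use_contact_subjects=False):
    """Return the node the sync request applies to, or None if the caller matches none.

    Without node_id this mirrors the described behavior: the first node in the
    list whose subject matches the certificate subject wins.
    """
    for node in node_list:
        allowed = list(node.subjects)
        if use_contact_subjects:
            allowed += node.contact_subjects
        if cert_subject in allowed and node_id in (None, node.identifier):
            return node.identifier
    return None


# Hypothetical node list: one contact subject administers both nodes.
nodes = [
    Node("urn:node:A", ["CN=a"], ["CN=admin"]),
    Node("urn:node:B", ["CN=b"], ["CN=admin"]),
]
```

<p>Note that admitting contact subjects alone is ambiguous when one person is the contact for several nodes (the first match wins), which is an argument for also passing nodeId explicitly.</p>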
Infrastructure - Story #7713 (New): d1DebConfig.xml should be versioned in dataone-cn-os-corehttps://redmine.dataone.org/issues/77132016-04-11T15:04:45ZRobert Waltz
<p>A version attribute should be added to the type debConfType in dataoneDebuPkgConfigTypes.xsd.</p>
<p>This will allow the debConfig root element to be versioned so that, when dataone-cn-os-core is updated or installed, there is a better way to differentiate between the file located on the releases website: </p>
<p><a href="https://releases.dataone.org/debian/conf/d1DebConfig.xml">https://releases.dataone.org/debian/conf/d1DebConfig.xml</a></p>
<p>and the one stored on the filesystem:</p>
<p>/etc/dataone/d1DebConfig.xml</p>
<p>The intention of having an up-to-date version on releases.dataone.org is to be able to reconfigure DataONE without having to re-install dataone-cn-os-core.</p>
<p>The XML schema file will also need to be updated and versioned.</p>
Infrastructure - Story #7650 (New): DAO for SystemMetadata changes the SystemMetadata.replication...https://redmine.dataone.org/issues/76502016-02-17T20:02:24ZRob Nahfrnahf@epscor.unm.edu
<p>SystemMetadataDaoMetacatImpl (in d1_cn_common) sets replication_allowed to false in situations of a null ReplicationPolicy, even though the CN is not supposed to alter the ReplicationPolicy.</p>
<p>DataONE has historically taken the approach of opting into replicating content, instead of opting out, so at least the default of false is in keeping with that. However, the DAO layer seems to be the wrong place to be applying business rules. It would be better to put this logic in d1_synchronization (and updateSystemMetadata).</p>
<p>Two issues are at play here:</p>
<ol>
<li>Should the semantics of a null replicationPolicy be "use the CN default behavior at the time of submission" or "I don't care", allowing the CN to add or remove replicas at will?
<p>If the former, we need to persist a default ReplicationPolicy. If the latter, we need to remove the addition of a ReplicationPolicy, and potentially remove default policies based on MN versions of system metadata.</p></li>
<li>Should the DAO implementation apply the business rules about default values, or should that logic be refactored into d1_synchronization and updateSystemMetadata?</li>
</ol>
Infrastructure - Story #7559 (New): Develop plan for securing application passwords in the CN stackhttps://redmine.dataone.org/issues/75592015-12-15T22:58:06ZBen Leinfelderleinfelder@nceas.ucsb.edu
<p>There are many components that use passwords in configuration files. While we do restrict who can access our servers and what they can view when on the server, it's still not entirely secure to have property files with cleartext passwords.</p>
<p>The following components are known to be configured with cleartext passwords:<br>
* d1_identity_manager (LDAP)<br>
* d1_noderegistry (LDAP)<br>
* d1_replication (postgres)<br>
* d1_portal_servlet (postgres)<br>
* Metacat (postgres)<br>
* all hazelcast connections</p>
Infrastructure - Story #7358 (In Progress): ContactSubject on NodeList must be valid D1 ldap entryhttps://redmine.dataone.org/issues/73582015-09-16T19:58:13ZRobert Waltz
<p>Before a CN can be started, LDAP must have an approved entry for Contact Subject.</p>
<p>Contact Subject has been defaulted to CN=Robert P Waltz A904,O=Google,C=US,DC=cilogon,DC=org on all of the CN entries in the node list.</p>
<p>Since Robert P Waltz is a developer and not an organizer or director, the publicized contact on the CNs should be changed to reflect the organizational hierarchy.</p>
<p>The Contact Subject for the CNs should be the PI of the project, or at least, a Co-PI.</p>
<p>Also, the DN of this subject should be derived from the DataONE CA instead of cilogon.</p>
<p>Updating the existing systems should be trivial. The LDAP entry for each CN node will be modified, and a new LDAP entry for the new Subject will need to be added.</p>
Infrastructure - Story #7224 (New): push synchronization request status indicator: synchronizeSta...https://redmine.dataone.org/issues/72242015-06-18T08:30:42ZRob Nahfrnahf@epscor.unm.edu
<p>Push synchronization (cn.synchronize, mn.updateSystemMetadata) involves an end user who might want an idea of how long the queued action will take to complete. Something as simple as returning the sync request's place in line might suffice as the indicator, or a more complete data packet, including the place in line and the queue velocity, could be attempted.</p>
<p>The real-world analogy for this kind of indicator is taking a number at the deli counter: you don't know when you will be served, but you know how many people are in front of you. </p>
<p>This option is a separate call to the CN to check the status of the sync request, so that the current place in line is returned. The advantage of this is that if the velocity of synchronization changes, the interested party can call again and get an updated value - it has more diagnostic and monitoring power. This could lead to over-use, however.</p>
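<p>A toy model of the place-in-line indicator, assuming a simple first-in, first-out sync queue. The class and method names are hypothetical and do not correspond to an existing CN API.</p>

```python
from collections import OrderedDict


class SyncQueueStatus:
    """Tracks pending sync requests in arrival order."""

    def __init__(self):
        self._pending = OrderedDict()  # pid -> node id, oldest first

    def enqueue(self, pid, node_id):
        """Queue a request (a repeated pid keeps its original place) and return its position."""
        self._pending.setdefault(pid, node_id)
        return self.place_in_line(pid)

    def complete_next(self):
        """Remove and return the pid at the head of the line, or None if empty."""
        if self._pending:
            return self._pending.popitem(last=False)[0]
        return None

    def place_in_line(self, pid):
        """Return the 1-based position of pid, or None if it is no longer queued."""
        for position, queued_pid in enumerate(self._pending, start=1):
            if queued_pid == pid:
                return position
        return None
```

<p>A caller polling <code>place_in_line</code> twice can estimate the queue velocity from the change in position, which is the diagnostic advantage described above.</p>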
Infrastructure - Story #4650 (New): Allow MN to bias resolve to the authoritative MNhttps://redmine.dataone.org/issues/46502014-03-28T18:17:40ZBruce Wilsonbwilso27@utk.edu
<p>In the MN workshop (IDCC, Feb 2014) and in an MN forum discussion, several MNs asked if resolve could be biased toward the authoritative MN, so that users would be more likely to retrieve the data from the authoritative MN. A suggestion that met with a very positive response was to ensure that CN.resolve() returns the authoritative MN as the first item in the list. My understanding is that the order in the CN.resolve() results is currently indeterminate. </p>
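<p>The suggested ordering amounts to a stable sort that moves the authoritative MN to the front. The function name and the node identifiers below are made up for illustration.</p>

```python
def order_resolve_locations(locations, authoritative_mn):
    """Return the locations with the authoritative MN first.

    The sort key is False for the authoritative MN and True for every other
    node; False sorts first, and the stable sort keeps the rest in order.
    """
    return sorted(locations, key=lambda node_id: node_id != authoritative_mn)


ordered = order_resolve_locations(
    ["urn:node:R1", "urn:node:AUTH", "urn:node:R2"], "urn:node:AUTH"
)
```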
Infrastructure - Story #2548 (New): recasting untrusted certs to public poses accessibility incon...https://redmine.dataone.org/issues/25482012-03-27T21:55:59ZRob Nahfrnahf@epscor.unm.edu
<p>KNB recasts a connection with an untrusted certificate to public, so that a client does not get "less than public" privileges.<br>
GMN throws an InvalidToken in this situation.<br>
Both refuse connections from clients with expired certificates from trusted CAs.</p>
<p>The recasting approach can cause confusion when a user unwittingly uses an untrusted certificate and doesn't get what they expected. If these connections were instead refused, the user would be alerted and could reconnect as a public user, if they chose.</p>
<p>A brief discussion can be found at line 97 of: <a href="http://epad.dataone.org/20120131-authn-authz-questions">http://epad.dataone.org/20120131-authn-authz-questions</a></p>
<ul>
<li>When would honest users be in this situation?</li>
<li>Elicit the advantages of the recasting approach.</li>
<li>Either way, DataONE should implement uniform behavior across the CN and MNs.</li>
</ul>
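<p>The two behaviors differ in a single branch, which a shared policy function makes explicit. This is a sketch under assumptions: the enum, the function name, and the use of InvalidToken for expired certificates are illustrative choices, not the KNB or GMN code.</p>

```python
from enum import Enum


class CertStatus(Enum):
    TRUSTED = "trusted"
    UNTRUSTED = "untrusted"
    EXPIRED = "expired"
    ABSENT = "absent"


class InvalidToken(Exception):
    """Stand-in for the DataONE InvalidToken exception."""


PUBLIC = "public"


def session_subject(cert_status, cert_subject, recast_untrusted):
    """Return the subject to authorize as, or raise if the connection is refused."""
    if cert_status is CertStatus.ABSENT:
        return PUBLIC  # no certificate presented: ordinary public access
    if cert_status is CertStatus.EXPIRED:
        # Assumption: the exception type for expired certificates is not stated.
        raise InvalidToken("certificate has expired")
    if cert_status is CertStatus.UNTRUSTED:
        if recast_untrusted:
            return PUBLIC  # KNB-style: silently downgrade to public
        raise InvalidToken("certificate is not from a trusted CA")  # GMN-style
    return cert_subject
```

<p>Uniform behavior across the CN and MNs would then come down to agreeing on one value of <code>recast_untrusted</code>.</p>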