DataONE Tasks: Issueshttps://redmine.dataone.org/https://redmine.dataone.org/favicon.ico2019-11-22T18:29:39ZDataONE Tasks
Redmine Infrastructure - Story #8855 (New): Put the system metadata part ahead of the object part when d1...https://redmine.dataone.org/issues/88552019-11-22T18:29:39ZJing Taotao@nceas.ucsb.edu
<p>When a client calls the mn(cn).create/update methods, it usually constructs a multipart which contains the sys part (containing the system metadata information), object part (containing the object itself) and other parts. There is no requirement about the order of those parts.<br>
Metacat will use a new streaming multipart handler which will calculate the checksum when it stores the object part into a file. This requires we should know the checksum algorithm before the serialization of the object. So Metacat has to digest the system metadata first in order to improve the performance.<br>
In order to take the advantage, we recommend clients should put the system metadata part ahead of the object part when it is constructing the multipart to be sent to the server.<br>
Note: event though the client doesn't use the recommended order, the process still works but the performance will be poor.</p>
Infrastructure - Story #8823 (New): Recent Apache and OpenSSL combinations break connectivity on ...https://redmine.dataone.org/issues/88232019-06-19T02:03:44ZDave Vieglaisdave.vieglais@gmail.com
<p>The latest Ubuntu 18.04 release of Apache is 2.4.29 and OpenSSL is 1.1.1.</p>
<p>This combination creates a significant delay in TLS renegotiation that results from the Apache config option on the CNs:</p>
<pre>SSLVerifyClient none
<Location "/cn">
<If " ! ( %{HTTP_USER_AGENT} =~ /(windows|chrome|mozilla|safari|webkit)/i )">
SSLVerifyClient optional
</If>
</Location>
</pre>
<p>Which is intended to disable client certificate authentication for web browsers, but allow it for others. This approach worked fine on older Apache / OpenSSL but the new combination creates a several second wait when the server discovers the client is not a web browser and tells it to reconnect with the option of including a client certificate.</p>
<p>The latest released version of Apache is 2.4.39 and this is available through a PPA intended for Debian developers. This has been installed so far on dev-2, sandbox, stage, and stage-2 with the process:</p>
<pre>sudo add-apt-repository ppa:ondrej/apache2
sudo apt update
sudo apt dist-upgrade
sudo systemctl restart apache2
</pre>
<p>This installs Apache 2.4.39 and OpenSSL 1.1.1c which appears to resolve the apparent bug in the 2.4.29 / 1.1.1 combination.</p>
<p>One issue with the update is that by default, Apache now offers TLSv1.3, which is great except that it appears to cause problems with at least Python clients failing to connect and getting a 403 error. For example:</p>
<pre>$ python3
>>> import requests
>>> r = requests.get("https://cn-sandbox-ucsb-1.test.dataone.org/cn/v2/monitor/ping")
>>> r.status_code
403
</pre>
<p>That TLSv1.3 is the problem was verified with cn-stage-unm-2 by configuring Apache with:</p>
<pre> SSLProtocol all -TLSv1.3 -SSLv2 -SSLv3
</pre>
<p>to disable TLSv1.3. After this change the Python client was able to connect as expected.</p>
<p>A workaround has not yet been researched.</p>
<p>It is not clear if this issue applies to other clients such as R and Java, so until we learn one way or the other, TLSv1.3 will be disabled on the CNs.</p>
<p>--This issue will likely apply to Member Nodes as well once TLSv1.3 is generally available or if MNs choose to install Apache 2.4.39.-- CORRECTION: this issue only applies when attempting to renegotiate TLS after headers have been transferred, so will not typically apply to a MN.</p>
Infrastructure - Story #8525 (In Progress): timeout exceptions thrown from Hazelcast disable sync...https://redmine.dataone.org/issues/85252018-03-27T22:36:54ZRob Nahfrnahf@epscor.unm.edu
<p>Very occasionally, synchronization disables itself when RuntimeExceptions bubble up. The most common of these is when the Hazelcast client seemingly disconnects, or can't complete an operation, and a java.util.concurrent.TimeoutException is thrown.</p>
<p>These are usually due to network problems, as evidenced by timeout exceptions appearing in both the Metacat hazelcast-storage.log files as well as d1-processing logs.</p>
<p>Temporary problems like this should be recoverable, and so a retry or bypass for those timeouts should be implemented. It's not clear whether or not a new HazelcastClient should be instantiated, or whether the same client is still usable. (Is the client tightly bound to a session, or does it recover?) If a new client is needed, preliminary searching through the code indicates that refactoring the HazelcastClientFactory.getProcessingClient() method is only used in a few places, and the singleton behavior it uses can be sidestepped by removing the method and replacing it with a getLock() wrapper method (that seems to be the dominant use case for it). See the newer SyncQueueFacade in d1_synchronization for guidance on that. If the client is never exposed, it can be refreshed as needed.</p>
<pre>root@cn-unm-1:/var/metacat/logs# grep FATAL hazelcast-storage.log.1
[FATAL] 2018-03-27 03:15:19,380 (BaseManager$2:run:1402) [64.106.40.6]:5701 [DataONE] Caught error while calling event listener; cause: [CONCURRENT_MAP_CONTAINS_KEY] Operation Timeout (with no response!): 0
</pre><pre>[ERROR] 2018-03-27 03:15:19,781 [ProcessDaemonTask1] (SyncObjectTaskManager:run:84) java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.util.concurrent
.TimeoutException: [CONCURRENT_MAP_REMOVE] Operation Timeout (with no response!): 0
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.util.concurrent.TimeoutException: [CONCURRENT_MAP_REMOVE] Operation Timeout (with no response!): 0
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.dataone.cn.batch.synchronization.SyncObjectTaskManager.run(SyncObjectTaskManager.java:76)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.util.concurrent.TimeoutException: [CONCURRENT_MAP_REMOVE] Operation Timeout (with no response!): 0
at com.hazelcast.impl.ClientServiceException.readData(ClientServiceException.java:63)
at com.hazelcast.nio.Serializer$DataSerializer.read(Serializer.java:104)
at com.hazelcast.nio.Serializer$DataSerializer.read(Serializer.java:79)
at com.hazelcast.nio.AbstractSerializer.toObject(AbstractSerializer.java:121)
at com.hazelcast.nio.AbstractSerializer.toObject(AbstractSerializer.java:156)
at com.hazelcast.client.ClientThreadContext.toObject(ClientThreadContext.java:72)
at com.hazelcast.client.IOUtil.toObject(IOUtil.java:34)
at com.hazelcast.client.ProxyHelper.getValue(ProxyHelper.java:186)
at com.hazelcast.client.ProxyHelper.doOp(ProxyHelper.java:146)
at com.hazelcast.client.ProxyHelper.doOp(ProxyHelper.java:140)
at com.hazelcast.client.QueueClientProxy.innerPoll(QueueClientProxy.java:115)
at com.hazelcast.client.QueueClientProxy.poll(QueueClientProxy.java:111)
at org.dataone.cn.batch.synchronization.type.SyncQueueFacade.poll(SyncQueueFacade.java:231)
at org.dataone.cn.batch.synchronization.tasks.SyncObjectTask.call(SyncObjectTask.java:131)
at org.dataone.cn.batch.synchronization.tasks.SyncObjectTask.call(SyncObjectTask.java:73)
</pre> Infrastructure - Story #8234 (New): Use University of Kansas ORCID membership to support authenti...https://redmine.dataone.org/issues/82342018-01-09T02:00:28ZDave Vieglaisdave.vieglais@gmail.com
<p><a href="https://orcid.org/members/001G000001CAkZgIAL-university-of-kansas" class="external">KU is a premium ORCID member</a> as a member of the Greater Western Library Alliance (GWLA). As a result, KU has access to five ORCID API keys. One is currently in use for the KU DSpace instance.</p>
<p>Goal of this story is to leverage on of the remaining API keys to support ORCID authentication in the DataONE production environment.</p>
Infrastructure - Story #8173 (New): add checks for retrograde systemMetadata changeshttps://redmine.dataone.org/issues/81732017-09-01T19:42:33ZRob Nahfrnahf@epscor.unm.edu
<p>with the ability to prioritize and the introduction of parallelized index task processing, the effective queue is not guaranteed to be time-ordered. If there are two valid system metadata changes resulting in two tasks and the second change hits the index first, the earlier task should be rejected, as its changes are out of date.</p>
Infrastructure - Story #8109 (New): Does authentication token need to include group information?https://redmine.dataone.org/issues/81092017-06-06T18:34:32ZJing Taotao@nceas.ucsb.edu
<p>Currently when portal generates the authentication token, it doesn't include the group information in it in order to make the token short. So when an entity tries to authorize the token, it has to look up the group information in order to make the group authorization mechanism work. The lookup process has to access multiple places, e.g., the dataone cn, an ldap server, and et al. This seems an overhead. </p>
Infrastructure - Story #8061 (New): develop queue-based processing system for the CNhttps://redmine.dataone.org/issues/80612017-04-05T22:40:24ZRob Nahfrnahf@epscor.unm.edu
<p>The event-based mechanism for generating indexing tasks is not robust to network segregation and inefficient because it triggers indexing tasks when system metadata are loaded into Hazelcast map - not "real" events, just a data hydration from persistent storage.</p>
<p>Investigate using reliable queues instead. The design will want to be abstracted so that different implementations can be swapped in at a later date, so use standard messaging patterns.</p>
<p>RabbitMQ, ActiveMQ are potential implementations to use.<br>
ZeroMQ is a lower-level implementation, probably a bit more complicated, but very performant.</p>
Infrastructure - Story #8049 (In Progress): Support synchronization of system metadata for unhost...https://redmine.dataone.org/issues/80492017-03-21T05:34:38ZRob Nahfrnahf@epscor.unm.edu
<p>As part of mutable-content MN support, allow the MemberNode to keep the system metadata records for all resultant versions of its changeable entities. this allows them to keep accurate system metadata for every version even though they do not have the object bytes anymore for that version.</p>
<p>Benefits:<br>
1. MN does not orphan any objects<br>
2. MN can administer objects from past versions on their own MN. Adjust the access policy of all versions, for example.<br>
3. don't need to call cn.setObsoletedBy or leave that field empty.</p>
<p>Costs:<br>
1. requires new logic for indexing (possibly)<br>
2. requires new logic for registerSystemMEtadata (possibly)<br>
3. require new logic for synchronization </p>
<p>very similar to how we synchronize DATA objects, but don't trigger MN replication.</p>
Infrastructure - Story #7920 (In Progress): migrate apache2 authorization rules from 2.2 conformi...https://redmine.dataone.org/issues/79202016-10-26T18:15:04ZRob Nahfrnahf@epscor.unm.edu
<p>Currently, our apache configs are using the 2.2 style, but we will have to upgrade at some point. </p>
<p>The access_compat module (under mods-enabled) is in place to allow us to use the old 2.2 conventions.</p>
Infrastructure - Story #7713 (New): d1DebConfig.xml should be versioned in dataone-cn-os-corehttps://redmine.dataone.org/issues/77132016-04-11T15:04:45ZRobert Waltz
<p>A version attribute should be added to the type debConfType in dataoneDebuPkgConfigTypes.xsd.</p>
<p>This will allow the debConfig root element to be versioned such that when dataone-cn-os-core is updated/installed, there is a better way to differentiate between the file located on releases website: </p>
<p><a href="https://releases.dataone.org/debian/conf/d1DebConfig.xml">https://releases.dataone.org/debian/conf/d1DebConfig.xml</a></p>
<p>and the one stored on the filesystem:</p>
<p>/etc/dataone/d1DebConfig.xml</p>
<p>The intention of having an up-to-date version on releases.dataone.org is to be able to reconfigure the dataone without having to re-install dataone-cn-os-core</p>
<p>The xml schema file will also need to be updated and versioned</p>
Infrastructure - Story #7559 (New): Develop plan for securing application passwords in the CN stackhttps://redmine.dataone.org/issues/75592015-12-15T22:58:06ZBen Leinfelderleinfelder@nceas.ucsb.edu
<p>There are many components that use passwords in configuration files. While we do restrict who can access our servers and what they can view when on the server, it's still not entirely secure to have property files with cleartext passwords.</p>
<p>Here are components that are known to be configured with cleartext passwords<br>
* d1_identity_manager (LDAP)<br>
* d1_noderegistry (LDAP)<br>
* d1_replication (postgres)<br>
* d1_portal_servlet (postgres)<br>
* Metacat (postgres)<br>
* all hazelcast connections</p>
Infrastructure - Story #7358 (In Progress): ContactSubject on NodeList must be valid D1 ldap entryhttps://redmine.dataone.org/issues/73582015-09-16T19:58:13ZRobert Waltz
<p>Before a CN can be started, LDAP must have an approved entry for Contact Subject.</p>
<p>Contact Subject has been defaulted to CN=Robert P Waltz A904,O=Google,C=US,DC=cilogon,DC=org on all of the CN entries in the node list.</p>
<p>Since Robert P Waltz is a developer and not an organizer or director, then the publicized contact on the CNs should be changed to reflect the organizational hierarchy.</p>
<p>The Contact Subject for the CNs should be the PI of the project, or at least, a Co-PI.</p>
<p>Also, The DN of this subject should be derived from the DataONE CA instead of cilogon.</p>
<p>Updating the existing systems should be trivial. The Ldap Entry for each CN node will be modified, and a new LDAP entry for the new Subject will need to be added.</p>
Infrastructure - Story #7224 (New): push synchronization request status indicator: synchronizeSta...https://redmine.dataone.org/issues/72242015-06-18T08:30:42ZRob Nahfrnahf@epscor.unm.edu
<p>Push synchronization (cn.synchronize, mn.updateSystemMetadata) involves an end-user that might want to have an idea of how long until the queued action is going to take to complete. Something as simple as returning the place in line of the sync request might suffice as the indicator, or a more complete data packet, including the place in line and the queue velocity, could be attempted.</p>
<p>The real-world analogy for this kind of indictor is taking a number at the deli-counter: You don't know when you will be served, but you know how many people are in front of you. </p>
<p>This option is a separate call to the CN to check the status of the sync request, so that the current place in line is returned. The advantage of this is that if the velocity of synchronization changes, the interested party can call again and get an updated value - it has more diagnostic and monitoring power. This could lead to over-use, however.</p>
Infrastructure - Story #4650 (New): Allow MN to bias resolve to the authoritative MNhttps://redmine.dataone.org/issues/46502014-03-28T18:17:40ZBruce Wilsonbwilso27@utk.edu
<p>In the MN workshop (IDCC, Feb 2014) and in a MN forum discussion, several MN's asked if the resolve could be biased so that the authoritative MN was the somehow biased, so that users would be more likely to retrieve the data from the authoritative MN. A suggestion that met with a very positive response was to ensure that CN.resolve() returns the authoritative MN as the first item in the list. My understanding is that the order in the CN.resolve() results is indeterminate. </p>
Infrastructure - Story #2548 (New): recasting untrusted certs to public poses accessibility incon...https://redmine.dataone.org/issues/25482012-03-27T21:55:59ZRob Nahfrnahf@epscor.unm.edu
<p>KNB recasts a connection with an untrusted certificate to public, so that a client does not get "less than public" privileges.<br>
GMN throws an InvalidToken in this situation.<br>
both refuse connections from clients with expired certificates from trusted CAs.</p>
<p>This approach can cause confusion caused when the user unwittingly uses an untrusted certficate and doesn't get what they expected. If these connections were instead refused, the user would be alerted and could reconnect as a public user, if it chose.</p>
<p>brief discussion found at line 97 of : <a href="http://epad.dataone.org/20120131-authn-authz-questions">http://epad.dataone.org/20120131-authn-authz-questions</a></p>
<ul>
<li>when would honest users be in this situation?</li>
<li>elicit advantages of recasting approach</li>
<li>either way, dataone should implement uniform behavior across CN and MNs.</li>
</ul>