DataONE Tasks: Issues (https://redmine.dataone.org/) 2019-06-19T02:03:44Z
Redmine Infrastructure - Story #8823 (New): Recent Apache and OpenSSL combinations break connectivity on ...
https://redmine.dataone.org/issues/8823 2019-06-19T02:03:44Z Dave Vieglais (dave.vieglais@gmail.com)
<p>The latest Ubuntu 18.04 release of Apache is 2.4.29 and OpenSSL is 1.1.1.</p>
<p>This combination creates a significant delay in TLS renegotiation that results from the Apache config option on the CNs:</p>
<pre>SSLVerifyClient none
<Location "/cn">
<If " ! ( %{HTTP_USER_AGENT} =~ /(windows|chrome|mozilla|safari|webkit)/i )">
SSLVerifyClient optional
</If>
</Location>
</pre>
<p>This configuration is intended to disable client certificate authentication for web browsers while allowing it for other clients. It worked fine with older Apache / OpenSSL versions, but the new combination introduces a several-second wait when the server discovers the client is not a web browser and tells it to reconnect with the option of including a client certificate.</p>
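<p>The effect of that <code>&lt;If&gt;</code> expression can be sketched in Python. The regular expression below copies the Apache condition verbatim; the function name is only illustrative:</p>

```python
import re

# Mirrors the Apache <If> expression above: a client counts as a web browser
# (and therefore skips client-certificate negotiation) when its User-Agent
# matches any of these tokens, case-insensitively.
BROWSER_UA = re.compile(r"(windows|chrome|mozilla|safari|webkit)", re.IGNORECASE)

def wants_client_cert(user_agent):
    """Return True when Apache would apply 'SSLVerifyClient optional'."""
    return not BROWSER_UA.search(user_agent)
```

<p>With this logic, a request from <code>python-requests/2.22.0</code> would trigger the optional-certificate path, while any Mozilla-derived User-Agent would not.</p>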
<p>The latest released version of Apache is 2.4.39, and it is available through a PPA maintained by a Debian developer. It has been installed so far on dev-2, sandbox, stage, and stage-2 with the process:</p>
<pre>sudo add-apt-repository ppa:ondrej/apache2
sudo apt update
sudo apt dist-upgrade
sudo systemctl restart apache2
</pre>
<p>This installs Apache 2.4.39 and OpenSSL 1.1.1c, which appears to resolve the bug in the 2.4.29 / 1.1.1 combination.</p>
<p>One issue with the update is that Apache now offers TLSv1.3 by default. That is generally desirable, except that at least Python clients then fail, receiving a 403 error. For example:</p>
<pre>$ python3
>>> import requests
>>> r = requests.get("https://cn-sandbox-ucsb-1.test.dataone.org/cn/v2/monitor/ping")
>>> r.status_code
403
</pre>
<p>That TLSv1.3 is the problem was verified on cn-stage-unm-2 by configuring Apache with:</p>
<pre> SSLProtocol all -TLSv1.3 -SSLv2 -SSLv3
</pre>
<p>to disable TLSv1.3. After this change the Python client was able to connect as expected.</p>
<p>A workaround that would allow TLSv1.3 to remain enabled has not yet been researched.</p>
<p>It is not clear if this issue applies to other clients such as R and Java, so until we learn one way or the other, TLSv1.3 will be disabled on the CNs.</p>
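<p>If TLSv1.3 ever needs to stay enabled on the server, a client-side mitigation may also be possible: a Python client can cap the TLS version it offers at 1.2. This is an untested sketch using only the standard library (Python 3.7+):</p>

```python
import ssl
import urllib.request

# Build a normally-verified context, but cap negotiation at TLS 1.2 so the
# client never offers TLS 1.3 to the server.
ctx = ssl.create_default_context()
ctx.maximum_version = ssl.TLSVersion.TLSv1_2

# Example call against the ping endpoint shown above:
# urllib.request.urlopen(
#     "https://cn-sandbox-ucsb-1.test.dataone.org/cn/v2/monitor/ping",
#     context=ctx)
```

<p>This changes only what the client offers; it does not depend on the server-side <code>SSLProtocol</code> setting.</p>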
<p>--This issue will likely apply to Member Nodes as well once TLSv1.3 is generally available or if MNs choose to install Apache 2.4.39.-- CORRECTION: this issue only applies when attempting to renegotiate TLS after headers have been transferred, so will not typically apply to a MN.</p>
Infrastructure - Story #8525 (In Progress): timeout exceptions thrown from Hazelcast disable sync...
https://redmine.dataone.org/issues/8525 2018-03-27T22:36:54Z Rob Nahf (rnahf@epscor.unm.edu)
<p>Very occasionally, synchronization disables itself when RuntimeExceptions bubble up. The most common of these is when the Hazelcast client seemingly disconnects, or can't complete an operation, and a java.util.concurrent.TimeoutException is thrown.</p>
<p>These are usually due to network problems, as evidenced by timeout exceptions appearing in both the Metacat hazelcast-storage.log files as well as d1-processing logs.</p>
<p>Temporary problems like this should be recoverable, so a retry or bypass for those timeouts should be implemented. It's not clear whether a new HazelcastClient should be instantiated or whether the same client is still usable. (Is the client tightly bound to a session, or does it recover?) If a new client is needed, preliminary searching through the code indicates that HazelcastClientFactory.getProcessingClient() is only used in a few places, and the singleton behavior it uses can be sidestepped by removing the method and replacing it with a getLock() wrapper method (that seems to be the dominant use case for it). See the newer SyncQueueFacade in d1_synchronization for guidance on that. If the client is never exposed, it can be refreshed as needed.</p>
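<p>The retry-instead-of-disable idea could take roughly the following shape. It is sketched in Python for brevity (the real change would live in the Java d1_synchronization code) and the names are illustrative:</p>

```python
import time

def retry_on_timeout(op, attempts=3, backoff_seconds=1.0):
    """Run op(); on a timeout, wait and retry rather than letting the
    exception bubble up and disable synchronization. Illustrative sketch
    only -- not the actual d1_synchronization API."""
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except TimeoutError:
            if attempt == attempts:
                raise  # retries exhausted: surface the failure
            time.sleep(backoff_seconds * attempt)  # linear backoff
```

<p>A transient network blip then costs a few retries instead of a disabled sync daemon; a persistent outage still surfaces the original exception.</p>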
<pre>root@cn-unm-1:/var/metacat/logs# grep FATAL hazelcast-storage.log.1
[FATAL] 2018-03-27 03:15:19,380 (BaseManager$2:run:1402) [64.106.40.6]:5701 [DataONE] Caught error while calling event listener; cause: [CONCURRENT_MAP_CONTAINS_KEY] Operation Timeout (with no response!): 0
</pre><pre>[ERROR] 2018-03-27 03:15:19,781 [ProcessDaemonTask1] (SyncObjectTaskManager:run:84) java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.util.concurrent
.TimeoutException: [CONCURRENT_MAP_REMOVE] Operation Timeout (with no response!): 0
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.util.concurrent.TimeoutException: [CONCURRENT_MAP_REMOVE] Operation Timeout (with no response!): 0
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.dataone.cn.batch.synchronization.SyncObjectTaskManager.run(SyncObjectTaskManager.java:76)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.util.concurrent.TimeoutException: [CONCURRENT_MAP_REMOVE] Operation Timeout (with no response!): 0
at com.hazelcast.impl.ClientServiceException.readData(ClientServiceException.java:63)
at com.hazelcast.nio.Serializer$DataSerializer.read(Serializer.java:104)
at com.hazelcast.nio.Serializer$DataSerializer.read(Serializer.java:79)
at com.hazelcast.nio.AbstractSerializer.toObject(AbstractSerializer.java:121)
at com.hazelcast.nio.AbstractSerializer.toObject(AbstractSerializer.java:156)
at com.hazelcast.client.ClientThreadContext.toObject(ClientThreadContext.java:72)
at com.hazelcast.client.IOUtil.toObject(IOUtil.java:34)
at com.hazelcast.client.ProxyHelper.getValue(ProxyHelper.java:186)
at com.hazelcast.client.ProxyHelper.doOp(ProxyHelper.java:146)
at com.hazelcast.client.ProxyHelper.doOp(ProxyHelper.java:140)
at com.hazelcast.client.QueueClientProxy.innerPoll(QueueClientProxy.java:115)
at com.hazelcast.client.QueueClientProxy.poll(QueueClientProxy.java:111)
at org.dataone.cn.batch.synchronization.type.SyncQueueFacade.poll(SyncQueueFacade.java:231)
at org.dataone.cn.batch.synchronization.tasks.SyncObjectTask.call(SyncObjectTask.java:131)
at org.dataone.cn.batch.synchronization.tasks.SyncObjectTask.call(SyncObjectTask.java:73)
</pre>
Infrastructure - Story #8173 (New): add checks for retrograde systemMetadata changes
https://redmine.dataone.org/issues/8173 2017-09-01T19:42:33Z Rob Nahf (rnahf@epscor.unm.edu)
<p>With the ability to prioritize and the introduction of parallelized index task processing, the effective queue is not guaranteed to be time-ordered. If two valid system metadata changes result in two tasks and the task for the later change hits the index first, the earlier task should be rejected, as its changes are out of date.</p>
Infrastructure - Story #4278 (New): EML indexing - handle multiple temporalCoverage and spatialCo...
https://redmine.dataone.org/issues/4278 2014-02-14T22:16:59Z Ben Leinfelder (leinfelder@nceas.ucsb.edu)
<p>Our current index schema only supports single values for coverage elements. This makes things simple (no risk of confusing multiple non-contiguous coverages as continuous), but it discards a lot of potential information that would be useful for discovery.</p>
<p>We should be effectively indexing these coverage elements and all their values.</p>
<p>temporalCoverage (including multiple single date times):<br>
beginDate<br>
endDate</p>
<p>spatialCoverage:<br>
eastBoundCoord<br>
westBoundCoord<br>
northBoundCoord<br>
southBoundCoord</p>
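<p>As a sketch of what multi-valued extraction might look like, the following collects every temporalCoverage range rather than only the first. The element names follow EML, but the sample fragment is invented for illustration:</p>

```python
import xml.etree.ElementTree as ET

# Invented EML-style fragment with two non-contiguous temporal coverages.
COVERAGE_XML = """
<coverage>
  <temporalCoverage>
    <rangeOfDates>
      <beginDate><calendarDate>2001-01-01</calendarDate></beginDate>
      <endDate><calendarDate>2001-12-31</calendarDate></endDate>
    </rangeOfDates>
  </temporalCoverage>
  <temporalCoverage>
    <rangeOfDates>
      <beginDate><calendarDate>2005-06-01</calendarDate></beginDate>
      <endDate><calendarDate>2005-08-31</calendarDate></endDate>
    </rangeOfDates>
  </temporalCoverage>
</coverage>
"""

def temporal_ranges(coverage_xml):
    """Collect every (beginDate, endDate) pair instead of just the first."""
    root = ET.fromstring(coverage_xml)
    ranges = []
    for tc in root.iter("temporalCoverage"):
        begin = tc.findtext(".//beginDate/calendarDate")
        end = tc.findtext(".//endDate/calendarDate")
        ranges.append((begin, end))
    return ranges
```

<p>An index schema supporting this would store each pair as a separate multi-valued entry rather than collapsing them into one range.</p>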
Infrastructure - Story #4188 (In Progress): dataone Exception definition and implementation requi...
https://redmine.dataone.org/issues/4188 2013-11-27T21:34:20Z Dave Vieglais (dave.vieglais@gmail.com)
<p>The documentation describes the properties and serialization of DataONE exceptions:</p>
<p><a href="http://mule1.dataone.org/ArchitectureDocs-current/apis/Exceptions.html">http://mule1.dataone.org/ArchitectureDocs-current/apis/Exceptions.html</a></p>
<p>However, the definition in the schema:</p>
<p><a href="https://repository.dataone.org/software/cicore/tags/D1_SCHEMA_1_1_1/dataoneErrors.xsd">https://repository.dataone.org/software/cicore/tags/D1_SCHEMA_1_1_1/dataoneErrors.xsd</a></p>
<p>differs, and so presents an inconsistent reference for implementations.</p>
<p>The Java code appears to follow the documentation; however, the Python implementation uses the schema to generate exception messages, and so follows the schema definition.</p>
<p>The schema and python code need to be updated to reflect the description in the documentation. Also, all implementations of MN and client software need to be informed of the issue and how they may be impacted.</p>
Infrastructure - Story #4091 (New): ESRI GeoPortal MN stack
https://redmine.dataone.org/issues/4091 2013-10-15T13:36:56Z Bruce Wilson (bwilso27@utk.edu)
<p>The objective is to design, develop, and implement a MN Stack to integrate with the ESRI GeoPortal server (<a href="http://www.esri.com/software/arcgis/geoportal">http://www.esri.com/software/arcgis/geoportal</a>).</p>
Infrastructure - Story #4052 (In Progress): OPeNDAP MN Story
https://redmine.dataone.org/issues/4052 2013-10-06T20:07:50Z Bruce Wilson (bwilso27@utk.edu)
<p>There are a large number of possible DataONE MNs running Data Access Protocol (DAP) compliant servers, particularly servers based on the OPeNDAP Hyrax and UCAR THREDDS servers. The objective of this work is to develop a MN stack that can be used with at least one of these DAP-compliant software stacks as a low-barrier route to becoming a Tier 1 MN.</p>
Infrastructure - Story #3720 (New): resource maps should be validated
https://redmine.dataone.org/issues/3720 2013-04-19T20:59:36Z Rob Nahf (rnahf@epscor.unm.edu)
<p>Well-formed RDF/XML resource maps can still be unreadable by DataONE tools, which require certain relationships to be present to be useful for DataONE. See the Data Packaging architecture document (<a href="http://mule1.dataone.org/ArchitectureDocs-current/design/DataPackage.html#generating-resource-maps">http://mule1.dataone.org/ArchitectureDocs-current/design/DataPackage.html#generating-resource-maps</a>).</p>
<p>Therefore, resource maps should be validated against the DataONE requirements to prevent "silent" errors from making content undiscoverable.</p>
<p>At a minimum, validation methods should be built into d1_libclient_java so that validation can be done prior to submission, on the MN (during MN.create/MN.update), or on the CN (during sync).</p>
<p>Additionally, facilities for interrogating the resource map to pull out relationships should be developed, using RDFS Reasoners to recover from missing inverse relationships, and existing services using resource maps (the indexer) should make use of them.</p>
<p>Development of validation services should be considered to help clients validate prior to submission.</p>
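<p>One possible shape for such validation, sketched in Python over already-parsed triples (a real implementation in d1_libclient_java would operate on an RDF model; the predicate names are abbreviated ORE/CITO terms and the function is hypothetical):</p>

```python
# Validation sketch over (subject, predicate, object) triples: every
# aggregated resource should participate in a documents/isDocumentedBy
# relationship, or it risks being "silently" undiscoverable.
AGGREGATES = "ore:aggregates"
DOCUMENTS = "cito:documents"
IS_DOCUMENTED_BY = "cito:isDocumentedBy"

def undocumented_members(triples):
    """Return aggregated resources lacking any documentation link."""
    aggregated = {o for s, p, o in triples if p == AGGREGATES}
    linked = set()
    for s, p, o in triples:
        if p in (DOCUMENTS, IS_DOCUMENTED_BY):
            linked.update((s, o))
    return aggregated - linked
```

<p>An RDFS reasoner, as suggested above, would additionally let one of <code>documents</code> / <code>isDocumentedBy</code> be inferred when only its inverse is asserted.</p>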
Infrastructure - Story #3656 (New): integration testing: what are acceptable pids for update?
https://redmine.dataone.org/issues/3656 2013-03-12T22:31:53Z Rob Nahf (rnahf@epscor.unm.edu)
<p>Update takes an originalPid parameter, which is used to set the obsoletes and obsoletedBy fields for the two objects. Can the pid provided for that parameter refer to an object that is:</p>
<p>a) archived - ?<br>
b) reserved - should be no<br>
c) deleted - ?<br>
d) "current" but located on different MN</p>
<p>d) may be difficult to test; it would happen only when the original member node stops hosting the original object (is no longer the authoritative node) but is still in service.</p>
Infrastructure - Story #3591 (In Progress): Content consistency checks for new member nodes
https://redmine.dataone.org/issues/3591 2013-02-18T23:24:34Z Rob Nahf (rnahf@epscor.unm.edu)
<p>Want to be able to detect systematic content errors for new Member nodes:<br>
1) systemMetadata is parseable - existing getSystemMetadata() test should fail when it can't deserialize the systemMetadata, yes?<br>
2) resourceMaps are parseable - pull a resource map via listObjects(formatId=...), then use ResourceMapFactory to deserialize<br>
3) checksum stability - content should have the same checksum every time it is pulled.</p>
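<p>Check 3 could be sketched as follows. Here <code>get_object</code> stands in for an MNRead.get call, and both helper names are illustrative:</p>

```python
import hashlib

def checksum(content, algorithm="SHA-1"):
    """Compute a checksum the way systemMetadata reports it
    (e.g. "SHA-1" -> hashlib's "sha1")."""
    h = hashlib.new(algorithm.replace("-", "").lower())
    h.update(content)
    return h.hexdigest()

def checksum_is_stable(get_object, pid, algorithm="SHA-1"):
    """Pull the same object twice and confirm both downloads hash
    identically; get_object(pid) stands in for MNRead.get."""
    first = checksum(get_object(pid), algorithm)
    second = checksum(get_object(pid), algorithm)
    return first == second
```

<p>A third comparison, against the checksum declared in the object's systemMetadata, would catch content that is stable but mis-registered.</p>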
Infrastructure - Story #2944 (New): Design and implement a MN kill switch mechanism
https://redmine.dataone.org/issues/2944 2012-06-15T15:00:54Z Dave Vieglais (dave.vieglais@gmail.com)
<p>We need a mechanism to quickly and effectively de-register a MN and its content in case there is some issue with the MN that warrants such action (e.g. malware is found on the server, or some other inappropriate or malicious activity).</p>
Infrastructure - Story #2548 (New): recasting untrusted certs to public poses accessibility incon...
https://redmine.dataone.org/issues/2548 2012-03-27T21:55:59Z Rob Nahf (rnahf@epscor.unm.edu)
<p>KNB recasts a connection with an untrusted certificate to public, so that a client does not get "less than public" privileges.<br>
GMN throws an InvalidToken in this situation.<br>
both refuse connections from clients with expired certificates from trusted CAs.</p>
<p>This approach can cause confusion when a user unwittingly uses an untrusted certificate and doesn't get the access they expected. If these connections were instead refused, the user would be alerted and could reconnect as a public user if they chose.</p>
<p>Brief discussion found at line 97 of: <a href="http://epad.dataone.org/20120131-authn-authz-questions">http://epad.dataone.org/20120131-authn-authz-questions</a></p>
<ul>
<li>when would honest users be in this situation?</li>
<li>elicit advantages of recasting approach</li>
<li>either way, DataONE should implement uniform behavior across CNs and MNs.</li>
</ul>
Infrastructure - Story #2488 (New): Changing the authoritativeMembernode will require all replica...
https://redmine.dataone.org/issues/2488 2012-03-14T19:08:19Z Robert Waltz
<p>Modifying the authoritativeMemberNode requires human intervention. A tool should be created to perform the modification and then call MNStorage.systemMetadataChanged() on all replicas of the object.</p>
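<p>Such a tool might be shaped like the sketch below. The client objects and data structures are stand-ins, not the actual DataONE client library API:</p>

```python
def notify_replicas(system_metadata, clients_by_node, serial_version):
    """After changing authoritativeMemberNode, call systemMetadataChanged()
    on every node holding a replica. `system_metadata` is a plain dict and
    `clients_by_node` maps a nodeId to a client stub; a real tool would use
    the DataONE client libraries and parsed systemMetadata instead."""
    notified = []
    for replica_node in system_metadata["replicaMemberNodes"]:
        client = clients_by_node[replica_node]
        client.systemMetadataChanged(system_metadata["identifier"],
                                     serial_version)
        notified.append(replica_node)
    return notified
```

<p>Per-node failures would also need handling (retry or report), since a replica that misses the notification keeps stale systemMetadata.</p>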
DataONE API - Story #1644 (New): Develop an object format creation policy
https://redmine.dataone.org/issues/1644 2011-06-14T16:25:11Z Chris Jones (cjones@nceas.ucsb.edu)
<p>The object format list in d1_common_java is thus far an ad hoc list of known object formats needed in the D1 software. Additions will be needed. We need to develop a policy on who will have write access to the realtime version of this list, when the on-disk version will be periodically updated, etc. New object formats need to be vetted, and that process should be put into place. This process should align with the object format creation process with the UDFR group when their registry is operational.</p>
Infrastructure - Story #725 (In Progress): Create Authentication and Access control design specif...
https://redmine.dataone.org/issues/725 2010-08-02T21:42:24Z Chad Berkley (berkley@nceas.ucsb.edu)
<p>The metacat <a class="wiki-page new" href="https://redmine.dataone.org/projects/d1/wiki/CrudService">CrudService</a> class contains methods for authentication and access control changes that are not part of the original D1 Crud specification. These services need to be decided on at a higher level and described in the specification so that they can be made to work with any D1 node, not just metacat.</p>