DataONE Tasks: Issues
https://redmine.dataone.org/ (Redmine feed, generated 2019-06-19)

Infrastructure - Story #8823 (New): Recent Apache and OpenSSL combinations break connectivity on ...
https://redmine.dataone.org/issues/8823 (2019-06-19) Dave Vieglais (dave.vieglais@gmail.com)
<p>The latest Ubuntu 18.04 release of Apache is 2.4.29 and OpenSSL is 1.1.1.</p>
<p>This combination creates a significant delay in TLS renegotiation that results from the Apache config option on the CNs:</p>
<pre>SSLVerifyClient none
<Location "/cn">
<If " ! ( %{HTTP_USER_AGENT} =~ /(windows|chrome|mozilla|safari|webkit)/i )">
SSLVerifyClient optional
</If>
</Location>
</pre>
<p>This is intended to disable client certificate authentication for web browsers while allowing it for other clients. The approach worked fine on older Apache / OpenSSL, but the new combination introduces a several-second wait when the server discovers the client is not a web browser and tells it to reconnect with the option of including a client certificate.</p>
<p>The latest released version of Apache is 2.4.39, which is available through a PPA maintained by a Debian developer. It has been installed so far on dev-2, sandbox, stage, and stage-2 with the following process:</p>
<pre>sudo add-apt-repository ppa:ondrej/apache2
sudo apt update
sudo apt dist-upgrade
sudo systemctl restart apache2
</pre>
<p>This installs Apache 2.4.39 and OpenSSL 1.1.1c, which appears to resolve the bug in the 2.4.29 / 1.1.1 combination.</p>
<p>One issue with the update is that Apache now offers TLSv1.3 by default. That is generally desirable, except that it appears to cause problems for at least some Python clients, which fail to connect and receive a 403 error. For example:</p>
<pre>$ python3
>>> import requests
>>> r = requests.get("https://cn-sandbox-ucsb-1.test.dataone.org/cn/v2/monitor/ping")
>>> r.status_code
403
</pre>
<p>That TLSv1.3 is the problem was verified on cn-stage-unm-2 by configuring Apache with:</p>
<pre> SSLProtocol all -TLSv1.3 -SSLv2 -SSLv3
</pre>
<p>to disable TLSv1.3. After this change the Python client was able to connect as expected.</p>
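Until the server-side situation is resolved, a client-side workaround is also possible. This is a minimal sketch, assuming Python 3.7+ (where `ssl.TLSVersion` is available); wiring the context into `requests` via a custom `HTTPAdapter` is left as an exercise:

```python
import ssl

# Cap the maximum TLS version the client will offer, so the server
# never negotiates TLSv1.3. Sketch of a client-side workaround only;
# the context would then be handed to the HTTP library in use.
ctx = ssl.create_default_context()
ctx.maximum_version = ssl.TLSVersion.TLSv1_2
print(ctx.maximum_version.name)  # TLSv1_2
```

This mirrors on the client what the `SSLProtocol all -TLSv1.3` directive does on the server.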
<p>A fix that would allow TLSv1.3 to remain enabled has not yet been researched.</p>
<p>It is not clear if this issue applies to other clients such as R and Java, so until we learn one way or the other, TLSv1.3 will be disabled on the CNs.</p>
<p><del>This issue will likely apply to Member Nodes as well once TLSv1.3 is generally available or if MNs choose to install Apache 2.4.39.</del> CORRECTION: this issue only applies when attempting to renegotiate TLS after headers have been transferred, so it will not typically apply to an MN.</p>
Infrastructure - Story #8525 (In Progress): timeout exceptions thrown from Hazelcast disable sync...
https://redmine.dataone.org/issues/8525 (2018-03-27) Rob Nahf (rnahf@epscor.unm.edu)
<p>Very occasionally, synchronization disables itself when RuntimeExceptions bubble up. The most common of these is when the Hazelcast client seemingly disconnects, or can't complete an operation, and a java.util.concurrent.TimeoutException is thrown.</p>
<p>These are usually due to network problems, as evidenced by timeout exceptions appearing in both the Metacat hazelcast-storage.log files as well as d1-processing logs.</p>
<p>Temporary problems like this should be recoverable, so a retry or bypass for those timeouts should be implemented. It is not clear whether a new HazelcastClient needs to be instantiated, or whether the same client remains usable. (Is the client tightly bound to a session, or does it recover?) If a new client is needed, preliminary searching through the code indicates that HazelcastClientFactory.getProcessingClient() is only used in a few places, and the singleton behavior it relies on can be sidestepped by removing the method and replacing it with a getLock() wrapper method (that seems to be the dominant use case for it). See the newer SyncQueueFacade in d1_synchronization for guidance on that. If the client is never exposed, it can be refreshed as needed.</p>
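The retry idea can be sketched generically. This is in Python for brevity (the actual code is Java), and `with_retries` and `flaky_poll` are hypothetical names, not existing d1_synchronization code:

```python
import time

def with_retries(op, max_attempts=3, base_delay=0.5, retry_on=(TimeoutError,)):
    """Retry a transient operation instead of letting the exception disable sync."""
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except retry_on:
            if attempt == max_attempts:
                raise  # persistent failure: let it bubble up as before
            time.sleep(base_delay * (2 ** (attempt - 1)))  # exponential backoff

# Simulate a Hazelcast operation that times out twice, then recovers.
calls = {"n": 0}
def flaky_poll():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("[CONCURRENT_MAP_REMOVE] Operation Timeout")
    return "task"

result = with_retries(flaky_poll, base_delay=0)
print(result)  # task
```

If the same client turns out not to be reusable after a timeout, the `except` branch is also where a fresh client would be obtained before retrying.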
<pre>root@cn-unm-1:/var/metacat/logs# grep FATAL hazelcast-storage.log.1
[FATAL] 2018-03-27 03:15:19,380 (BaseManager$2:run:1402) [64.106.40.6]:5701 [DataONE] Caught error while calling event listener; cause: [CONCURRENT_MAP_CONTAINS_KEY] Operation Timeout (with no response!): 0
</pre><pre>[ERROR] 2018-03-27 03:15:19,781 [ProcessDaemonTask1] (SyncObjectTaskManager:run:84) java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.util.concurrent.TimeoutException: [CONCURRENT_MAP_REMOVE] Operation Timeout (with no response!): 0
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.util.concurrent.TimeoutException: [CONCURRENT_MAP_REMOVE] Operation Timeout (with no response!): 0
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.dataone.cn.batch.synchronization.SyncObjectTaskManager.run(SyncObjectTaskManager.java:76)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.util.concurrent.TimeoutException: [CONCURRENT_MAP_REMOVE] Operation Timeout (with no response!): 0
at com.hazelcast.impl.ClientServiceException.readData(ClientServiceException.java:63)
at com.hazelcast.nio.Serializer$DataSerializer.read(Serializer.java:104)
at com.hazelcast.nio.Serializer$DataSerializer.read(Serializer.java:79)
at com.hazelcast.nio.AbstractSerializer.toObject(AbstractSerializer.java:121)
at com.hazelcast.nio.AbstractSerializer.toObject(AbstractSerializer.java:156)
at com.hazelcast.client.ClientThreadContext.toObject(ClientThreadContext.java:72)
at com.hazelcast.client.IOUtil.toObject(IOUtil.java:34)
at com.hazelcast.client.ProxyHelper.getValue(ProxyHelper.java:186)
at com.hazelcast.client.ProxyHelper.doOp(ProxyHelper.java:146)
at com.hazelcast.client.ProxyHelper.doOp(ProxyHelper.java:140)
at com.hazelcast.client.QueueClientProxy.innerPoll(QueueClientProxy.java:115)
at com.hazelcast.client.QueueClientProxy.poll(QueueClientProxy.java:111)
at org.dataone.cn.batch.synchronization.type.SyncQueueFacade.poll(SyncQueueFacade.java:231)
at org.dataone.cn.batch.synchronization.tasks.SyncObjectTask.call(SyncObjectTask.java:131)
at org.dataone.cn.batch.synchronization.tasks.SyncObjectTask.call(SyncObjectTask.java:73)
</pre>

Infrastructure - Story #8173 (New): add checks for retrograde systemMetadata changes
https://redmine.dataone.org/issues/8173 (2017-09-01) Rob Nahf (rnahf@epscor.unm.edu)
<p>With the ability to prioritize, and with the introduction of parallelized index task processing, the effective queue is no longer guaranteed to be time-ordered. If two valid system metadata changes result in two tasks and the second change hits the index first, the earlier task should be rejected, as its changes are out of date.</p>
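A minimal sketch of such a guard (hypothetical, in Python; the real check would compare each task's systemMetadata modification date against the date already recorded in the index for that pid):

```python
from datetime import datetime, timezone

# Hypothetical guard: reject an index task whose systemMetadata
# modification date is older than what the index already holds.
indexed_modified = {}  # pid -> datetime of last applied change

def should_apply(pid, date_modified):
    current = indexed_modified.get(pid)
    if current is not None and date_modified < current:
        return False  # retrograde change: a newer version was already indexed
    indexed_modified[pid] = date_modified
    return True

t1 = datetime(2017, 9, 1, tzinfo=timezone.utc)
t2 = datetime(2017, 9, 2, tzinfo=timezone.utc)
first = should_apply("pid1", t2)   # True: newest change applied first
second = should_apply("pid1", t1)  # False: earlier change arrives late, rejected
```

The essential point is that correctness depends on the modification timestamp in the task, not on queue arrival order.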
Infrastructure - Story #7640 (New): ONEShare failing with ServiceFailure
https://redmine.dataone.org/issues/7640 (2016-02-12) Robert Waltz
<p>When attempting to retrieve an object from ONEShare or to retrieve log entries, the following ServiceFailure from the ONEShare is reported:</p>
<p>class javax.net.ssl.SSLPeerUnverifiedException: peer not authenticated</p>
<p>Here are two example URLs that fail with curl when using the CN certificate for authentication:</p>
<p><a href="https://oneshare.unm.edu/knb/d1/mn/v1/object/ark:%2Fc5146%2Fr3pp45%2F3%2Fmrt-datacite.xml">https://oneshare.unm.edu/knb/d1/mn/v1/object/ark:%2Fc5146%2Fr3pp45%2F3%2Fmrt-datacite.xml</a></p>
<p><a href="https://oneshare.unm.edu/knb/d1/mn/v1/meta/ark:%2Fc5146%2Fr3pp45%2F3%2Fmrt-datacite.xml">https://oneshare.unm.edu/knb/d1/mn/v1/meta/ark:%2Fc5146%2Fr3pp45%2F3%2Fmrt-datacite.xml</a></p>
<p>So, both get and getSystemMetadata fail for the pid ark:/c5146/r3pp45/3/mrt-datacite.xml</p>
Infrastructure - Story #7611 (New): ORNLDAAC fails during log aggregation
https://redmine.dataone.org/issues/7611 (2016-01-26) Robert Waltz
<p>ORNLDAAC is able to respond to the following rest call:</p>
<p><a href="http://mercury-ops2.ornl.gov/ornldaac/mn/v1/log?fromDate=2015-12-02T20:26:54.001%2B00:00&toDate=2016-01-26T00:00:00.000%2B00:00&start=0&count=0">http://mercury-ops2.ornl.gov/ornldaac/mn/v1/log?fromDate=2015-12-02T20:26:54.001%2B00:00&toDate=2016-01-26T00:00:00.000%2B00:00&start=0&count=0</a></p>
<p>The response is</p>
<p>So, if ORNLDAAC is queried not to return any event records, but just the total number of available records over a time period, then the MN responds just fine.</p>
<p>However, if ORNLDAAC is queried to return event records for the same timeframe:</p>
<p><a href="http://mercury-ops2.ornl.gov/ornldaac/mn/v1/log?fromDate=2015-12-02T20:26:54.001%2B00:00&toDate=2016-01-26T00:00:00.000%2B00:00&start=0&count=1000">http://mercury-ops2.ornl.gov/ornldaac/mn/v1/log?fromDate=2015-12-02T20:26:54.001%2B00:00&toDate=2016-01-26T00:00:00.000%2B00:00&start=0&count=1000</a></p>
<p>then the MN responds with an "Internal Server Error" that is transmitted in HTML.</p>
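Since the failure surfaces as an HTML error page rather than a DataONE error document, log aggregation could at least detect this cheaply before attempting to parse the body. A small sketch (hypothetical helper, not existing d1_processing code):

```python
# Detect an MN error returned as HTML rather than DataONE error XML,
# so the aggregator can log and skip instead of failing on a parse error.
def looks_like_html_error(body: str) -> bool:
    head = body.lstrip()[:100].lower()
    return head.startswith("<!doctype html") or head.startswith("<html")

html_err = looks_like_html_error("<html><body>Internal Server Error</body></html>")
xml_ok = looks_like_html_error("<?xml version='1.0'?><ns2:log count='0'/>")
print(html_err, xml_ok)  # True False
```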
Infrastructure - Story #7183 (New): Update wild card server certificate on all test.dataone.org s...
https://redmine.dataone.org/issues/7183 (2015-06-15) Dave Vieglais (dave.vieglais@gmail.com)
<p>The *.test.dataone.org server certificate expires in July.</p>
<p>A replacement has been ordered and will be stored in subversion:</p>
<p>AdminAccounts.txt</p>
<p>The servers impacted are listed on the Google Sheet:</p>
<p><a href="https://docs.google.com/spreadsheets/d/1BrZgm0yPV9dzd6SIfjQ9P5W5WH666xaGAI6KeXTAiFs/edit#gid=0">https://docs.google.com/spreadsheets/d/1BrZgm0yPV9dzd6SIfjQ9P5W5WH666xaGAI6KeXTAiFs/edit#gid=0</a></p>
OGC-Slender Node - Story #7170 (New): Evaluate the feasibility of extracting provenance informati...
https://redmine.dataone.org/issues/7170 (2015-06-08) Dave Vieglais (dave.vieglais@gmail.com)

OGC-Slender Node - Story #7166 (New): Create ORE document for an NODC data package
https://redmine.dataone.org/issues/7166 (2015-06-08) Dave Vieglais (dave.vieglais@gmail.com)
<p>Each accession is treated as a data package. Need to generate a resource map document that describes the contents of an accession.</p>
OGC-Slender Node - Story #7151 (In Progress): Generate system metadata for content
https://redmine.dataone.org/issues/7151 (2015-06-04) Dave Vieglais (dave.vieglais@gmail.com)
<p>For the three types of content (resource, science meta, data), develop a mechanism to provide system metadata.</p>
OGC-Slender Node - Story #7149 (Testing): Implement mechanism to retrieve a list of objects avail...
https://redmine.dataone.org/issues/7149 (2015-06-04) Dave Vieglais (dave.vieglais@gmail.com)
<p>Using Python, implement a tool that is able to retrieve a list of packages, and the objects that make up each package.</p>
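One possible shape for the grouping step, assuming identifiers encode the accession as a path prefix (an assumption for illustration, not a confirmed NODC convention):

```python
from collections import defaultdict

# Hypothetical sketch: group object identifiers by accession number,
# assuming identifiers of the form "<accession>/<path-to-object>".
def group_by_package(identifiers):
    packages = defaultdict(list)
    for ident in identifiers:
        accession, _, obj = ident.partition("/")
        packages[accession].append(obj)
    return dict(packages)

objs = ["0123456/mrt-datacite.xml", "0123456/data.nc", "0123457/data.nc"]
grouped = group_by_package(objs)
print(grouped)
```

The actual tool would obtain the identifier list from the NODC service rather than from a literal list as shown here.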
OGC-Slender Node - Story #7146 (In Progress): Determine formatId for content retrievable from NOD...
https://redmine.dataone.org/issues/7146 (2015-06-04) Dave Vieglais (dave.vieglais@gmail.com)
<p>In order to create system metadata for objects held by NODC, it is necessary to infer the appropriate formatId for each object.</p>
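A simple extension-based heuristic could be a starting point. The formatIds below are illustrative guesses only and would need to be checked against the CN's registered object format list:

```python
# Hypothetical mapping from file extension to DataONE formatId; a real
# implementation would validate each value against the CN format registry.
EXTENSION_TO_FORMAT = {
    ".nc": "netCDF-3",     # assumed format for NetCDF data files
    ".csv": "text/csv",
}

def infer_format_id(name, default="application/octet-stream"):
    """Guess a formatId from the object name, falling back to a generic type."""
    for ext, fmt in EXTENSION_TO_FORMAT.items():
        if name.endswith(ext):
            return fmt
    return default

nc_fmt = infer_format_id("data.nc")       # netCDF-3
bin_fmt = infer_format_id("blob.bin")     # application/octet-stream
```

Extension alone will not be sufficient for XML metadata, where the schema (and therefore the formatId) must be inferred from the document contents.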
DataONE API - Story #6759 (New): ObjectFormat Management
https://redmine.dataone.org/issues/6759 (2015-01-13) Rob Nahf (rnahf@epscor.unm.edu)
<p>There currently are no API methods for managing the collection of objectFormats registered to a DataONE environment. There is a "bootstrap" resource that constitutes the list in either d1_libclient_java or d1_common_java that can be used in testing environments. There is also a different resource in the cn-os-core project that is used in production.</p>
<p>These 2 resources are difficult to maintain (keep synchronized), and there isn't a defined process for adding formats to production.</p>
<p>We discussed the inclusion of an "addFormat(...)" method in V2, but it is not currently in the API. (It would be part of the CNCore API.)</p>
<p>It would be good to review the situation in a focused discussion, at minimum to troubleshoot the existing informal management practices and formalize them, and then to consider whether more infrastructure is needed.</p>
Infrastructure - Story #6069 (New): open ask.dataone.org sign-in to community
https://redmine.dataone.org/issues/6069 (2014-08-22) Rob Nahf (rnahf@epscor.unm.edu)
<p>We are starting to get community involvement in questions asked on DataONE, but I believe community members are not able to sign in and post responses.</p>
<p>The solution is to open up the Askbot server to accept new members from the general community.</p>
Infrastructure - Story #4091 (New): ESRI GeoPortal MN stack
https://redmine.dataone.org/issues/4091 (2013-10-15) Bruce Wilson (bwilso27@utk.edu)
<p>The objective is to design, develop, and implement a MN Stack to integrate with the ESRI GeoPortal server (<a href="http://www.esri.com/software/arcgis/geoportal">http://www.esri.com/software/arcgis/geoportal</a>).</p>
Infrastructure - Story #4052 (In Progress): OPeNDAP MN Story
https://redmine.dataone.org/issues/4052 (2013-10-06) Bruce Wilson (bwilso27@utk.edu)
<p>There are a large number of possible DataONE MNs running Data Access Protocol (DAP) compliant servers, particularly servers based on the OPeNDAP Hyrax and UCAR THREDDS servers. The objective of this work is to develop a MN stack that can be used with at least one of these DAP-compliant software stacks as a low-barrier route to becoming a Tier 1 MN.</p>