DataONE Tasks: Issues https://redmine.dataone.org/ https://redmine.dataone.org/favicon.ico 2019-11-01T21:37:01Z DataONE Tasks Redmine
Infrastructure - Story #8848 (New): A minor difference of annotation index between CN and MN https://redmine.dataone.org/issues/8848 2019-11-01T21:37:01Z Jing Tao tao@nceas.ucsb.edu
<p>The Solr index on the CN is:</p>
<pre><arr name="sem_annotation">
<str>http://purl.dataone.org/odo/ECSO_00000512</str>
<str>
http://ecoinformatics.org/oboe/oboe.1.2/oboe-core.owl#MeasurementType
</str>
<str>http://purl.dataone.org/odo/ECSO_00001102</str>
<str>http://purl.dataone.org/odo/ECSO_00001243</str>
<str>http://purl.dataone.org/odo/ECSO_00000629</str>
<str>http://purl.dataone.org/odo/ECSO_00000518</str>
<str>http://www.w3.org/2000/01/rdf-schema#Resource</str>
<str>http://purl.dataone.org/odo/ECSO_00000516</str>
<str>http://purl.obolibrary.org/obo/UO_0000301</str>
</arr>
</pre>
<p>The Solr index on the MN is:</p>
<pre><arr name="sem_annotation">
<str>http://purl.dataone.org/odo/ECSO_00000512</str>
<str>
http://ecoinformatics.org/oboe/oboe.1.2/oboe-core.owl#MeasurementType
</str>
<str>http://purl.dataone.org/odo/ECSO_00001102</str>
<str>http://purl.dataone.org/odo/ECSO_00001243</str>
<str>http://purl.dataone.org/odo/ECSO_00000629</str>
<str>http://purl.dataone.org/odo/ECSO_00000518</str>
<str>http://purl.dataone.org/odo/ECSO_00000516</str>
<str>http://purl.obolibrary.org/obo/UO_0000301</str>
</arr>
</pre>
<p>The CN has an extra <code><str>http://www.w3.org/2000/01/rdf-schema#Resource</str></code><br>
Bryce and I discussed it and think it won't affect the feature, but we still need to figure out where the extra value comes from.</p>
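<p>A comparison like the one above can be automated. The following is a minimal sketch, assuming the two <code>sem_annotation</code> arrays have already been saved as XML strings (the samples here are abbreviated from the listings above); the helper name <code>annotation_values</code> is made up for illustration.</p>

```python
import xml.etree.ElementTree as ET


def annotation_values(arr_xml):
    """Return the set of whitespace-stripped <str> values in a Solr <arr> element."""
    root = ET.fromstring(arr_xml)
    return {(s.text or "").strip() for s in root.iter("str")}


# Abbreviated samples taken from the CN and MN listings above.
cn_xml = """<arr name="sem_annotation">
<str>http://purl.dataone.org/odo/ECSO_00000512</str>
<str>
http://ecoinformatics.org/oboe/oboe.1.2/oboe-core.owl#MeasurementType
</str>
<str>http://www.w3.org/2000/01/rdf-schema#Resource</str>
<str>http://purl.obolibrary.org/obo/UO_0000301</str>
</arr>"""

mn_xml = """<arr name="sem_annotation">
<str>http://purl.dataone.org/odo/ECSO_00000512</str>
<str>
http://ecoinformatics.org/oboe/oboe.1.2/oboe-core.owl#MeasurementType
</str>
<str>http://purl.obolibrary.org/obo/UO_0000301</str>
</arr>"""

only_on_cn = annotation_values(cn_xml) - annotation_values(mn_xml)
only_on_mn = annotation_values(mn_xml) - annotation_values(cn_xml)
print("only on CN:", sorted(only_on_cn))
print("only on MN:", sorted(only_on_mn))
```

<p>Comparing as sets ignores ordering and the stray whitespace around the multi-line value, so only genuine differences are reported.</p>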
Member Nodes - Story #8835 (New): Add ability for scanner to stop after a certain number of errors https://redmine.dataone.org/issues/8835 2019-08-12T19:16:39Z John Evans
<p>Right now the scanner tries to go through the entire list of sitemap documents, even if every one of them fails. We should add the ability to abort further checks once a set error threshold is crossed.</p>
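<p>One possible shape for such a threshold is sketched below. This is not the scanner's actual code: <code>scan</code>, <code>check</code>, and <code>max_errors</code> are hypothetical names, and <code>check</code> stands in for whatever the scanner does with each sitemap document.</p>

```python
def scan(urls, check, max_errors=None):
    """Check each URL in order and stop early once max_errors failures are seen.

    check is a hypothetical callable that raises an exception on failure.
    Returns a tuple (checked, errors, aborted).
    """
    checked = 0
    errors = []
    for url in urls:
        checked += 1
        try:
            check(url)
        except Exception as exc:
            errors.append((url, exc))
            # Abort as soon as the error budget is used up.
            if max_errors is not None and len(errors) >= max_errors:
                return checked, errors, True
    return checked, errors, False


def always_fail(url):
    raise ValueError("could not process " + url)


# Hypothetical document URLs, for illustration only.
urls = ["https://example.org/doc/%d" % i for i in range(10)]
checked, errors, aborted = scan(urls, always_fail, max_errors=3)
print(checked, len(errors), aborted)
```

<p>Leaving <code>max_errors</code> as <code>None</code> keeps the current behavior of scanning the whole list.</p>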
Infrastructure - Story #8823 (New): Recent Apache and OpenSSL combinations break connectivity on ... https://redmine.dataone.org/issues/8823 2019-06-19T02:03:44Z Dave Vieglais dave.vieglais@gmail.com
<p>The latest Ubuntu 18.04 release of Apache is 2.4.29 and OpenSSL is 1.1.1.</p>
<p>This combination creates a significant delay in TLS renegotiation, triggered by the following Apache config option on the CNs:</p>
<pre>SSLVerifyClient none
<Location "/cn">
<If " ! ( %{HTTP_USER_AGENT} =~ /(windows|chrome|mozilla|safari|webkit)/i )">
SSLVerifyClient optional
</If>
</Location>
</pre>
<p>This configuration is intended to disable client certificate authentication for web browsers but allow it for other clients. The approach worked fine on older Apache / OpenSSL versions, but the new combination introduces a wait of several seconds when the server discovers that the client is not a web browser and tells it to reconnect with the option of including a client certificate.</p>
<p>The latest released version of Apache is 2.4.39 and this is available through a PPA intended for Debian developers. This has been installed so far on dev-2, sandbox, stage, and stage-2 with the process:</p>
<pre>sudo add-apt-repository ppa:ondrej/apache2
sudo apt update
sudo apt dist-upgrade
sudo systemctl restart apache2
</pre>
<p>This installs Apache 2.4.39 and OpenSSL 1.1.1c which appears to resolve the apparent bug in the 2.4.29 / 1.1.1 combination.</p>
<p>One issue with the update is that Apache now offers TLSv1.3 by default. That is great, except that it appears to cause at least Python clients to fail to connect with a 403 error. For example:</p>
<pre>$ python3
>>> import requests
>>> r = requests.get("https://cn-sandbox-ucsb-1.test.dataone.org/cn/v2/monitor/ping")
>>> r.status_code
403
</pre>
<p>TLSv1.3 was confirmed as the cause on cn-stage-unm-2 by configuring Apache with:</p>
<pre> SSLProtocol all -TLSv1.3 -SSLv2 -SSLv3
</pre>
<p>to disable TLSv1.3. After this change the Python client was able to connect as expected.</p>
<p>A workaround has not yet been researched.</p>
<p>It is not clear if this issue applies to other clients such as R and Java, so until we learn one way or the other, TLSv1.3 will be disabled on the CNs.</p>
<p>--This issue will likely apply to Member Nodes as well once TLSv1.3 is generally available or if MNs choose to install Apache 2.4.39.-- CORRECTION: this issue only applies when attempting to renegotiate TLS after headers have been transferred, so will not typically apply to an MN.</p>
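<p>For diagnosis from the client side, a Python client can be capped at TLSv1.2 so that it never negotiates TLSv1.3. The sketch below uses only the standard library (<code>ssl.TLSVersion</code> requires Python 3.7 or later); the helper names are made up, and whether this avoids the 403 against the CNs has not been verified here.</p>

```python
import ssl
import urllib.error
import urllib.request


def tls12_only_context():
    """Build a default client context that will not negotiate above TLSv1.2."""
    ctx = ssl.create_default_context()
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def ping_status(url, context, timeout=30):
    """Return the HTTP status code for url, including error codes such as 403."""
    try:
        with urllib.request.urlopen(url, context=context, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code


context = tls12_only_context()
# Uncomment to try against the sandbox CN mentioned above (needs network access):
# print(ping_status("https://cn-sandbox-ucsb-1.test.dataone.org/cn/v2/monitor/ping", context))
```

<p>Comparing the status returned with this context against the status from an unrestricted context would show whether the protocol version alone accounts for the difference.</p>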
Infrastructure - Story #8762 (New): Add new formats to CN https://redmine.dataone.org/issues/8762 2019-02-07T19:58:28Z Peter Slaughter slaughter@nceas.ucsb.edu
<p>Add a new formatId for Python notebooks: application/x-ipynb+json. This is currently the popular media type for Python notebooks (<a href="https://jupyter.readthedocs.io/en/latest/reference/mimetype.html">https://jupyter.readthedocs.io/en/latest/reference/mimetype.html</a>), but because it has the 'x-' prefix it is 'unregistered' and not recognized by IANA (<a href="https://www.iana.org/assignments/media-types/media-types.xhtml">https://www.iana.org/assignments/media-types/media-types.xhtml</a>).</p>
<p>The new entry should be:</p>
<pre><objectFormat>
  <formatId>application/x-ipynb+json</formatId>
  <formatName>Jupyter Notebook</formatName>
  <formatType>DATA</formatType>
  <mediaType name="application/x-ipynb+json"/>
  <extension>ipynb</extension>
</objectFormat>
</pre>
Member Nodes - Story #8589 (New): ARM: Re-Discovery & Planning https://redmine.dataone.org/issues/8589 2018-05-10T20:35:03Z Amy Forrester aforres4@utk.edu
<p><strong>5/10/2018:</strong> Meeting at ORNL. Giri Prakash expressed renewed interest in becoming an MN. Following the USGS_SDC GMN implementation (~Fall 2018), Aaron Stokes can turn attention to the ARM installation.</p>
Infrastructure - Story #8525 (In Progress): timeout exceptions thrown from Hazelcast disable sync... https://redmine.dataone.org/issues/8525 2018-03-27T22:36:54Z Rob Nahf rnahf@epscor.unm.edu
<p>Very occasionally, synchronization disables itself when RuntimeExceptions bubble up. The most common of these is when the Hazelcast client seemingly disconnects, or can't complete an operation, and a java.util.concurrent.TimeoutException is thrown.</p>
<p>These are usually due to network problems, as evidenced by timeout exceptions appearing in both the Metacat hazelcast-storage.log files as well as d1-processing logs.</p>
<p>Temporary problems like this should be recoverable, so a retry or bypass for those timeouts should be implemented. It's not clear whether a new HazelcastClient should be instantiated or whether the same client is still usable. (Is the client tightly bound to a session, or does it recover?) If a new client is needed, a preliminary search through the code indicates that the HazelcastClientFactory.getProcessingClient() method is only used in a few places, and the singleton behavior it uses can be sidestepped by removing the method and replacing it with a getLock() wrapper method (obtaining a lock seems to be the dominant use case for it). See the newer SyncQueueFacade in d1_synchronization for guidance on that. If the client is never exposed, it can be refreshed as needed.</p>
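<p>The retry idea can be illustrated independently of Hazelcast. The sketch below is in Python rather than the Java of d1_synchronization, and every name in it is hypothetical; <code>TimeoutError</code> stands in for <code>java.util.concurrent.TimeoutException</code>. It does not answer whether a fresh client is needed, only how a bounded retry keeps a transient timeout from bubbling up.</p>

```python
import time


def call_with_retry(operation, retries=3, delay=1.0,
                    retry_on=(TimeoutError,), sleep=time.sleep):
    """Call operation(); on a retryable exception, wait and try again.

    The wait grows linearly with the attempt number. Once `retries`
    retries have been used, the last exception is re-raised.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except retry_on:
            attempt += 1
            if attempt > retries:
                raise
            sleep(delay * attempt)


calls = {"count": 0}


def flaky_poll():
    """Stand-in for a queue poll that times out twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("Operation Timeout (with no response!)")
    return "task-1"


# sleep is replaced with a no-op so the demonstration runs instantly.
result = call_with_retry(flaky_poll, retries=3, sleep=lambda seconds: None)
print(result, calls["count"])
```

<p>Exceptions outside <code>retry_on</code> still propagate immediately, so genuine programming errors are not masked by the retry loop.</p>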
<pre>root@cn-unm-1:/var/metacat/logs# grep FATAL hazelcast-storage.log.1
[FATAL] 2018-03-27 03:15:19,380 (BaseManager$2:run:1402) [64.106.40.6]:5701 [DataONE] Caught error while calling event listener; cause: [CONCURRENT_MAP_CONTAINS_KEY] Operation Timeout (with no response!): 0
</pre><pre>[ERROR] 2018-03-27 03:15:19,781 [ProcessDaemonTask1] (SyncObjectTaskManager:run:84) java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.util.concurrent.TimeoutException: [CONCURRENT_MAP_REMOVE] Operation Timeout (with no response!): 0
java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.util.concurrent.TimeoutException: [CONCURRENT_MAP_REMOVE] Operation Timeout (with no response!): 0
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.dataone.cn.batch.synchronization.SyncObjectTaskManager.run(SyncObjectTaskManager.java:76)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.util.concurrent.TimeoutException: [CONCURRENT_MAP_REMOVE] Operation Timeout (with no response!): 0
at com.hazelcast.impl.ClientServiceException.readData(ClientServiceException.java:63)
at com.hazelcast.nio.Serializer$DataSerializer.read(Serializer.java:104)
at com.hazelcast.nio.Serializer$DataSerializer.read(Serializer.java:79)
at com.hazelcast.nio.AbstractSerializer.toObject(AbstractSerializer.java:121)
at com.hazelcast.nio.AbstractSerializer.toObject(AbstractSerializer.java:156)
at com.hazelcast.client.ClientThreadContext.toObject(ClientThreadContext.java:72)
at com.hazelcast.client.IOUtil.toObject(IOUtil.java:34)
at com.hazelcast.client.ProxyHelper.getValue(ProxyHelper.java:186)
at com.hazelcast.client.ProxyHelper.doOp(ProxyHelper.java:146)
at com.hazelcast.client.ProxyHelper.doOp(ProxyHelper.java:140)
at com.hazelcast.client.QueueClientProxy.innerPoll(QueueClientProxy.java:115)
at com.hazelcast.client.QueueClientProxy.poll(QueueClientProxy.java:111)
at org.dataone.cn.batch.synchronization.type.SyncQueueFacade.poll(SyncQueueFacade.java:231)
at org.dataone.cn.batch.synchronization.tasks.SyncObjectTask.call(SyncObjectTask.java:131)
at org.dataone.cn.batch.synchronization.tasks.SyncObjectTask.call(SyncObjectTask.java:73)
</pre>
Member Nodes - Story #8521 (New): CAFF: Testing & Development https://redmine.dataone.org/issues/8521 2018-03-26T15:07:58Z Amy Forrester aforres4@utk.edu
<p>The repository and DataONE have agreed to proceed with deployment as a member node. Install or develop a functional member node to be registered to a non-production environment.</p>
Infrastructure - Story #8307 (New): Check node subject on node registration and subsequent calls https://redmine.dataone.org/issues/8307 2018-02-06T20:04:39Z Dave Vieglais dave.vieglais@gmail.com
<p>The <code>/node/subject</code> entry of the node document should match the subject of the certificate used to register the node (unless the call is being made by a CN certificate).</p>
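<p>The rule can be stated as a small predicate. This is only a sketch: the function name and the example subjects are invented, and it assumes both subjects are already available as strings in the same canonical form.</p>

```python
def may_register(node_subject, cert_subject, cn_subjects=frozenset()):
    """Return True if a call made with cert_subject may register or update the node.

    Calls made with a CN certificate are always allowed; otherwise the
    certificate subject must equal the /node/subject entry.
    """
    if cert_subject in cn_subjects:
        return True
    return node_subject.strip() == cert_subject.strip()


# Hypothetical subjects, for illustration only.
CN_SUBJECTS = frozenset({"CN=urn:node:cnExample,DC=dataone,DC=org"})
mn_subject = "CN=urn:node:mnExample,DC=dataone,DC=org"

print(may_register(mn_subject, mn_subject, CN_SUBJECTS))
print(may_register(mn_subject, "CN=urn:node:mnOther,DC=dataone,DC=org", CN_SUBJECTS))
print(may_register(mn_subject, "CN=urn:node:cnExample,DC=dataone,DC=org", CN_SUBJECTS))
```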
Infrastructure - Story #8204 (New): Adjust memory allocation for services running under JVM on CNs https://redmine.dataone.org/issues/8204 2017-10-24T15:29:28Z Dave Vieglais dave.vieglais@gmail.com
<p>Most services on CNs run in JVM instances, each with separate restrictions on memory use.</p>
<p>Current configurations were mostly based on defaults and have not changed much despite significantly higher memory use for systems such as Solr and Hazelcast.</p>
<p>The goal of this story is to evaluate the memory configuration of each JVM instance on the CNs and tune as necessary, with the primary goal of avoiding out-of-memory errors and the secondary goal of improving performance.</p>
<p>Note that with increased heap allocation, garbage collection may become a significant bottleneck, with application freezes on the order of several minutes possible, or even likely, at higher heap allocations (e.g. 16GB).</p>
Infrastructure - Story #8173 (New): add checks for retrograde systemMetadata changes https://redmine.dataone.org/issues/8173 2017-09-01T19:42:33Z Rob Nahf rnahf@epscor.unm.edu
<p>With the ability to prioritize tasks and the introduction of parallelized index task processing, the effective queue is not guaranteed to be time-ordered. If two valid system metadata changes result in two tasks and the second change reaches the index first, the earlier task should be rejected, because its changes are out of date.</p>
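<p>A minimal sketch of such a check follows. It assumes each task carries the modification timestamp of the system metadata it was created from (for example the value of <code>dateSysMetadataModified</code>) and that the timestamp of the currently indexed version can be looked up; the function name is hypothetical.</p>

```python
from datetime import datetime, timezone


def is_retrograde(task_modified, indexed_modified):
    """Return True if the task's system metadata is older than what is indexed.

    indexed_modified is None when the object has not been indexed yet,
    in which case no task can be out of date.
    """
    return indexed_modified is not None and task_modified < indexed_modified


first_change = datetime(2017, 9, 1, 19, 40, tzinfo=timezone.utc)
second_change = datetime(2017, 9, 1, 19, 42, tzinfo=timezone.utc)

# The second change reached the index first, so the earlier task is rejected.
print(is_retrograde(first_change, indexed_modified=second_change))
# Processing in the normal order is accepted.
print(is_retrograde(second_change, indexed_modified=first_change))
```

<p>A task with an equal timestamp is not treated as retrograde here, so reprocessing the same version remains possible.</p>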
Infrastructure - Story #8044 (New): certificate manager should check expiration of CAs it loads i... https://redmine.dataone.org/issues/8044 2017-03-14T18:28:54Z Rob Nahf rnahf@epscor.unm.edu
<p>Currently, libclient checks for duplicates before it adds the shipped certificates to the trustStore. It does not check their expiration dates. Evaluate whether it should. (Would our shipped set be more up-to-date than the Java runtime?)</p>
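<p>For illustration, an expiration filter could look like the sketch below. It is written in Python rather than the Java of libclient, the certificate names are placeholders, and it assumes each certificate's notAfter value is available in the text form that Python's <code>ssl</code> module uses.</p>

```python
import ssl
import time


def is_expired(not_after, now=None):
    """Return True if a certificate's notAfter time lies before `now`.

    not_after uses the format accepted by ssl.cert_time_to_seconds,
    e.g. "Jan  5 09:34:43 2018 GMT"; now is seconds since the epoch.
    """
    now = time.time() if now is None else now
    return ssl.cert_time_to_seconds(not_after) < now


# Placeholder entries: (label, notAfter).
shipped = [
    ("example-ca-old", "Jan  5 09:34:43 2018 GMT"),
    ("example-ca-new", "Jan  5 09:34:43 2038 GMT"),
]

reference_time = ssl.cert_time_to_seconds("Mar 14 18:28:54 2017 GMT")
still_valid = [name for name, not_after in shipped
               if not is_expired(not_after, now=reference_time + 5 * 365 * 86400)]
print(still_valid)
```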
Infrastructure - Story #7859 (New): Add formatID for the STL 3d model file format https://redmine.dataone.org/issues/7859 2016-08-04T19:02:58Z Bryce Mecum mecum@nceas.ucsb.edu
<p>The STL file format is a domain-standard file format for storing 3d models and is the format I've most often used to manage 3d models for 3d printing. Given that 3d printing is seeing increased usage in the sciences, I would say this is a good candidate for inclusion in the controlled list of format ids.</p>
<p>Type: DATA<br>
Id: STL<br>
Name: StereoLithography File Format<br>
Media type: application/sla (unofficial)<br>
Extension: .stl</p>
<p>There is an ASCII form and a binary form of this format. They don't seem to be distinguished according to any standard. What do we do in this case?</p>
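<p>Although no separate media types exist, the two forms can usually be told apart from the content: a binary STL file has an 80-byte header, a little-endian 32-bit triangle count, and 50 bytes per triangle, while an ASCII file starts with the keyword <code>solid</code>. The sketch below is one heuristic built on those facts, not a proposal for how the format ids should be assigned; the function name is made up.</p>

```python
import struct


def stl_variant(data):
    """Classify STL bytes as 'binary', 'ascii', or 'unknown'.

    The size check runs first because some binary files also put the
    word "solid" at the start of their 80-byte header.
    """
    if len(data) >= 84:
        (triangle_count,) = struct.unpack_from("<I", data, 80)
        if len(data) == 84 + 50 * triangle_count:
            return "binary"
    if data.lstrip().lower().startswith(b"solid"):
        return "ascii"
    return "unknown"


# Synthetic samples: one zero-filled binary triangle and an empty ASCII solid.
binary_sample = bytes(80) + struct.pack("<I", 1) + bytes(50)
ascii_sample = b"solid demo\nendsolid demo\n"

print(stl_variant(binary_sample))
print(stl_variant(ascii_sample))
```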
<p>References: <br>
- <a href="https://en.wikipedia.org/wiki/STL_(file_format)">https://en.wikipedia.org/wiki/STL_(file_format)</a><br>
- <a href="https://reference.wolfram.com/language/ref/format/STL.html">https://reference.wolfram.com/language/ref/format/STL.html</a></p>
Java Client - Story #6850 (New): automate Java Client releases https://redmine.dataone.org/issues/6850 2015-02-11T20:33:42Z Rob Nahf rnahf@epscor.unm.edu
<p>We would like the self-contained jars for the Java Client Libraries to be built and automatically deployed from Jenkins to releases.dataone.org.</p>
Infrastructure - Story #4650 (New): Allow MN to bias resolve to the authoritative MN https://redmine.dataone.org/issues/4650 2014-03-28T18:17:40Z Bruce Wilson bwilso27@utk.edu
<p>In the MN workshop (IDCC, Feb 2014) and in an MN forum discussion, several MNs asked whether resolve could be biased toward the authoritative MN, so that users would be more likely to retrieve the data from it. A suggestion that met with a very positive response was to ensure that CN.resolve() returns the authoritative MN as the first item in the list. My understanding is that the order of the CN.resolve() results is currently indeterminate.</p>
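<p>The suggested ordering amounts to a stable sort on a single flag. In the sketch below the location list is modeled as plain dictionaries; the keys, node identifiers, and URLs are assumptions for illustration rather than the actual resolve response types.</p>

```python
def authoritative_first(locations, authoritative_mn):
    """Return locations with the authoritative MN first.

    sorted() is stable, so all other entries keep their original order.
    """
    return sorted(locations, key=lambda loc: loc["nodeIdentifier"] != authoritative_mn)


# Hypothetical resolve results.
locations = [
    {"nodeIdentifier": "urn:node:mnReplicaA", "url": "https://a.example.org/object/pid1"},
    {"nodeIdentifier": "urn:node:mnOrigin", "url": "https://origin.example.org/object/pid1"},
    {"nodeIdentifier": "urn:node:mnReplicaB", "url": "https://b.example.org/object/pid1"},
]

ordered = authoritative_first(locations, "urn:node:mnOrigin")
print([loc["nodeIdentifier"] for loc in ordered])
```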
Infrastructure - Story #2488 (New): Changing the authoritativeMembernode will require all replica... https://redmine.dataone.org/issues/2488 2012-03-14T19:08:19Z Robert Waltz
<p>Modifying the authoritativeMembernode requires human intervention. A tool should be created to perform the modification and then call MNStorage.systemMetadataChanged() on all replicas of the object.</p>
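<p>The core loop of such a tool might look like the following sketch. Everything here is hypothetical: the system metadata is modeled as a plain dictionary, and <code>notify_replica</code> stands in for whatever client call would wrap systemMetadataChanged() on a given node.</p>

```python
def change_authoritative_mn(sysmeta, new_mn, notify_replica):
    """Set the authoritative MN, then notify every replica node.

    sysmeta is a plain dict standing in for a system metadata document.
    notify_replica(node_id, pid) is a hypothetical callable that raises
    on failure. Returns (notified, failed) lists of node identifiers.
    """
    sysmeta["authoritativeMemberNode"] = new_mn
    notified, failed = [], []
    for node_id in sysmeta.get("replicas", []):
        try:
            notify_replica(node_id, sysmeta["identifier"])
            notified.append(node_id)
        except Exception:
            # Keep going so one unreachable replica does not block the rest.
            failed.append(node_id)
    return notified, failed


def fake_notify(node_id, pid):
    """Stand-in notifier that fails for one node."""
    if node_id == "urn:node:mnC":
        raise RuntimeError("node unreachable")


sysmeta = {
    "identifier": "pid1",
    "authoritativeMemberNode": "urn:node:mnA",
    "replicas": ["urn:node:mnB", "urn:node:mnC"],
}
notified, failed = change_authoritative_mn(sysmeta, "urn:node:mnB", fake_notify)
print(notified, failed)
```

<p>Returning the failed nodes separately lets an operator retry only those replicas.</p>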