DataONE Tasks: Issueshttps://redmine.dataone.org/https://redmine.dataone.org/favicon.ico2020-04-23T16:24:46ZDataONE Tasks
Redmine Infrastructure - Story #8862 (New): Deploy a new dataone-cn-rest releasehttps://redmine.dataone.org/issues/88622020-04-23T16:24:46ZJing Taotao@nceas.ucsb.edu
<p>We have a new d1_portal jar release that removes the need to restart Tomcat on the CNs whenever the LE certificates are renewed. The new d1_portal jar file has been deployed to dataone-cn-portal, but the dataone-cn-rest component was overlooked. We need to deploy it there as well.<br>
Yesterday, when we restarted Tomcat on the CNs, we applied a stopgap fix by dropping the d1_portal-2.3.2.jar file there, so it should work now. We still need a formal release.</p>
Infrastructure - Story #8853 (New): Make cn.resolve smarterhttps://redmine.dataone.org/issues/88532019-11-15T16:46:12ZJing Taotao@nceas.ucsb.edu
<p>In this case the cn.resolve() operation should be ignoring the node that is marked as offline, or at least placing it last in the list.</p>
<p>This should be a high priority fix, and should be fairly simple to implement since the information is available in the node document.</p>
<ul>
<li>Dave</li>
</ul>
<blockquote>
<p>On 2019-11-14, at 21:38, Matt Jones <a href="mailto:jones@nceas.ucsb.edu">jones@nceas.ucsb.edu</a> wrote:</p>
<p>FYI, thread from today with Ethan White on ebird replication, and the resolve() API in DataONE. Relates to our conversation today about making resolve() and MetacatUI downloads smarter.</p>
<p>Matt</p>
<p>Ethan White 5:06 PM<br>
What's the right place to report data that is 404ing on DataONE?</p>
<p>Matt Jones 5:07 PM<br>
<a href="mailto:support@dataone.org">support@dataone.org</a> would work</p>
<p>5:08 PM<br>
or let me know</p>
<p>5:08 PM<br>
is it that same data set?</p>
<p>5:08 PM<br>
the Ebird one?</p>
<p>Ethan White 5:09 PM<br>
Yeah, which we had discovered had been reposted and spent a bunch of time gearing up to support again. We were in the middle of testing when it suddenly disappeared again. <a href="http://dataone.ornith.cornell.edu/metacat/d1/mn/v2/object/EOD_CLO_2016.csv.gz">http://dataone.ornith.cornell.edu/metacat/d1/mn/v2/object/EOD_CLO_2016.csv.gz</a></p>
<p>Matt Jones 5:10 PM<br>
yeah. Cornell just gave us permission to replicate the data to other nodes. They haven’t wanted us to do so in the past.</p>
<p>Ethan White 5:13 PM<br>
Thanks. That's good news. So can we expect it to reappear at some point soonish?</p>
<p>Matt Jones 5:14 PM<br>
Yeah, it's been replicated. I’m checking to see if it is properly linked to the original.</p>
<p>5:15 PM<br>
<a href="https://knb.ecoinformatics.org/view/EOD_CLO_2016.eml">https://knb.ecoinformatics.org/view/EOD_CLO_2016.eml</a></p>
<p>new messages</p>
<p>Ethan White 5:16 PM<br>
Thanks Matt. FYI that link I posted is the one being returned from a current search of DataONE.</p>
<p>Matt Jones 5:17 PM<br>
Yeah. Because that’s the ‘authoritative’ copy at cornell.</p>
<p>5:17 PM<br>
but Cornell’s node has been going up and down.</p>
<p>5:17 PM<br>
our resolve service lists all copies of a data set</p>
<p>5:17 PM<br>
so if one is down, you can get it from another location:</p>
<p>5:18 PM<br>
<code><br>
$ curl -H "Accept: text/xml" https://cn.dataone.org/cn/v2/resolve/EOD_CLO_2016.eml<br>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><br>
<ns2:objectLocationList xmlns:ns2="http://ns.dataone.org/service/types/v1"><br>
<identifier>EOD_CLO_2016.eml</identifier><br>
<objectLocation><br>
<nodeIdentifier>urn:node:CLOEBIRD</nodeIdentifier><br>
<baseURL>http://dataone.ornith.cornell.edu/metacat/d1/mn</baseURL><br>
<version>v1</version><br>
<version>v2</version><br>
<url>http://dataone.ornith.cornell.edu/metacat/d1/mn/v2/object/EOD_CLO_2016.eml</url><br>
</objectLocation><br>
<objectLocation><br>
<nodeIdentifier>urn:node:CN</nodeIdentifier><br>
<baseURL>https://cn.dataone.org/cn</baseURL><br>
<version>v1</version><br>
<version>v2</version><br>
<url>https://cn.dataone.org/cn/v2/object/EOD_CLO_2016.eml</url><br>
</objectLocation><br>
<objectLocation><br>
<nodeIdentifier>urn:node:KNB</nodeIdentifier><br>
<baseURL>https://knb.ecoinformatics.org/knb/d1/mn</baseURL><br>
<version>v1</version><br>
<version>v2</version><br>
<url>https://knb.ecoinformatics.org/knb/d1/mn/v2/object/EOD_CLO_2016.eml</url><br>
</objectLocation><br>
</ns2:objectLocationList><br>
</code></p>
<p>Ethan White 5:19 PM<br>
OK, thanks. That's why I thought the link in DataONE <a href="https://cn.dataone.org/cn/v2/resolve/EOD_CLO_2016.csv.gz">https://cn.dataone.org/cn/v2/resolve/EOD_CLO_2016.csv.gz</a> would take me to a working version, but clearly I just don't understand the details. We'll just use the one on KNB at least for the moment. Really appreciate your help as always.</p>
<p>Matt Jones 5:20 PM<br>
No problem. I’d love to make this all work more seamlessly. (edited) </p>
<p>5:20 PM<br>
So suggestions definitely welcome.</p>
<p>5:21 PM<br>
I expect Cornell to take their node offline altogether — so the KNB will likely be the better location.</p>
<p>5:22 PM<br>
Btw, the resolve link when executed in a browser just redirects to the first copy</p>
<p>Ethan White 5:23 PM<br>
Yeah, Cornell's closed approach to things is a pretty big disappointment, especially on data like this that is generated by volunteers. We'll just go to the KNB version permanently.</p>
<p>Matt Jones 5:23 PM<br>
whereas programmatically you get the list of locations</p>
<p>5:23 PM<br>
if you ask for XML</p>
<p>Ethan White 5:23 PM<br>
That makes sense. Thanks.</p>
<p>Matt Jones 5:23 PM<br>
and then you can choose to try one or more</p>
</blockquote>
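<p>As a rough illustration of the reordering proposed above, the following Python sketch parses an objectLocationList like the one quoted in the thread and moves any node flagged as offline to the end of the list. The function name <code>ordered_locations</code> and the <code>offline_nodes</code> argument are hypothetical; how the CN would actually obtain the offline state from the node document is not shown here and is left as an assumption.</p>

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the resolve() response quoted in the thread above.
SAMPLE = """<ns2:objectLocationList xmlns:ns2="http://ns.dataone.org/service/types/v1">
<identifier>EOD_CLO_2016.eml</identifier>
<objectLocation>
<nodeIdentifier>urn:node:CLOEBIRD</nodeIdentifier>
<url>http://dataone.ornith.cornell.edu/metacat/d1/mn/v2/object/EOD_CLO_2016.eml</url>
</objectLocation>
<objectLocation>
<nodeIdentifier>urn:node:CN</nodeIdentifier>
<url>https://cn.dataone.org/cn/v2/object/EOD_CLO_2016.eml</url>
</objectLocation>
<objectLocation>
<nodeIdentifier>urn:node:KNB</nodeIdentifier>
<url>https://knb.ecoinformatics.org/knb/d1/mn/v2/object/EOD_CLO_2016.eml</url>
</objectLocation>
</ns2:objectLocationList>"""


def ordered_locations(xml_text, offline_nodes=()):
    """Return (node_id, url) pairs with offline nodes moved to the end.

    offline_nodes is a hypothetical input: a collection of node identifiers
    that the caller already knows to be down.
    """
    root = ET.fromstring(xml_text)
    locations = [
        (loc.findtext("nodeIdentifier"), loc.findtext("url"))
        for loc in root.iter("objectLocation")
    ]
    offline = set(offline_nodes)
    # sorted() is stable, so the original order is kept within each group.
    return sorted(locations, key=lambda pair: pair[0] in offline)


if __name__ == "__main__":
    for node_id, url in ordered_locations(SAMPLE, {"urn:node:CLOEBIRD"}):
        print(node_id, url)
```

<p>A client could apply the same ordering on its side and try each URL in turn until one responds.</p>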
Infrastructure - Story #8849 (New): During sync, the CN does not detect error returned from getCh...https://redmine.dataone.org/issues/88492019-11-05T19:25:48ZRoger Dahldahl@unm.edu
<p>Due to a bug, GMN returned 500 on some getChecksum() calls. The CN did not detect the 500 return status and proceeded with the sync, using "null" as the checksum.</p>
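<p>The kind of check that appears to be missing can be sketched as follows. This is an illustration in Python, not the CN's actual synchronization code; the function name, the exception class, the shape of the checksum XML, and the checksum value in the usage example are all assumptions.</p>

```python
import xml.etree.ElementTree as ET


class ChecksumError(Exception):
    """Raised when a getChecksum() response cannot be trusted (hypothetical)."""


def parse_checksum_response(status_code, body):
    """Return (algorithm, value), refusing any response that is not a clean 200."""
    if status_code != 200:
        raise ChecksumError(f"getChecksum() returned HTTP {status_code}")
    try:
        root = ET.fromstring(body)
    except ET.ParseError as exc:
        raise ChecksumError("getChecksum() body is not valid XML") from exc
    algorithm = root.get("algorithm")
    value = (root.text or "").strip()
    # Reject empty values and the literal string "null" instead of syncing them.
    if not algorithm or not value or value.lower() == "null":
        raise ChecksumError("getChecksum() response has no usable checksum")
    return algorithm, value


if __name__ == "__main__":
    # Assumed response shape; the checksum value is made up for illustration.
    ok = (
        '<d1:checksum xmlns:d1="http://ns.dataone.org/service/types/v1" '
        'algorithm="SHA-1">2e01e17467891f7c933dbaa00e1459d23db3fe4f</d1:checksum>'
    )
    print(parse_checksum_response(200, ok))
    try:
        parse_checksum_response(500, "Internal Server Error")
    except ChecksumError as err:
        print("sync should stop:", err)
```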
Infrastructure - Story #8848 (New): A minor difference of annotation index between CN and MNhttps://redmine.dataone.org/issues/88482019-11-01T21:37:01ZJing Taotao@nceas.ucsb.edu
<p>The Solr index on the CN is:</p>
<pre><arr name="sem_annotation">
<str>http://purl.dataone.org/odo/ECSO_00000512</str>
<str>
http://ecoinformatics.org/oboe/oboe.1.2/oboe-core.owl#MeasurementType
</str>
<str>http://purl.dataone.org/odo/ECSO_00001102</str>
<str>http://purl.dataone.org/odo/ECSO_00001243</str>
<str>http://purl.dataone.org/odo/ECSO_00000629</str>
<str>http://purl.dataone.org/odo/ECSO_00000518</str>
<str>http://www.w3.org/2000/01/rdf-schema#Resource</str>
<str>http://purl.dataone.org/odo/ECSO_00000516</str>
<str>http://purl.obolibrary.org/obo/UO_0000301</str>
</arr>
</pre>
<p>The Solr index on the MN is:</p>
<pre><arr name="sem_annotation">
<str>http://purl.dataone.org/odo/ECSO_00000512</str>
<str>
http://ecoinformatics.org/oboe/oboe.1.2/oboe-core.owl#MeasurementType
</str>
<str>http://purl.dataone.org/odo/ECSO_00001102</str>
<str>http://purl.dataone.org/odo/ECSO_00001243</str>
<str>http://purl.dataone.org/odo/ECSO_00000629</str>
<str>http://purl.dataone.org/odo/ECSO_00000518</str>
<str>http://purl.dataone.org/odo/ECSO_00000516</str>
<str>http://purl.obolibrary.org/obo/UO_0000301</str>
</arr>
</pre>
<p>The CN has an extra <code><str>http://www.w3.org/2000/01/rdf-schema#Resource</str></code> entry.<br>
Bryce and I discussed it and concluded that it shouldn't affect the feature, but we still need to find out where the difference comes from.</p>
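<p>A comparison like the one above can be reproduced with a small set difference. The sketch below uses the values listed in this ticket; the function name <code>annotation_diff</code> is hypothetical.</p>

```python
CN_VALUES = [
    "http://purl.dataone.org/odo/ECSO_00000512",
    "http://ecoinformatics.org/oboe/oboe.1.2/oboe-core.owl#MeasurementType",
    "http://purl.dataone.org/odo/ECSO_00001102",
    "http://purl.dataone.org/odo/ECSO_00001243",
    "http://purl.dataone.org/odo/ECSO_00000629",
    "http://purl.dataone.org/odo/ECSO_00000518",
    "http://www.w3.org/2000/01/rdf-schema#Resource",
    "http://purl.dataone.org/odo/ECSO_00000516",
    "http://purl.obolibrary.org/obo/UO_0000301",
]
# The MN list in this ticket is the CN list minus the rdf-schema#Resource entry.
MN_VALUES = [v for v in CN_VALUES if not v.endswith("rdf-schema#Resource")]


def annotation_diff(cn_values, mn_values):
    """Return (only_on_cn, only_on_mn) as sorted lists, ignoring whitespace."""
    cn_set = {v.strip() for v in cn_values}
    mn_set = {v.strip() for v in mn_values}
    return sorted(cn_set - mn_set), sorted(mn_set - cn_set)


if __name__ == "__main__":
    only_cn, only_mn = annotation_diff(CN_VALUES, MN_VALUES)
    print("only on CN:", only_cn)
    print("only on MN:", only_mn)
```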
Infrastructure - Story #8842 (New): Some exceptions in Metacathttps://redmine.dataone.org/issues/88422019-09-19T17:53:22ZJing Taotao@nceas.ucsb.edu
<p>In sandbox, we see exceptions like the one below. They do not appear to affect functionality, but we need to investigate.<br>
<code><br>
19-Sep-2019 15:19:05.303 INFO [localhost-startStop-1] org.apache.catalina.core.ApplicationContext.log Marking servlet [AxisServlet] as unavailable<br>
19-Sep-2019 15:19:05.304 SEVERE [localhost-startStop-1] org.apache.catalina.core.StandardContext.loadOnStartup Servlet [AxisServlet] in web application [/metacat] threw load() exception<br>
java.lang.ClassNotFoundException: org.apache.axis.transport.http.AxisServlet<br>
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1364)<br>
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1185)<br>
at org.apache.catalina.core.DefaultInstanceManager.loadClass(DefaultInstanceManager.java:546)<br>
at org.apache.catalina.core.DefaultInstanceManager.loadClassMaybePrivileged(DefaultInstanceManager.java:527)<br>
at org.apache.catalina.core.DefaultInstanceManager.newInstance(DefaultInstanceManager.java:150)<br>
at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1044)<br>
at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:983)<br>
at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:4956)<br>
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5270)<br>
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)<br>
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:754)<br>
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:730)<br>
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:734)<br>
at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:624)<br>
at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1834)<br>
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)<br>
at java.util.concurrent.FutureTask.run(FutureTask.java:266)<br>
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)<br>
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)<br>
at java.lang.Thread.run(Thread.java:748)<br>
</code></p>
Member Nodes - Story #8835 (New): Add ability for scanner to stop after a certain number of errorshttps://redmine.dataone.org/issues/88352019-08-12T19:16:39ZJohn Evans
<p>Right now the scanner tries to go through the entire list of sitemap documents, even if every one of them fails. We should add the ability to abort further checks once a given error threshold is crossed.</p>
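<p>One possible shape for such a threshold is sketched below. It is not the scanner's actual code: <code>scan</code>, <code>check</code>, and <code>max_errors</code> are hypothetical names, and counting every exception as an error is an assumption.</p>

```python
def scan(urls, check, max_errors=None):
    """Run check(url) for each URL, stopping once max_errors failures are seen.

    max_errors=None keeps the current behaviour of scanning everything.
    """
    results = {"checked": 0, "errors": 0, "aborted": False}
    for url in urls:
        if max_errors is not None and results["errors"] >= max_errors:
            # Threshold crossed: skip the remaining documents.
            results["aborted"] = True
            break
        results["checked"] += 1
        try:
            check(url)
        except Exception:
            results["errors"] += 1
    return results


if __name__ == "__main__":
    def always_fails(url):
        raise ValueError(f"cannot process {url}")

    docs = [f"http://example.org/doc/{i}" for i in range(10)]
    print(scan(docs, always_fails, max_errors=3))
```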
Member Nodes - Story #8833 (New): Problems utilizing pyshacl within SlenderNodeshttps://redmine.dataone.org/issues/88332019-08-05T19:37:09ZJohn Evans
<p>Opening this ticket to document issues encountered when trying to utilize pyshacl within SlenderNodes.</p>
Member Nodes - Story #8832 (New): Conflict between pyshacl and owlr on ubuntuhttps://redmine.dataone.org/issues/88322019-08-05T19:30:42ZJohn Evans
<p>This error arises when trying to test on gmn-multihost-test (ubuntu 18.04.2 LTS).<br><br>
~~~<br>
Traceback (most recent call last):<br>
File "/home/gmn/.pyenv/versions/schema_org/bin/d1-check-site", line 11, in &lt;module&gt;<br>
load_entry_point('schema-org==4.0.0', 'console_scripts', 'd1-check-site')()<br>
File "/home/gmn/.pyenv/versions/schema_org/lib/python3.7/site-packages/pkg_resources/__init__.py", line 489, in load_entry_point<br>
return get_distribution(dist).load_entry_point(group, name)<br>
File "/home/gmn/.pyenv/versions/schema_org/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2793, in load_entry_point<br>
return ep.load()<br>
File "/home/gmn/.pyenv/versions/schema_org/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2411, in load<br>
return self.resolve()<br>
File "/home/gmn/.pyenv/versions/schema_org/lib/python3.7/site-packages/pkg_resources/__init__.py", line 2417, in resolve<br>
module = __import__(self.module_name, fromlist=['__name__'], level=0)<br>
File "/home/gmn/.pyenv/versions/schema_org/lib/python3.7/site-packages/schema_org-4.0.0-py3.7.egg/schema_org/__init__.py", line 1, in &lt;module&gt;<br>
File "/home/gmn/.pyenv/versions/schema_org/lib/python3.7/site-packages/schema_org-4.0.0-py3.7.egg/schema_org/arm.py", line 9, in &lt;module&gt;<br>
File "/home/gmn/.pyenv/versions/schema_org/lib/python3.7/site-packages/schema_org-4.0.0-py3.7.egg/schema_org/common.py", line 24, in &lt;module&gt;<br>
File "/home/gmn/.pyenv/versions/schema_org/lib/python3.7/site-packages/schema_org-4.0.0-py3.7.egg/schema_org/jsonld_validator.py", line 10, in &lt;module&gt;<br>
File "/home/gmn/.pyenv/versions/schema_org/lib/python3.7/site-packages/pyshacl/__init__.py", line 3, in &lt;module&gt;<br>
from pyshacl.validate import validate, Validator<br>
File "/home/gmn/.pyenv/versions/schema_org/lib/python3.7/site-packages/pyshacl/validate.py", line 5, in &lt;module&gt;<br>
import owlrl<br>
File "/home/gmn/.pyenv/versions/3.7.4/envs/schema_org/bin/owlrl.py", line 4, in &lt;module&gt;<br>
from owlrl import convert_graph, RDFXML, TURTLE, JSON, AUTO, RDFA<br>
ImportError: cannot import name 'convert_graph' from 'owlrl' (/home/gmn/.pyenv/versions/3.7.4/envs/schema_org/bin/owlrl.py)<br>
Worker completed with exit code: 1<br>
~~~</p>
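<p>The traceback shows <code>import owlrl</code> resolving to a script at <code>bin/owlrl.py</code> rather than to a package. A plausible explanation, stated here as an assumption, is that the directory of the running console script comes first on <code>sys.path</code>, so the script shadows the installed package. The sketch below is a small diagnostic for that situation; <code>describe_module</code> is a hypothetical helper name.</p>

```python
import importlib.util


def describe_module(name):
    """Report where a top-level import of `name` would come from, without importing it."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return {
        "origin": spec.origin,
        # Packages have submodule search locations; single-file modules do not.
        "is_package": spec.submodule_search_locations is not None,
    }


if __name__ == "__main__":
    # On the affected host one would pass "owlrl"; "json" is used here because
    # it is always available in the standard library.
    print(describe_module("json"))
```

<p>If the report for <code>owlrl</code> shows a single file under <code>bin/</code> with <code>is_package</code> set to false, the shadowing explanation would be supported.</p>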
Member Nodes - Story #8831 (New): Import error with schema_org.datahttps://redmine.dataone.org/issues/88312019-08-05T18:57:10ZJohn Evans
<pre>(schema_org) gmn@gmn-multihost-test:18:53:56:/var/local/dataone/schema_org_scan/commandline_evans/SlenderNodes/schema_org/src$ d1-check-site http://104.236.112.76/demo/metadata-document-does-not-validate/sitemap.xml
Traceback (most recent call last):
File "/home/gmn/.pyenv/versions/schema_org/bin/d1-check-site", line 11, in <module>
load_entry_point('schema-org==4.0.0', 'console_scripts', 'd1-check-site')()
File "/home/gmn/.pyenv/versions/schema_org/lib/python3.7/site-packages/schema_org-4.0.0-py3.7.egg/schema_org/commandline.py", line 91, in d1_check_site
File "/home/gmn/.pyenv/versions/schema_org/lib/python3.7/site-packages/schema_org-4.0.0-py3.7.egg/schema_org/testtool.py", line 12, in __init__
File "/home/gmn/.pyenv/versions/schema_org/lib/python3.7/site-packages/schema_org-4.0.0-py3.7.egg/schema_org/common.py", line 127, in __init__
File "/home/gmn/.pyenv/versions/schema_org/lib/python3.7/site-packages/schema_org-4.0.0-py3.7.egg/schema_org/jsonld_validator.py", line 54, in __init__
File "/home/gmn/.pyenv/versions/3.7.4/lib/python3.7/importlib/resources.py", line 168, in read_text
package = _get_package(package)
File "/home/gmn/.pyenv/versions/3.7.4/lib/python3.7/importlib/resources.py", line 47, in _get_package
module = import_module(package)
File "/home/gmn/.pyenv/versions/3.7.4/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 965, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'schema_org.data'
</pre> Infrastructure - Story #8823 (New): Recent Apache and OpenSSL combinations break connectivity on ...https://redmine.dataone.org/issues/88232019-06-19T02:03:44ZDave Vieglaisdave.vieglais@gmail.com
<p>The latest Ubuntu 18.04 release of Apache is 2.4.29 and OpenSSL is 1.1.1.</p>
<p>This combination creates a significant delay in TLS renegotiation that results from the Apache config option on the CNs:</p>
<pre>SSLVerifyClient none
<Location "/cn">
<If " ! ( %{HTTP_USER_AGENT} =~ /(windows|chrome|mozilla|safari|webkit)/i )">
SSLVerifyClient optional
</If>
</Location>
</pre>
<p>This configuration is intended to disable client certificate authentication for web browsers but allow it for other clients. The approach worked fine with older Apache / OpenSSL versions, but with the new combination there is a wait of several seconds when the server discovers that the client is not a web browser and tells it to reconnect with the option of including a client certificate.</p>
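<p>To see which clients the rule above treats as browsers, the user-agent test can be mimicked in Python. This is only an approximation of Apache's expression matching, and <code>client_cert_requested</code> is a hypothetical name; the user-agent strings in the usage example are illustrative.</p>

```python
import re

# Same alternation as the Apache <If> expression, matched case-insensitively
# anywhere in the header value.
BROWSER_UA = re.compile(r"(windows|chrome|mozilla|safari|webkit)", re.IGNORECASE)


def client_cert_requested(user_agent):
    """Return True when the rule would apply `SSLVerifyClient optional`."""
    return BROWSER_UA.search(user_agent or "") is None


if __name__ == "__main__":
    for ua in ("python-requests/2.22.0", "Mozilla/5.0 (X11; Linux x86_64)", ""):
        print(repr(ua), client_cert_requested(ua))
```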
<p>The latest released version of Apache is 2.4.39 and this is available through a PPA intended for Debian developers. This has been installed so far on dev-2, sandbox, stage, and stage-2 with the process:</p>
<pre>sudo add-apt-repository ppa:ondrej/apache2
sudo apt update
sudo apt dist-upgrade
sudo systemctl restart apache2
</pre>
<p>This installs Apache 2.4.39 and OpenSSL 1.1.1c which appears to resolve the apparent bug in the 2.4.29 / 1.1.1 combination.</p>
<p>One issue with the update is that Apache now offers TLSv1.3 by default. That is great, except that it appears to cause problems for at least Python clients, which fail to connect and get a 403 error. For example:</p>
<pre>$ python3
>>> import requests
>>> r = requests.get("https://cn-sandbox-ucsb-1.test.dataone.org/cn/v2/monitor/ping")
>>> r.status_code
403
</pre>
<p>We verified that TLSv1.3 is the problem on cn-stage-unm-2 by configuring Apache with:</p>
<pre> SSLProtocol all -TLSv1.3 -SSLv2 -SSLv3
</pre>
<p>to disable TLSv1.3. After this change the Python client was able to connect as expected.</p>
<p>A workaround has not yet been researched.</p>
<p>It is not clear if this issue applies to other clients such as R and Java, so until we learn one way or the other, TLSv1.3 will be disabled on the CNs.</p>
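<p>The same hypothesis could also be tested from the client side by capping the client at TLSv1.2 and comparing the status code with an uncapped request. The sketch below uses only the Python standard library; it is a diagnostic idea under that assumption, not a verified workaround, and the helper names are hypothetical.</p>

```python
import ssl
import urllib.error
import urllib.request


def tls12_context():
    """Build a default client context that will not negotiate above TLSv1.2."""
    ctx = ssl.create_default_context()
    ctx.maximum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def ping_status(url, context=None):
    """Return the HTTP status for url, including error statuses such as 403."""
    try:
        with urllib.request.urlopen(url, context=context, timeout=30) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


if __name__ == "__main__":
    url = "https://cn-sandbox-ucsb-1.test.dataone.org/cn/v2/monitor/ping"
    print("default:", ping_status(url))
    print("TLSv1.2 max:", ping_status(url, tls12_context()))
```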
<p><del>This issue will likely apply to Member Nodes as well once TLSv1.3 is generally available or if MNs choose to install Apache 2.4.39.</del> CORRECTION: this issue only applies when attempting to renegotiate TLS after headers have been transferred, so it will not typically apply to an MN.</p>
Member Nodes - Story #8819 (New): IEDA documents not in DataONEhttps://redmine.dataone.org/issues/88192019-06-13T19:20:29ZJohn Evans
<p>While testing schema_org adapters for ARM and IEDA, I noticed that 9 documents found in the IEDA site map have malformed JSON-LD script elements. These nine documents do not appear in DataONE when using search. The malformed JSON-LD shows two problems:</p>
<ol>
<li>description values have over-escaped double-quotes, i.e. \\"bipolar seesaw\\" rather than \"bipolar seesaw\". See IEDA documents with IDs 601015, 601098, e.g. <a href="http://get.iedadata.org/metadata/iso/601015">http://get.iedadata.org/metadata/iso/601015</a></li>
<li>description values have embedded double-quotes that are not properly escaped at all. See IEDA documents with IDs 600165, 601033, 601076, 601077, 601089, 601103, 601178</li>
</ol>
<p>Fixing the over-escaped double-quotes is easy to do on the fly, but the under-escaped double-quotes require a bit more care (still doable on the fly).</p>
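<p>A minimal sketch of the easier, over-escaped case follows, assuming that "over-escaped" means a doubled backslash in front of the quote. The repair is applied only when the text fails to parse as-is, because a doubled backslash before a quote can also be legitimate JSON. <code>load_jsonld</code> is a hypothetical name, and the under-escaped case is not handled here.</p>

```python
import json


def load_jsonld(text):
    """Parse JSON-LD text, retrying once with over-escaped quotes repaired."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Turn backslash-backslash-quote into backslash-quote and try again.
        repaired = text.replace('\\\\"', '\\"')
        return json.loads(repaired)


if __name__ == "__main__":
    over_escaped = r'{"description": "the \\"bipolar seesaw\\" hypothesis"}'
    print(load_jsonld(over_escaped)["description"])
```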
Infrastructure - Story #8806 (New): Cleanup from OS upgradeshttps://redmine.dataone.org/issues/88062019-05-21T12:45:17ZDave Vieglaisdave.vieglais@gmail.com
<p>There are a few items that need to be addressed after the OS upgrades from 14.04 to 18.04.</p>
Infrastructure - Story #8796 (New): Various issues with service access after upgrade to 18.04https://redmine.dataone.org/issues/87962019-05-14T23:57:48ZDave Vieglaisdave.vieglais@gmail.com
<p>Users have reported some issues with CNs after upgrades to 18.04. See individual issues for details.</p>
CN REST - Story #8749 (New): Fix log aggregation events from the CN without associated CN IPshttps://redmine.dataone.org/issues/87492018-11-16T20:39:55ZChris Jonescjones@nceas.ucsb.edu
<p>The robots list used to filter out usage events includes the IP addresses of the CNs, so events logged during synchronization don't show up as true hits. Because of the SSL infrastructure at lbl.gov, the ESS-DIVE group doesn't see the public IP of an incoming request, but rather an internal private IP assigned by lbl.gov infrastructure. You can see the impact of this on the <a href="https://data.ess-dive.lbl.gov/#profile" class="external">ESS-DIVE profile page</a>. The spike of 11,000+ downloads in August 2018 was the CN synchronizing content.</p>
<p>Rushiraj summarized these events in a <a href="https://gist.github.com/rushirajnenuji/847d8239acf68a108bda30e04af0406b" class="external">gist</a></p>
<p>There are multiple <code>10.42.x.x</code> IP addresses associated with the CN requests. These events all need to be updated in the <code>logsolr</code> core and changed to an actual CN IP. For future synchronizations, perhaps we need to add <code>10.42.0.0/16</code> to the robots list? </p>
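<p>Matching events against the proposed <code>10.42.0.0/16</code> range is straightforward with Python's <code>ipaddress</code> module. The sketch below only shows the membership test; how the robots list is stored and applied is not shown, and <code>is_cn_internal</code> is a hypothetical name.</p>

```python
import ipaddress

# Private range seen on ESS-DIVE for requests that originate from the CNs.
CN_INTERNAL_NETWORK = ipaddress.ip_network("10.42.0.0/16")


def is_cn_internal(ip):
    """Return True when ip falls inside the 10.42.0.0/16 range."""
    return ipaddress.ip_address(ip) in CN_INTERNAL_NETWORK


if __name__ == "__main__":
    for ip in ("10.42.7.9", "10.43.0.1", "128.111.54.80"):
        print(ip, is_cn_internal(ip))
```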
Member Nodes - Story #8683 (New): USGS SDC: redeploy as a v2 Slender Node with GMNhttps://redmine.dataone.org/issues/86832018-08-22T16:25:08ZAmy Forresteraforres4@utk.edu