DataONE Tasks: Issueshttps://redmine.dataone.org/https://redmine.dataone.org/favicon.ico2019-11-22T18:31:19ZDataONE Tasks
Redmine Infrastructure - Story #8856 (New): Put the system metadata part ahead of the object part when d1...https://redmine.dataone.org/issues/88562019-11-22T18:31:19ZJing Taotao@nceas.ucsb.edu
<p>When a client calls the mn.create/update methods, it usually constructs a multipart which contains the sys part (containing the system metadata information), object part (containing the object itself) and other parts. There is no requirement about the order of those parts.<br>
Metacat will use a new streaming multipart handler which will calculate the checksum when it stores the object part into a file. This requires we should know the checksum algorithm before the serialization of the object. So Metacat has to digest the system metadata first in order to improve the performance.<br>
In order to take the advantage, we recommend clients should put the system metadata part ahead of the object part when it is constructing the multipart to be sent to the server.<br>
Note: event though the client doesn't use the recommended order, the process still works but the performance will be poor.</p>
Infrastructure - Story #8853 (New): Make cn.resolve smarterhttps://redmine.dataone.org/issues/88532019-11-15T16:46:12ZJing Taotao@nceas.ucsb.edu
<p>In this case the cn.resolve() operation should be ignoring the node that is marked as offline, or at least placing it last in the list.</p>
<p>This should be a high priority fix, and should be fairly simple to implement since the information is available in the node document.</p>
<ul>
<li>Dave</li>
</ul>
<blockquote>
<p>On 2019-11-14, at 21:38, Matt Jones <a href="mailto:jones@nceas.ucsb.edu">jones@nceas.ucsb.edu</a> wrote:</p>
<p>FYI, thread form today with Ethan White on ebird replication, and the resolve() api in DataONE. Relates to our conversation today about making resolve() and MetacatUI downloads smarter.</p>
<p>Matt</p>
<p>Ethan White 5:06 PM<br>
What's the right place to report data that if 404ing on DataONE?</p>
<p>Matt Jones 5:07 PM<br>
<a href="mailto:support@dataone.org">support@dataone.org</a> would work</p>
<p>5:08 PM<br>
or let me know</p>
<p>5:08 PM<br>
is it that same data set?</p>
<p>5:08 PM<br>
the Ebird one?</p>
<p>Ethan White 5:09 PM<br>
Yeah, which we had discovered had been reposted and spent a bunch of time gearing up to support again. We were in the middle of testing when it suddenly disappeared again. <a href="http://dataone.ornith.cornell.edu/metacat/d1/mn/v2/object/EOD_CLO_2016.csv.gz">http://dataone.ornith.cornell.edu/metacat/d1/mn/v2/object/EOD_CLO_2016.csv.gz</a></p>
<p>Matt Jones 5:10 PM<br>
yeah. Cornell just gave us permission to replicate the data to other nodes. They haven’t wanted us to do so in the past.</p>
<p>Ethan White 5:13 PM<br>
Thanks. That's good news. So can we expect it to reappear at some point soonish?</p>
<p>Matt Jones 5:14 PM<br>
Yeah, its been replicated. I’m checking to see if it is properly linked to the original.</p>
<p>5:15 PM<br>
<a href="https://knb.ecoinformatics.org/view/EOD_CLO_2016.eml">https://knb.ecoinformatics.org/view/EOD_CLO_2016.eml</a></p>
<p>new messages</p>
<p>Ethan White 5:16 PM<br>
Thanks Matt. FYI that link I posted is the one being returned from a current search of DataONE.</p>
<p>Matt Jones 5:17 PM<br>
Yeah. Because that’s the ‘authoritative’ copy at cornell.</p>
<p>5:17 PM<br>
but Cornell’s node has been going up and down.</p>
<p>5:17 PM<br>
our resolve service lists all copies of a data set</p>
<p>5:17 PM<br>
so if one is down, you can get it from another location:</p>
<p>5:18 PM<br>
<code><br>
$ curl -H "Accept: text/xml" https://cn.dataone.org/cn/v2/resolve/EOD_CLO_2016.eml<br>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><br>
<ns2:objectLocationList xmlns:ns2="http://ns.dataone.org/service/types/v1"><br>
<identifier>EOD_CLO_2016.eml</identifier><br>
<objectLocation><br>
<nodeIdentifier>urn:node:CLOEBIRD</nodeIdentifier><br>
<baseURL>http://dataone.ornith.cornell.edu/metacat/d1/mn</baseURL><br>
<version>v1</version><br>
<version>v2</version><br>
<url>http://dataone.ornith.cornell.edu/metacat/d1/mn/v2/object/EOD_CLO_2016.eml</url><br>
</objectLocation><br>
<objectLocation><br>
<nodeIdentifier>urn:node:CN</nodeIdentifier><br>
<baseURL>https://cn.dataone.org/cn</baseURL><br>
<version>v1</version><br>
<version>v2</version><br>
<url>https://cn.dataone.org/cn/v2/object/EOD_CLO_2016.eml</url><br>
</objectLocation><br>
<objectLocation><br>
<nodeIdentifier>urn:node:KNB</nodeIdentifier><br>
<baseURL>https://knb.ecoinformatics.org/knb/d1/mn</baseURL><br>
<version>v1</version><br>
<version>v2</version><br>
<url>https://knb.ecoinformatics.org/knb/d1/mn/v2/object/EOD_CLO_2016.eml</url><br>
</objectLocation><br>
</ns2:objectLocationList><br>
</code></p>
<p>Ethan White 5:19 PM<br>
OK, thanks. That's why I thought the link in DataONE <a href="https://cn.dataone.org/cn/v2/resolve/EOD_CLO_2016.csv.gz">https://cn.dataone.org/cn/v2/resolve/EOD_CLO_2016.csv.gz</a> would take me to a working version, but clearly I just don't understand the details. We'll just use the the one on KNB at least for the moment. Really appreciate your help as always.</p>
<p>Matt Jones 5:20 PM<br>
No problem. I’d love to make this all work more seamlessly. (edited) </p>
<p>5:20 PM<br>
So suggestions definitely welcome.</p>
<p>5:21 PM<br>
I expect Cornell to take their node offline altogether — so the KNB will likely be the better location.</p>
<p>5:22 PM<br>
Btw, the resolve link when executed in a browser just redirects to the first copy</p>
<p>Ethan White 5:23 PM<br>
Yeah, Cornell's closed approach to things is a pretty big disappointment, especially on data like this that is generated by volunteers. We'll just go to the KNB version permanently.</p>
<p>Matt Jones 5:23 PM<br>
whereas programatically you get the list of locations</p>
<p>5:23 PM<br>
if you ask for XML</p>
<p>Ethan White 5:23 PM<br>
That makes sense. Thanks.</p>
<p>Matt Jones 5:23 PM<br>
and then you can choose to try one or more</p>
</blockquote>
Infrastructure - Story #8848 (New): A minor difference of annotation index between CN and MNhttps://redmine.dataone.org/issues/88482019-11-01T21:37:01ZJing Taotao@nceas.ucsb.edu
<p>The solr index on CN is:</p>
<pre><arr name="sem_annotation">
<str>http://purl.dataone.org/odo/ECSO_00000512</str>
<str>
http://ecoinformatics.org/oboe/oboe.1.2/oboe-core.owl#MeasurementType
</str>
<str>http://purl.dataone.org/odo/ECSO_00001102</str>
<str>http://purl.dataone.org/odo/ECSO_00001243</str>
<str>http://purl.dataone.org/odo/ECSO_00000629</str>
<str>http://purl.dataone.org/odo/ECSO_00000518</str>
<str>http://www.w3.org/2000/01/rdf-schema#Resource</str>
<str>http://purl.dataone.org/odo/ECSO_00000516</str>
<str>http://purl.obolibrary.org/obo/UO_0000301</str>
</arr>
</pre>
<p>The mn is:</p>
<pre><arr name="sem_annotation">
<str>http://purl.dataone.org/odo/ECSO_00000512</str>
<str>
http://ecoinformatics.org/oboe/oboe.1.2/oboe-core.owl#MeasurementType
</str>
<str>http://purl.dataone.org/odo/ECSO_00001102</str>
<str>http://purl.dataone.org/odo/ECSO_00001243</str>
<str>http://purl.dataone.org/odo/ECSO_00000629</str>
<str>http://purl.dataone.org/odo/ECSO_00000518</str>
<str>http://purl.dataone.org/odo/ECSO_00000516</str>
<str>http://purl.obolibrary.org/obo/UO_0000301</str>
</arr>
</pre>
<p>The cn has an extra <code><str>http://www.w3.org/2000/01/rdf-schema#Resource</str></code><br>
Bryce and I discussed it and thought it wouldn't affect the feature. But we still need to figure it out.</p>
Infrastructure - Story #8842 (New): Some exceptions in Metacathttps://redmine.dataone.org/issues/88422019-09-19T17:53:22ZJing Taotao@nceas.ucsb.edu
<p>In sandbox, we see some exceptions like. It appears not to hurt function, but we need to take a look at it.<br>
<code><br>
9-Sep-2019 15:19:05.303 INFO [localhost-startStop-1] org.apache.catalina.core.ApplicationContext.log Marking servlet [AxisServlet] as unavailable<br>
19-Sep-2019 15:19:05.304 SEVERE [localhost-startStop-1] org.apache.catalina.core.StandardContext.loadOnStartup Servlet [AxisServlet] in web application [/metacat] threw load() exception<br>
java.lang.ClassNotFoundException: org.apache.axis.transport.http.AxisServlet<br>
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1364)<br>
at org.apache.catalina.loader.WebappClassLoaderBase.loadClass(WebappClassLoaderBase.java:1185)<br>
at org.apache.catalina.core.DefaultInstanceManager.loadClass(DefaultInstanceManager.java:546)<br>
at org.apache.catalina.core.DefaultInstanceManager.loadClassMaybePrivileged(DefaultInstanceManager.java:527)<br>
at org.apache.catalina.core.DefaultInstanceManager.newInstance(DefaultInstanceManager.java:150)<br>
at org.apache.catalina.core.StandardWrapper.loadServlet(StandardWrapper.java:1044)<br>
at org.apache.catalina.core.StandardWrapper.load(StandardWrapper.java:983)<br>
at org.apache.catalina.core.StandardContext.loadOnStartup(StandardContext.java:4956)<br>
at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5270)<br>
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)<br>
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:754)<br>
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:730)<br>
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:734)<br>
at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:624)<br>
at org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1834)<br>
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)<br>
at java.util.concurrent.FutureTask.run(FutureTask.java:266)<br>
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)<br>
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)<br>
at java.lang.Thread.run(Thread.java:748)<br>
</code></p>
Member Nodes - Story #8819 (New): IEDA documents not in DataONEhttps://redmine.dataone.org/issues/88192019-06-13T19:20:29ZJohn Evans
<p>While testing schema_org adapters for ARM and IEDA, I noticed that 9 documents found in the IEDA site map have malformed JSON-LD script elements. These nine documents do not appear in DataOne when using search. There are two issues arising in the malformed JSON-LD:</p>
<ol>
<li>description values have over-escaped double-quotes, i.e. \"bipolar seesaw\" rather than \"bipolar seesaw\". See IEDA documents with IDs 601015, 601098, e.g. <a href="http://get.iedadata.org/metadata/iso/601015">http://get.iedadata.org/metadata/iso/601015</a></li>
<li>description values have embedded double-quotes that are not properly escaped at all. See IEDA documents with IDs 600165, 601033, 601076, 601077, 601089, 601103, 601178</li>
</ol>
<p>Fixing the over-escaped double-quotes is easy to do on the fly, but the under-escaped double-quotes requires taking a bit more care (still doable on the fly).</p>
Infrastructure - Story #8806 (New): Cleanup from OS upgradeshttps://redmine.dataone.org/issues/88062019-05-21T12:45:17ZDave Vieglaisdave.vieglais@gmail.com
<p>There's a few items that need to be addressed after the OS upgrades from 14.04 to 18.04.</p>
Infrastructure - Story #8762 (New): Add new formats to CNhttps://redmine.dataone.org/issues/87622019-02-07T19:58:28ZPeter Slaughterslaughter@nceas.ucsb.edu
<p>Add a new formatId for Python notebooks: application/x-ipynb+json. This is the current popular Media Type for Python notebooks (<a href="https://jupyter.readthedocs.io/en/latest/reference/mimetype.html">https://jupyter.readthedocs.io/en/latest/reference/mimetype.html</a>), but as it has the 'x-' prefix it is 'unregistered' and<br>
not recognized by IANA (<a href="https://www.iana.org/assignments/media-types/media-types.xhtml">https://www.iana.org/assignments/media-types/media-types.xhtml</a>).</p>
<p>The new entry should be:<br>
<objectFormat><br>
<formatId>application/x-ipynb+json</formatId><br>
<formatName>Jupyter Notebook</formatName><br>
<formatType>DATA</formatType><br>
<mediaType name="application/x-ipynb+json"/><br>
<extension>ipynb</extension><br>
</objectFormat></p>
Infrastructure - Story #8736 (New): Decouple the index generator to the hazelcast system metadata...https://redmine.dataone.org/issues/87362018-10-23T20:01:34ZJing Taotao@nceas.ucsb.edu
<p>Currently, the index generator is a listener of the system metadata map. During the tomcat starting, the map will load data from the db table and will generate a lot of unneeded index tasks. Currently we try to use filters to filter out them but it is very expensive. </p>
Member Nodes - Story #8727 (In Progress): NCAR Discovery & Assessmenthttps://redmine.dataone.org/issues/87272018-10-04T16:48:23ZAmy Forresteraforres4@utk.edu
<p>Establish contact and build relationship with a potential new member node. Determine if DataONE and the repository are a good fit for one another and if the repository generally meets the requirements of DataONE member nodes. </p>
<p>This story is complete when a determination is made to either proceed with a new deployment, or that joining DataONE is not an option for the repository at this time.</p>
Member Nodes - Story #8589 (New): ARM: Re-Discovery & Planninghttps://redmine.dataone.org/issues/85892018-05-10T20:35:03ZAmy Forresteraforres4@utk.edu
<p><strong>5/10/2018:</strong> Meeting at ORNL - Giri Prakash expressed re-interest in becoming a MN. Following on USGS_SDC GMN implementation (~Fall 2018), Aaron Stokes can turn attention to ARM installation</p>
Member Nodes - Story #8568 (New): (CyVerse) Move to productionhttps://redmine.dataone.org/issues/85682018-04-19T18:15:01ZAmy Forresteraforres4@utk.eduMember Nodes - Story #8521 (New): CAFF: Testing & Developmenthttps://redmine.dataone.org/issues/85212018-03-26T15:07:58ZAmy Forresteraforres4@utk.edu
<p>The repository and DataONE have agreed to proceed with deployment as a member node. Install or develop a functional member node to be registered to a non-production environment.</p>
Testing MN Management - Story #8515 (New): meow: Move to Productionhttps://redmine.dataone.org/issues/85152018-03-22T20:45:06ZAmy Forresteraforres4@utk.eduTesting MN Management - Story #8463 (New): test: Testing & Developmenthttps://redmine.dataone.org/issues/84632018-03-01T21:04:41ZAmy Forresteraforres4@utk.edu
<p>Install or develop a functional member node to be registered to a non-production environment. </p>
Testing MN Management - Story #8381 (New): Discoveryhttps://redmine.dataone.org/issues/83812018-02-27T17:32:03ZMonica Ihliemail@monicaihli.com
<p>Discovery is about establishing contact and building a relationship with a potential new member node. In this phase, it is determined if DataONE and the repository are a good fit for one another and if the repository generally meets the requirements of DataONE member nodes. Broad discussions of deployment options may be reviewed as well.<br>
This story is complete when a determination is made to either proceed with planning a new deployment, or that joining DataONE is not an option for the repository at this time.</p>