DataONE Tasks: Issues | https://redmine.dataone.org/ | 2019-06-19T01:44:19Z | DataONE Tasks
Infrastructure - Bug #8822 (New): Account queries in STAGE-2 failing from web browser | https://redmine.dataone.org/issues/8822 | 2019-06-19T01:44:19Z | Bryce Mecum (mecum@nceas.ucsb.edu)
<p>Steps to reproduce:</p>
<ol>
<li>Visit <a href="https://search-stage-2.test.dataone.org/">https://search-stage-2.test.dataone.org/</a></li>
<li>Log in</li>
<li>Navigate to <a href="https://search-stage-2.test.dataone.org/data">https://search-stage-2.test.dataone.org/data</a></li>
<li>Open your browser's Network pane and observe an HTTP 500 on the request to <a href="https://search-stage-2.test.dataone.org/cn/v2/accounts/?query=%7BYOUR_DN%7D">https://search-stage-2.test.dataone.org/cn/v2/accounts/?query={YOUR_DN}</a></li>
</ol>
<p>Response body:</p>
<pre>&lt;?xml version="1.0" encoding="UTF-8"?&gt;
&lt;error detailCode="500" errorCode="500" name="ServiceFailure"&gt;
&lt;description&gt;Internal Server Error: The server encountered an unexpected condition which prevented it from fulfilling the request.&lt;/description&gt;
&lt;/error&gt;
</pre>
<p>Some notes:</p>
<ul>
<li>I notice this in multiple browsers</li>
<li>I notice this only when the request is issued from MetacatUI, not when I visit the URL in my browser or hit it with curl</li>
<li>I don't notice this error on search.dataone.org</li>
<li>MetacatUI on search.dataone.org doesn't even issue this request</li>
<li>MetacatUI on stage-2 is at v2.4.2 and on search is at 2.6.1</li>
</ul>
<p>A service failure here looks like a bug to me, but I'm not sure whether it is a MetacatUI bug or an issue in the CN stack (e.g., Apache config). I'm filing it so someone can take a look.</p>
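<p>For anyone trying to reproduce this outside the browser, here is a minimal Python sketch that builds the same accounts query URL and pulls the fields out of the error document shown above. The helper names and the sample DN are hypothetical; only the endpoint path and the response body come from this report. It makes no network call.</p>

```python
import urllib.parse
import xml.etree.ElementTree as ET

# Base URL taken from the steps to reproduce above.
CN_BASE = "https://search-stage-2.test.dataone.org/cn/v2"


def accounts_query_url(subject_dn, base=CN_BASE):
    """Build the accounts query URL, percent-encoding the whole DN."""
    return f"{base}/accounts/?query={urllib.parse.quote(subject_dn, safe='')}"


def parse_error(body):
    """Extract the attributes and description from an <error> document."""
    root = ET.fromstring(body.strip())
    return {
        "name": root.get("name"),
        "errorCode": root.get("errorCode"),
        "detailCode": root.get("detailCode"),
        "description": (root.findtext("description") or "").strip(),
    }


# Response body copied from this report.
SAMPLE_RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<error detailCode="500" errorCode="500" name="ServiceFailure">
<description>Internal Server Error: The server encountered an unexpected condition which prevented it from fulfilling the request.</description>
</error>
"""

if __name__ == "__main__":
    # "CN=Jane Doe,O=Example" is a made-up DN for illustration only.
    print(accounts_query_url("CN=Jane Doe,O=Example"))
    print(parse_error(SAMPLE_RESPONSE))
```

<p>Comparing the headers MetacatUI sends against those of a plain curl request to the same URL may help narrow down why only the in-app request fails.</p>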
Member Nodes - Task #8723 (New): tDAR: Implemented IP Whitelisting on tDAR systems / Impacts to r... | https://redmine.dataone.org/issues/8723 | 2018-10-02T16:51:19Z | Monica Ihli (email@monicaihli.com)
<p>tDAR has implemented IP whitelisting on their member node. Requests from IP addresses that are not whitelisted will be refused. The production CN IPs were provided. The reports that ping the tDAR MN for up status will fail unless they happen to run from an IP address that can be provided for whitelisting. This may or may not be a big enough issue to do something about. No harm is actually being done as long as the CNs can get in and do their business.</p>
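<p>To illustrate why the status reports fail, here is a small Python sketch of an allowlist check of the kind described above. The networks listed are documentation-only example ranges (RFC 5737), not the real CN addresses, and the function name is hypothetical; tDAR's actual implementation is not known from this ticket.</p>

```python
import ipaddress

# Hypothetical allowlist using documentation-only ranges, NOT real CN IPs.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/28"),
    ipaddress.ip_network("198.51.100.7/32"),
]


def is_allowed(client_ip, networks=ALLOWED_NETWORKS):
    """Return True if client_ip falls inside any allowlisted network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in networks)


if __name__ == "__main__":
    # A request from an allowlisted address gets through; a status
    # report run from anywhere else would be refused.
    for ip in ("192.0.2.5", "203.0.113.9"):
        print(ip, "allowed" if is_allowed(ip) else "refused")
```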
Infrastructure - Decision #8693 (In Progress): Support Google Dataset Search on search.dataone.or... | https://redmine.dataone.org/issues/8693 | 2018-09-07T00:16:59Z | Bryce Mecum (mecum@nceas.ucsb.edu)
<a name="Background"></a>
<h2 >Background<a href="#Background" class="wiki-anchor">¶</a></h2>
<p>Yesterday, <a href="https://toolbox.google.com/datasetsearch" class="external">Google Dataset Search</a> launched. We previously attempted to make MetacatUI (and by extension, DataONE Search) compatible with it by <a href="https://github.com/NCEAS/metacatui/issues/482" class="external">injecting Schema.org JSON-LD into appropriate pages</a>. During development and testing, we checked our compatibility with the upcoming Google Dataset Search using Google's <a href="https://search.google.com/structured-data/testing-tool" class="external">Structured Data Testing Tool</a>. During development this all worked and the feature appeared to be compatible but, after we launched the feature on search.dataone.org, behavior changed on Google's end and Google no longer saw this JSON-LD. The likely reason is that, because MetacatUI follows a single page application architecture and we inject the JSON-LD on the client side, Google's JSON-LD crawler only saw what was sent from the server (a nearly empty index.html) and not our full page (with JSON-LD). I tested this theory: while Google's crawler does execute JavaScript, it limits execution to roughly five seconds, and MetacatUI <em>usually</em> doesn't finish injecting JSON-LD and rendering all content until after that timeout.</p>
<p>Potential paths forward to get DataONE Search compatible with Google's Dataset Search include (none of which are mutually exclusive):</p>
<ol>
<li>The assets that make up MetacatUI and the asset loading strategies could be optimized: <a href="https://github.com/NCEAS/metacatui/issues/224">https://github.com/NCEAS/metacatui/issues/224</a></li>
<li>Move the code (and any dependencies) that injects JSON-LD further up in the app boot so that Google sees it</li>
<li>Inject the appropriate JSON-LD on the server side to guarantee that Google sees it (originally Matt Jones' idea!)</li>
</ol>
<p>(1) is being worked on for sure, and (2) may not be needed if (1) is successful. I want to talk about option (3) because:</p>
<ul>
<li>It's a quicker solution (I already have something working) which would help get us involved in the project faster</li>
<li>It paves the way for future features and/or improvements to MetacatUI (we could be rendering more on the server side than just JSON-LD, like other metadata, more page content, etc)</li>
</ul>
<a name="What-I-did"></a>
<h2 >What I did<a href="#What-I-did" class="wiki-anchor">¶</a></h2>
<p>To test this idea, I modified a <a href="https://github.com/amoeba/backbone-pushstate-example" class="external">previous project</a>, which is just a simple Node (Express.js) app that hosts MetacatUI by intercepting every request and serving the appropriate asset. It injects Schema.org JSON-LD, when appropriate, by querying the CN Solr index before sending MetacatUI's index.html to the client. <a href="https://github.com/amoeba/metacatui-ssr" class="external">Code is here</a> and it's deployed <a href="http://neutral-cat.nceas.ucsb.edu/" class="external">here</a>. View source on any /view/... page and you'll see a minimal Schema.org/Dataset description in the head. More properties can be added later. I did it quick and dirty: the app pre-loads MetacatUI's index.html as a <code>String</code> at app boot and injects the JSON-LD into it. No templating language or other magic.</p>
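<p>The string-injection approach described above can be sketched in a few lines. This is a Python illustration of the idea, not the code from the linked Node app; the function names, the chosen Schema.org properties, and the sample values are all assumptions.</p>

```python
import json


def build_dataset_jsonld(pid, title, url):
    """Assemble a minimal Schema.org Dataset description (illustrative fields)."""
    return {
        "@context": "http://schema.org",
        "@type": "Dataset",
        "identifier": pid,
        "name": title,
        "url": url,
    }


def inject_jsonld(index_html, jsonld):
    """Insert a JSON-LD <script> just before </head> of a preloaded page.

    Returns the page unchanged if no lowercase </head> tag is found.
    """
    # Escape "</" so the payload cannot close the <script> element early.
    payload = json.dumps(jsonld).replace("</", "<\\/")
    script = f'<script type="application/ld+json">{payload}</script>'
    head_close = index_html.find("</head>")
    if head_close == -1:
        return index_html
    return index_html[:head_close] + script + index_html[head_close:]


if __name__ == "__main__":
    # Stand-in for MetacatUI's index.html, loaded once at app boot.
    index_html = "<html><head><title>MetacatUI</title></head><body></body></html>"
    doc = build_dataset_jsonld(
        "example-pid", "Example dataset", "https://example.org/view/example-pid"
    )
    print(inject_jsonld(index_html, doc))
```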
<a name="Things-to-address"></a>
<h2 >Things to address<a href="#Things-to-address" class="wiki-anchor">¶</a></h2>
<ul>
<li>How do we feel about switching from hosting MetacatUI via Apache (simple, bulletproof) to a Node-based deployment just to support this feature (new territory, at least for me)?</li>
<li>If we do switch, we'd want to make really sure the Node app doesn't have weird failure cases where it doesn't return index.html (e.g., when Solr is down or slow). The app needs to return index.html (and every other static asset) on every request and do it very fast. We should decide what the cutoff is so that a slow or unavailable Solr doesn't hold up app boot.</li>
<li>Can this type of deployment easily be integrated with CN buildouts? I've deployed Node apps before by fronting them with Apache/nginx (via reverse proxy) and then keeping the node process up with Upstart</li>
<li>Is this performant enough for DataONE? I think my implementation is non-blocking but I'm not a Node expert so we'd want to code review and probably benchmark </li>
<li>We could wait on (1) and stick with our current deployment strategy</li>
</ul>
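<p>One way to get the "always return index.html, and quickly" behavior raised above is to put a hard cutoff on the Solr lookup and fall back to the unmodified page. The following is a Python sketch of that pattern only; the function names, the cutoff value, and the stand-in lookups are hypothetical, and the real app is written in Node.</p>

```python
import concurrent.futures
import time

# Shared worker pool for lookups; the size is an arbitrary choice.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def index_with_optional_head(index_html, lookup, pid, cutoff_seconds=0.5):
    """Return index_html with extra <head> markup, or unchanged on any problem.

    lookup(pid) stands in for the Solr query and returns an HTML snippet.
    """
    future = _POOL.submit(lookup, pid)
    try:
        snippet = future.result(timeout=cutoff_seconds)
    except concurrent.futures.TimeoutError:
        # Best effort: a lookup that is already running keeps running
        # in the background, but the response is not held up by it.
        future.cancel()
        return index_html
    except Exception:
        # Lookup failed (e.g. Solr is down): serve the plain page.
        return index_html
    if not snippet or "</head>" not in index_html:
        return index_html
    return index_html.replace("</head>", snippet + "</head>", 1)


# Stand-in lookups used only to demonstrate the three cases.
def fast_lookup(pid):
    return '<meta name="pid" content="%s">' % pid


def slow_lookup(pid):
    time.sleep(0.3)
    return fast_lookup(pid)


def broken_lookup(pid):
    raise RuntimeError("Solr unavailable")


PAGE = "<html><head></head><body></body></html>"

if __name__ == "__main__":
    print(index_with_optional_head(PAGE, fast_lookup, "example-pid"))
    print(index_with_optional_head(PAGE, slow_lookup, "example-pid", cutoff_seconds=0.05))
    print(index_with_optional_head(PAGE, broken_lookup, "example-pid"))
```

<p>With this shape, the worst-case added latency per request is the cutoff itself, which makes the cutoff the number to agree on and benchmark.</p>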
<a name="Other-notes"></a>
<h2 >Other notes<a href="#Other-notes" class="wiki-anchor">¶</a></h2>
<p>Unrelated to the Google Dataset Search issue but related to Google's crawling for Google Search, we've also identified:</p>
<ul>
<li>That the Metacat View Service is often unreasonably slow (<a href="https://github.com/NCEAS/metacat/issues/1234">https://github.com/NCEAS/metacat/issues/1234</a>); we are planning to figure out why</li>
<li>That we can and should make use of sitemaps to help Google crawl our pages: <a href="https://github.com/NCEAS/metacat/issues/1263">https://github.com/NCEAS/metacat/issues/1263</a></li>
</ul>
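<p>As a rough illustration of the sitemap idea, here is a Python sketch that serializes a list of page URLs in the sitemaps.org XML format. The function name and the example URLs are hypothetical; how Metacat would actually generate its sitemaps is tracked in the linked issue.</p>

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Serialize page URLs as a <urlset> document (returned as a string)."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        # ElementTree escapes characters such as "&" in the text for us.
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Made-up dataset landing pages for illustration.
    print(build_sitemap([
        "https://example.org/view/example-pid-1",
        "https://example.org/view/example-pid-2",
    ]))
```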
Member Nodes - MNDeployment #8009 (New): Geometabolome | https://redmine.dataone.org/issues/8009 | 2017-01-31T21:41:45Z | Laura Moyers (lmoyers1@utk.edu)
<p>Email from Aron Stubbins (<a href="mailto:Aron.Stubbins@skio.uga.edu">Aron.Stubbins@skio.uga.edu</a>) 6 December 2016:</p>
<blockquote>
<p>Myself, Rob Fatland (ccd) and an international team of researchers have been developing a data system for dissolved organic matter data that provides novel, standardized ways to process the data allowing better comparison of data across studies, state-of-the-science processing to groups outside of the core groups developing these techniques, plus other more common data system attributes (archiving, indexing, DOIs etc.).</p>
<p>We have been developing on a shoestring budget and to make the system sustainable and fully functional for the community, we need some additional support. Could we discuss how we might tie into DataONE both functionally (potential connections between data bases) and where we might get funding through DataONE or related programs.</p>
</blockquote>
<p>Rob Fatland (<a href="mailto:rob5@uw.edu">rob5@uw.edu</a>) said in subsequent email 14 December 2016:</p>
<blockquote>
<p>I think that the technical steps will not be terribly daunting when we get ready to coordinate with DataONE and it follows the API-driven federation model which we favor, so that's great. </p>
</blockquote>
<p>NEXT STEPS: Laura to set up exploratory meeting with Geometabolome.</p>
Member Nodes - MNDeployment #7973 (New): Alaska Energy Data Gateway | https://redmine.dataone.org/issues/7973 | 2017-01-27T15:19:28Z | Laura Moyers (lmoyers1@utk.edu)
<p>At a meeting 26 January 2017, it was mentioned that GINA is working with the Alaska Energy Data Gateway on a proposal where there is a component for implementing a DataONE Member Node for AEDG. This is still very early in the proposal stage, but the possibility exists for an AEDG MN in future.</p>
Member Nodes - MNDeployment #7962 (Deferred): Alaska EPSCoR | https://redmine.dataone.org/issues/7962 | 2017-01-09T19:00:08Z | Rebecca Koskela (rkoskela@unm.edu)
<p>GINA also runs the Alaska EPSCoR repository and because it's a separate project, it should also be a separate MN</p>
<p><a href="http://www.alaska.edu/epscor/">http://www.alaska.edu/epscor/</a><br>
Current portal that is hosted by GINA: <a href="http://epscor.alaska.edu/">http://epscor.alaska.edu/</a></p>
Member Nodes - MNDeployment #7851 (New): CESAB - Center for Synthesis and Analysis for Biodiversity | https://redmine.dataone.org/issues/7851 | 2016-07-22T13:57:50Z | Laura Moyers (lmoyers1@utk.edu)
<p><a href="http://www.cesab.org/index.php/fr/">http://www.cesab.org/index.php/fr/</a> (in French)</p>
<p>CESAB is the Synthesis and Analysis center for biodiversity data in France. In a meeting with ECOSCOPE and CESAB on 21 July (see epad <a href="https://epad.dataone.org/pad/p/ECOSCOPE_and_DataONE">https://epad.dataone.org/pad/p/ECOSCOPE_and_DataONE</a>), Alison Specht said that CESAB is interested in working with DataONE, particularly as a Member Node.</p>
<p>CESAB doesn't have a data discovery portal at the moment, but they do have about 5 projects which have delivered data, some of which is exposed and some of which is metadata-only. There are ~100,000 objects, and another project with ~6,000 objects is coming soon. Some of these are private (metadata is public but data is private). There is some EML metadata, but it has been difficult to create appropriate EML metadata for some contributors.</p>
<p>First steps are to continue discussions and gather information about CESAB's current and future content and identify the best MN solution for them.</p>
Member Nodes - MNDeployment #7758 (New): Polar Rock Repository (PRR) at Ohio State | https://redmine.dataone.org/issues/7758 | 2016-05-02T23:50:40Z | Laura Moyers (lmoyers1@utk.edu)
<p>The Polar Rock Repository (PRR) at the Ohio State University holds over 40,000 physical specimens which are available to be "checked out" by users. Laura/Mark met with Anne Grunow and Wes Haines 4/28/16 to discuss DataONE, the PRR, and how we might work together. See <a href="https://epad.dataone.org/pad/p/PRR_and_DataONE">https://epad.dataone.org/pad/p/PRR_and_DataONE</a></p>
DataONE API - Bug #7684 (New): Call to MNStorage.update() via REST API returns java.lang.StackOve... | https://redmine.dataone.org/issues/7684 | 2016-03-21T23:07:39Z | Bryce Mecum (mecum@nceas.ucsb.edu)
<p>I was trying to update an object through the REST API with cURL and forgot to enter the correct URL. The cURL command I used and the response were:</p>
<pre>$ curl -X PUT -H "Authorization: Bearer $TOKEN" -F "pid=resourceMap_doi:10.5065/D6G44NFV" -F "object=@object.xml" -F "sysmeta=@sysmeta.xml" -F "newPid=resourceMap_doi:10.5065/D6G44NFV_v3" $URL
&lt;?xml version="1.0" encoding="UTF-8"?&gt;
java.lang.StackOverflowError
</pre>
<p>Where $URL was '<a href="https://arcticdata.io/metacat/d1/mn/v2/object">https://arcticdata.io/metacat/d1/mn/v2/object</a>' instead of '<a href="https://arcticdata.io/metacat/d1/mn/v2/object/resourceMap_doi:10.5065/D6G44NFV">https://arcticdata.io/metacat/d1/mn/v2/object/resourceMap_doi:10.5065/D6G44NFV</a>'</p>
<p>I expected to receive some sort of warning/error that I had forgotten to specify the URL properly for this call but instead saw a StackOverflowError.</p>
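<p>On the client side, a small guard can avoid sending the request to the bare /object endpoint in the first place. This Python sketch builds the update URL from a base URL and a PID and refuses to proceed when the PID is missing. The helper name is hypothetical, and percent-encoding the PID in the path is an assumption about what the service expects (the URL quoted above uses the PID unencoded).</p>

```python
import urllib.parse


def update_url(base_url, pid):
    """Build the MNStorage.update() URL: <base>/object/<encoded pid>."""
    if not pid or not pid.strip():
        raise ValueError(
            "update() needs the PID of the object being replaced in the URL path"
        )
    # Assumption: the PID is percent-encoded as a single path segment.
    return f"{base_url.rstrip('/')}/object/{urllib.parse.quote(pid, safe='')}"


if __name__ == "__main__":
    print(update_url(
        "https://arcticdata.io/metacat/d1/mn/v2",
        "resourceMap_doi:10.5065/D6G44NFV",
    ))
```

<p>This does not address the server-side behavior reported here; the service should still answer a malformed request with a proper error rather than a StackOverflowError.</p>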
Member Nodes - MNDeployment #7549 (New): COS - the Center for Open Science | https://redmine.dataone.org/issues/7549 | 2015-12-14T21:10:21Z | Laura Moyers (lmoyers1@utk.edu)
<p>This is to document activity related to the possible collaboration with COS as a Member Node. See website <a href="https://cos.io/">https://cos.io/</a></p>
Member Nodes - MNDeployment #7163 (New): UN Environmental Programme World Conservation Monitoring... | https://redmine.dataone.org/issues/7163 | 2015-06-07T17:14:16Z | Laura Moyers (lmoyers1@utk.edu)
<p>Key holding: World database of protected lands<br>
Dependencies: They are in the process of implementing CSW. So, we need the slender node and we need them to get their CSW up.<br>
Location: Cambridge, England</p>
Python GMN - Task #7162 (New): Member node does not check with coordinating node before replicating | https://redmine.dataone.org/issues/7162 | 2015-06-05T23:49:44Z | Mark Flynn (flynnm@dataone.unm.edu)
<p>Replication is supposed to follow this sequence:<br>
CN -- MN.replicate() --> newReplicaMN<br>
newReplicaMN -- MN.getReplica() --> sourceMN<br>
sourceMN -- CN.isNodeAuthorized() --> CN<br>
newReplicaMN -- CN.setReplicationStatus() --> CN</p>
<p>However, CN.isNodeAuthorized is not called before sourceMN replicates to newReplicaMN</p>
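<p>The expected ordering can be shown with a toy model. The Python sketch below is not GMN code: the classes, node identifiers, and in-memory stores are hypothetical, and only the method names mirror the calls listed in this ticket. The point is that the source node checks CN.isNodeAuthorized() inside getReplica(), before any bytes are handed over.</p>

```python
class NotAuthorized(Exception):
    """Raised when the CN has not approved the requesting node."""


class CoordinatingNode:
    def __init__(self, approved):
        self.approved = set(approved)  # (node_id, pid) pairs
        self.status = {}

    def is_node_authorized(self, node_id, pid):
        return (node_id, pid) in self.approved

    def set_replication_status(self, node_id, pid, status):
        self.status[(node_id, pid)] = status


class SourceMN:
    def __init__(self, cn, objects):
        self.cn = cn
        self.objects = objects

    def get_replica(self, requesting_node, pid):
        # The step this ticket reports as missing: ask the CN first.
        if not self.cn.is_node_authorized(requesting_node, pid):
            raise NotAuthorized(f"{requesting_node} may not replicate {pid}")
        return self.objects[pid]


class ReplicaMN:
    def __init__(self, node_id, cn):
        self.node_id = node_id
        self.cn = cn
        self.store = {}

    def replicate(self, source, pid):
        """Fetch pid from source and report the outcome to the CN."""
        try:
            self.store[pid] = source.get_replica(self.node_id, pid)
        except NotAuthorized:
            self.cn.set_replication_status(self.node_id, pid, "failed")
            return False
        self.cn.set_replication_status(self.node_id, pid, "completed")
        return True


if __name__ == "__main__":
    cn = CoordinatingNode({("urn:node:B", "pid1")})
    source = SourceMN(cn, {"pid1": b"data"})
    print(ReplicaMN("urn:node:B", cn).replicate(source, "pid1"))
    print(ReplicaMN("urn:node:C", cn).replicate(source, "pid1"))
```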
Member Nodes - MNDeployment #7145 (New): PlutoF | https://redmine.dataone.org/issues/7145 | 2015-06-03T12:52:55Z | Laura Moyers (lmoyers1@utk.edu)
<p>PlutoF is hosted at the University of Tartu in Estonia and "provides cloud database and computing services for the taxonomical, ecological, phylogenetical, etc. research. The purpose of the platform is to provide synergy through common modules for the classifications, taxon names, analytical tools, etc. It allows to address integrated questions in ecology and coevolution of taxa. Different types of the species occurrences, viz. preserved specimens, DNA sequences, human observations, references can be stored in PlutoF as well. PlutoF has no restrictions on taxon and geographic coverage and therefore can be used for the databasing interacting taxa. It includes also collection management module. Few examples of the public web outputs from PlutoF are Estonian eBiodiversity (<a href="http://elurikkus.ut.ee">http://elurikkus.ut.ee</a>), and molecular key for fungi (<a href="https://unite.ut.ee">https://unite.ut.ee</a>)."</p>
<p>Anticipate GMN for this implementation. They currently hold ~500,000 biodiversity records.</p>
<p>For more information about PlutoF, see the references in these articles: <a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3023303/">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3023303/</a> and <a href="http://www.la-press.com/plutofa-web-based-workbench-for-ecological-and-taxonomic-research-with-article-a2406">http://www.la-press.com/plutofa-web-based-workbench-for-ecological-and-taxonomic-research-with-article-a2406</a></p>
Python GMN - Story #7095 (New): Support HTTP redirect capability for GMN within the Vendor Specif... | https://redmine.dataone.org/issues/7095 | 2015-05-12T16:20:39Z | Mark Servilla (mark.servilla@gmail.com)
<p>GMN provides proxy capabilities to integrate existing content residing in a back-end data repository through its Vendor Specific Extension (VSE) model. Only minimal support exists within the GMN-VSE model for HTTP redirects. Although HTTP redirects are more of an exception within the GMN-VSE model, they can provide a flexible mechanism for MNs that dynamically manage data and metadata objects.</p>
Member Nodes - MNDeployment #7052 (New): BaMBa - Brazilian Marine Biodiversity Database | https://redmine.dataone.org/issues/7052 | 2015-04-17T19:58:11Z | Laura Moyers (lmoyers1@utk.edu)
<p>Luiz Gadelha forwarded an MNDD to <a href="mailto:support@dataone.org">support@dataone.org</a>. We need to set up a getting-to-know-you meeting with interested parties.</p>
<p><a href="https://marinebiodiversity.lncc.br">https://marinebiodiversity.lncc.br</a></p>