DataONE Tasks: Issues | https://redmine.dataone.org/ | updated 2018-10-02T16:51:19Z
Member Nodes - Task #8723 (New): tDAR: Implemented IP Whitelisting on tDAR systems / Impacts to r... | https://redmine.dataone.org/issues/8723 | 2018-10-02T16:51:19Z | Monica Ihli (email@monicaihli.com)
<p>tDAR has implemented IP whitelisting on its Member Node. Requests from IP addresses that are not whitelisted will be refused. tDAR was given the CN IP addresses for production. The reports that ping the tDAR MN for its up status will fail unless they happen to run from an IP address that can be provided to tDAR for whitelisting. This may or may not be a big enough issue to act on; no harm is actually being done as long as the CNs can get in and do their business.</p>
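<p>For reference, an allowlist check of this kind usually reduces to "is the client address inside one of the permitted networks?". The minimal sketch below uses Python's standard <code>ipaddress</code> module. The ranges are RFC 5737 documentation addresses standing in for the real CN IPs, and nothing here describes how tDAR actually implements its whitelist.</p>

```python
import ipaddress

# Hypothetical allowlist: RFC 5737 documentation ranges stand in for the
# production CN addresses, which are not listed in this issue.
ALLOWED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.17/32"),
]


def is_allowed(client_ip: str) -> bool:
    """Return True if client_ip falls inside any allowlisted network."""
    addr = ipaddress.ip_address(client_ip)
    # An IPv6 address is never "in" an IPv4 network, so it is simply refused.
    return any(addr in net for net in ALLOWED_NETWORKS)


if __name__ == "__main__":
    for ip in ("192.0.2.45", "198.51.100.17", "203.0.113.9"):
        print(ip, "allowed" if is_allowed(ip) else "refused")
```

<p>An uptime report that runs from a host outside the allowlisted ranges takes the "refused" branch even when the MN is healthy, which is the failure mode described above.</p>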
Infrastructure - Decision #8693 (In Progress): Support Google Dataset Search on search.dataone.or... | https://redmine.dataone.org/issues/8693 | 2018-09-07T00:16:59Z | Bryce Mecum (mecum@nceas.ucsb.edu)
<a name="Background"></a>
<h2 >Background<a href="#Background" class="wiki-anchor">¶</a></h2>
<p>Yesterday, <a href="https://toolbox.google.com/datasetsearch" class="external">Google Dataset Search</a> launched. We previously attempted to make MetacatUI (and, by extension, DataONE Search) compatible with it by <a href="https://github.com/NCEAS/metacatui/issues/482" class="external">injecting Schema.org JSON-LD into appropriate pages</a>. During development and testing, we checked compatibility with the then-upcoming Google Dataset Search using Google's <a href="https://search.google.com/structured-data/testing-tool" class="external">Structured Data Testing Tool</a>, and the feature appeared to work. After we launched the feature on search.dataone.org, however, behavior changed on Google's end so that Google no longer saw this JSON-LD. The likely reason is that MetacatUI follows a single-page application architecture and injects the JSON-LD on the client side, so Google's JSON-LD crawler saw only what the server sent (a nearly empty index.html) and not our full page (with JSON-LD). I was able to test this theory: while Google's crawler does execute JavaScript, it limits execution to about five seconds (possibly exactly five), and MetacatUI <em>usually</em> doesn't finish injecting JSON-LD and rendering all content until after that timeout.</p>
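<p>For reference, here is a minimal Schema.org <code>Dataset</code> description of the kind being injected, wrapped in the <code>&lt;script type="application/ld+json"&gt;</code> tag that crawlers look for. This is a Python sketch, not MetacatUI's JavaScript; the helper name <code>dataset_jsonld_script</code> and the chosen properties are hypothetical.</p>

```python
import json


def dataset_jsonld_script(name: str, description: str, url: str) -> str:
    """Return a <script> tag holding a minimal Schema.org Dataset in JSON-LD."""
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }
    # json.dumps escapes quotes; "</" is also escaped (as the valid JSON
    # escape "<\/") so a metadata value cannot close the script tag early.
    body = json.dumps(doc).replace("</", "<\\/")
    return '<script type="application/ld+json">' + body + "</script>"
```

<p>Whether this snippet is produced in the browser or on the server is exactly the question raised above: the content is identical, but only a server-side copy is present in the HTML the crawler receives before its timeout.</p>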
<p>Potential paths forward to get DataONE Search compatible with Google's Dataset Search include (none of which are mutually exclusive):</p>
<ol>
<li>The assets that make up MetacatUI and the asset loading strategies could be optimized: <a href="https://github.com/NCEAS/metacatui/issues/224">https://github.com/NCEAS/metacatui/issues/224</a></li>
<li>Move the code (and any dependencies) that injects JSON-LD further up in the app boot so that Google sees it</li>
<li>Inject the appropriate JSON-LD on the server side to guarantee that Google sees it (originally Matt Jones' idea!)</li>
</ol>
<p>(1) is being worked on for sure, and (2) may not be needed if (1) is successful. I want to talk about option (3) because:</p>
<ul>
<li>It's a quicker solution (I already have something working) which would help get us involved in the project faster</li>
<li>It paves the way for future features and/or improvements to MetacatUI (we could render more on the server side than just JSON-LD, such as other metadata, more page content, etc.)</li>
</ul>
<a name="What-I-did"></a>
<h2 >What I did<a href="#What-I-did" class="wiki-anchor">¶</a></h2>
<p>To test this idea, I modified a <a href="https://github.com/amoeba/backbone-pushstate-example" class="external">previous project</a>, a simple Node (Express.js) app that hosts MetacatUI by intercepting every request and serving the appropriate asset. It injects Schema.org JSON-LD, when appropriate, by querying the CN Solr index before sending MetacatUI's index.html to the client. <a href="https://github.com/amoeba/metacatui-ssr" class="external">Code is here</a> and it's deployed <a href="http://neutral-cat.nceas.ucsb.edu/" class="external">here</a>. View source on any /view/... page and you'll see a minimal Schema.org/Dataset description in the head; more properties can be added later. I did it quick and dirty: the app pre-loads MetacatUI's index.html as a <code>String</code> at app boot and injects the JSON-LD into it. No templating language or other magic.</p>
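<p>The "pre-load a string and splice" approach can be sketched in a few lines. This is Python rather than the Express.js app: the template is read once at startup, and the JSON-LD is inserted just before <code>&lt;/head&gt;</code>. The function name and the sample HTML are hypothetical, and the marker match is case-sensitive.</p>

```python
def inject_into_head(index_html: str, snippet: str) -> str:
    """Insert snippet immediately before the first </head>.

    If no </head> is found, the page is returned unmodified rather than failing.
    """
    marker = "</head>"
    pos = index_html.find(marker)
    if pos == -1:
        return index_html
    return index_html[:pos] + snippet + index_html[pos:]


# In the real app this would be read from disk once at startup; it is inlined
# here (hypothetical content) to keep the sketch self-contained.
INDEX_HTML = "<html><head><title>DataONE Search</title></head><body></body></html>"

if __name__ == "__main__":
    print(inject_into_head(INDEX_HTML, '<script type="application/ld+json">{}</script>'))
```

<p>Plain string splicing avoids a templating dependency; the cost is that it relies on the template always containing a literal <code>&lt;/head&gt;</code>.</p>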
<a name="Things-to-address"></a>
<h2 >Things to address<a href="#Things-to-address" class="wiki-anchor">¶</a></h2>
<ul>
<li>How do we feel about switching from hosting MetacatUI via Apache (simple, bulletproof) to a Node-based deployment just to support this feature (new territory, at least for me)?</li>
<li>If we do switch, we'd want to make very sure the Node app has no odd failure cases where it doesn't return index.html (e.g., when Solr is down or slow). The app needs to return index.html (and every other static asset) on every request, and do so very quickly. We should decide on a cutoff time so that a slow or unavailable Solr doesn't hold up app boot.</li>
<li>Can this type of deployment be easily integrated with CN buildouts? I've deployed Node apps before by fronting them with Apache/nginx (via a reverse proxy) and keeping the Node process up with Upstart.</li>
<li>Is this performant enough for DataONE? I think my implementation is non-blocking, but I'm not a Node expert, so we'd want to code review it and probably benchmark it.</li>
<li>We could wait on (1) and stick with our current deployment strategy</li>
</ul>
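<p>One way to guarantee that index.html is always returned is to give the Solr lookup a hard deadline and fall back to the plain page on any failure. Below is a minimal Python sketch using <code>concurrent.futures</code>. The 0.5 s budget and the <code>fetch_jsonld</code> callable are assumptions for illustration, not decided values.</p>

```python
import concurrent.futures
from typing import Callable

# Shared worker pool so the request path only ever waits up to the deadline.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=8)


def render_index(index_html: str, fetch_jsonld: Callable[[], str],
                 timeout_s: float = 0.5) -> str:
    """Return index.html with JSON-LD if it arrives in time, else the plain page."""
    future = _POOL.submit(fetch_jsonld)
    try:
        snippet = future.result(timeout=timeout_s)
    except Exception:
        # Timeout, connection error, bad response: never let Solr block app boot.
        # Note: a timed-out query keeps running in the pool, so the Solr HTTP
        # call itself should also carry a timeout in a real deployment.
        return index_html
    pos = index_html.find("</head>")
    if pos == -1:
        return index_html
    return index_html[:pos] + snippet + index_html[pos:]
```

<p>The design choice is that every error path degrades to today's behavior (an empty shell that MetacatUI fills in client-side), so the worst case matches the current Apache deployment.</p>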
<a name="Other-notes"></a>
<h2 >Other notes<a href="#Other-notes" class="wiki-anchor">¶</a></h2>
<p>Unrelated to the Google Dataset Search issue but related to Google's crawling for Google Search, we've also identified:</p>
<ul>
<li>That the Metacat View Service is often unreasonably slow (<a href="https://github.com/NCEAS/metacat/issues/1234">https://github.com/NCEAS/metacat/issues/1234</a>); we are planning to figure out why</li>
<li>That we can and should make use of sitemaps to help Google crawl our pages: <a href="https://github.com/NCEAS/metacat/issues/1263">https://github.com/NCEAS/metacat/issues/1263</a></li>
</ul>
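<p>A sitemap is a small XML document listing crawlable URLs, following the sitemaps.org protocol. Below is a minimal Python generator using <code>xml.etree.ElementTree</code>. The <code>/view/&lt;identifier&gt;</code> URL pattern mirrors the view pages mentioned above; the base URL and the full percent-encoding of identifiers are assumptions.</p>

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url: str, identifiers: list[str]) -> str:
    """Return sitemap XML with one <url><loc> entry per dataset view page."""
    ET.register_namespace("", SITEMAP_NS)  # serialize as the default namespace
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for pid in identifiers:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        loc = ET.SubElement(url, f"{{{SITEMAP_NS}}}loc")
        # Identifiers may contain '/', ':' or spaces, so encode every reserved character.
        loc.text = f"{base_url.rstrip('/')}/view/{quote(pid, safe='')}"
    return ET.tostring(urlset, encoding="unicode")
```

<p>For a large catalog, the protocol caps a single sitemap at 50,000 URLs, so the output would need to be split and listed in a sitemap index file.</p>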
Python Libraries - Task #7821 (New): Verify Expect: 100-Continue header on POST or PUT requests | https://redmine.dataone.org/issues/7821 | 2016-05-31T20:55:02Z | Robert Waltz
<p>We may add support for the <code>Expect: 100-continue</code> header so that content can be uploaded successfully to some Member Node servers.</p>
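<p>To our knowledge, Python's standard <code>http.client</code> does not wait for an interim <code>100 Continue</code> on its own, so verifying the behavior means driving the exchange by hand. Below is a minimal socket-level sketch, assuming a plain-HTTP (non-TLS) endpoint. The host, path, and 1-second wait are placeholders, and this is not the DataONE Python libraries' client code. Per RFC 7231 §5.1.1, a client may send the body anyway if no interim response arrives in time.</p>

```python
import socket


def build_head(method: str, host: str, path: str, length: int) -> bytes:
    """Request line plus headers announcing a body, terminated by a blank line."""
    return (
        f"{method} {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Content-Length: {length}\r\n"
        "Expect: 100-continue\r\n"
        "\r\n"
    ).encode("ascii")


def status_code(response_head: bytes) -> int:
    """Extract the status code from a response head such as b'HTTP/1.1 100 Continue'."""
    status_line = response_head.split(b"\r\n", 1)[0]
    return int(status_line.split()[1])


def read_head(sock: socket.socket) -> bytes:
    """Read until the blank line ending a head (sketch: extra bytes are not kept)."""
    buf = b""
    while b"\r\n\r\n" not in buf:
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("connection closed before end of head")
        buf += chunk
    return buf


def put_with_expect(host: str, port: int, path: str, body: bytes,
                    wait_s: float = 1.0) -> int:
    """PUT body using Expect: 100-continue and return the final status code."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_head("PUT", host, path, len(body)))
        sock.settimeout(wait_s)
        try:
            interim = read_head(sock)
        except socket.timeout:
            interim = b""  # no interim reply in time: send the body anyway
        sock.settimeout(None)
        if interim and status_code(interim) != 100:
            # Rejected (e.g. 401 or 417) before any body bytes were sent.
            return status_code(interim)
        sock.sendall(body)
        return status_code(read_head(sock))
```

<p>The benefit described in the quote below comes from the early-return branch: a server that would reject the request (for example, on authentication) can do so before a large body is transmitted.</p>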
<p>A description of the header's use, from the Apache HttpClient (Java) tutorial:</p>
<pre> 'Expect: 100-Continue' handshake for the entity enclosing methods.
The purpose of the Expect: 100-Continue handshake is to allow the client
that is sending a request message with a request body to determine if
the origin server is willing to accept the request (based on the request
headers) before the client sends the request body. The use of the
Expect: 100-continue handshake can result in a noticeable performance
improvement for entity enclosing requests (such as POST and PUT) that
require the target server's authentication.
https://hc.apache.org/httpcomponents-client-4.2.x/tutorial/html/fundamentals.html
</pre>
Member Nodes - MNDeployment #7163 (New): UN Environmental Programme World Conservation Monitoring... | https://redmine.dataone.org/issues/7163 | 2015-06-07T17:14:16Z | Laura Moyers (lmoyers1@utk.edu)
<p>Key holding: World database of protected lands<br>
Dependencies: They are in the process of implementing CSW. So, we need the slender node and we need them to get their CSW up.<br>
Location: Cambridge, England</p>
Python GMN - Story #7095 (New): Support HTTP redirect capability for GMN within the Vendor Specif... | https://redmine.dataone.org/issues/7095 | 2015-05-12T16:20:39Z | Mark Servilla (mark.servilla@gmail.com)
<p>GMN provides proxy capabilities to integrate existing content residing in a back-end data repository through its Vendor Specific Extension model. Only minimal support exists within the GMN-VSE model for HTTP redirects. Although HTTP redirects are more of an exception within the GMN-VSE model, they can provide a flexible mechanism for MNs that dynamically manage data and metadata objects.</p>
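<p>In HTTP terms, redirecting instead of proxying means answering a GET with a 3xx status and a <code>Location</code> header, so the client fetches the bytes from the back end directly rather than through GMN. Below is a minimal standard-library sketch. GMN itself is a Django application, and this does not use its code or its VSE hooks. <code>BACKEND_LOCATIONS</code>, the URL layout, and the choice of 302 are hypothetical.</p>

```python
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import unquote

# Hypothetical mapping from object identifier to its location in a back-end repository.
BACKEND_LOCATIONS = {
    "example-pid-1": "https://repository.example.org/data/file1.csv",
}


class RedirectHandler(BaseHTTPRequestHandler):
    """Answers GET /object/<pid> with a redirect instead of streaming the bytes."""

    def do_GET(self) -> None:
        # Treat the last path segment as the (percent-encoded) identifier.
        pid = unquote(self.path.rstrip("/").rsplit("/", 1)[-1])
        target = BACKEND_LOCATIONS.get(pid)
        if target is None:
            self.send_error(404, "Unknown identifier")
            return
        self.send_response(302)  # temporary redirect to the back-end copy
        self.send_header("Location", target)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format: str, *args) -> None:
        pass  # keep the sketch quiet


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), RedirectHandler).serve_forever()
```

<p>The trade-off is that the MN no longer sees or controls the byte stream, which suits back ends that already serve objects publicly and manage them dynamically.</p>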
Member Nodes - MNDeployment #7048 (Deferred): USGS Regional Climate Centers | https://redmine.dataone.org/issues/7048 | 2015-04-17T17:24:19Z | Bruce Wilson (bwilso27@utk.edu)
Member Nodes - MNDeployment #6359 (New): SiBBr | https://redmine.dataone.org/issues/6359 | 2014-09-05T19:38:43Z | Ben Leinfelder (leinfelder@nceas.ucsb.edu)
<p>At a GBIF-sponsored workshop/meeting in Petropolis, I did a few seminars on Metacat and Morpho. There's potential interest in a MN if/when they set up a Metacat repository for their ecological data research as part of the larger biodiversity/LTER-style network they are building.<br>
<a href="http://www.sibbr.gov.br/">http://www.sibbr.gov.br/</a><br>
Primary contact: Luiz Gadelha <a href="mailto:lgadelha@lncc.br">lgadelha@lncc.br</a><br>
Debora Drucker (who we know from PPBIO/PELD) is also aware of this initiative and supports the idea of a centralized repository hosted at LNCC (national scientific computing lab).</p>
Member Nodes - MNDeployment #4227 (New): TRY Plant Trait Database | https://redmine.dataone.org/issues/4227 | 2014-01-17T11:48:28Z | Laura Moyers (lmoyers1@utk.edu)
<p>Jens Kattge directs the international Plant Traits Database and is interested in joining DataONE as a Member Node. (Initial contact with Bill.)</p>
<p>Contact info:<br>
Jens Kattge <a href="mailto:jkattge@bgc-jena.mpg.de">jkattge@bgc-jena.mpg.de</a> <br>
website: <a href="http://www.try-db.org" class="external">www.try-db.org</a></p>
Infrastructure - Task #3763 (New): Visualize PyPI download statistics for Python components | https://redmine.dataone.org/issues/3763 | 2013-05-14T21:11:04Z | Roger Dahl (dahl@unm.edu)
<p>Create a tool that gets download statistics for DataONE software distributed on PyPI and generates graphs showing how the number of downloads changes over time.</p>
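<p>The aggregation step of such a tool might look like the sketch below. Daily download counts (however they are obtained; the source is not specified here) are rolled up into per-package monthly totals, sorted and ready for plotting. The <code>(package, day, count)</code> record format is an assumption.</p>

```python
from collections import defaultdict
from datetime import date


def monthly_totals(records: list[tuple[str, date, int]]) -> dict[str, list[tuple[str, int]]]:
    """Roll (package, day, downloads) records into sorted per-package monthly totals."""
    totals: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for package, day, count in records:
        # "YYYY-MM" keys sort chronologically as plain strings.
        totals[package][day.strftime("%Y-%m")] += count
    return {pkg: sorted(months.items()) for pkg, months in totals.items()}
```

<p>Each returned list maps directly onto an x/y series (month label, total) for whatever plotting library the tool ends up using.</p>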
Infrastructure - Task #2147 (New): The Python stack does not support Unicode supplementary charac... | https://redmine.dataone.org/issues/2147 | 2011-12-20T15:44:22Z | Roger Dahl (dahl@unm.edu)
<p>When given this identifier:</p>
<p>common-unicode-supplementary-escaped-</p>
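<p>For context, supplementary characters are code points above U+FFFF, outside the Basic Multilingual Plane. They take four bytes in UTF-8 and a surrogate pair in UTF-16, which is where code that assumes one 16-bit unit per character tends to break. The small Python check below uses a hypothetical identifier, since the one in the report is truncated; it does not reproduce the reported failure.</p>

```python
from urllib.parse import quote, unquote


def supplementary_chars(s: str) -> list[str]:
    """Return the characters in s outside the BMP (code point > U+FFFF)."""
    return [c for c in s if ord(c) > 0xFFFF]


def utf16_units(s: str) -> int:
    """Count UTF-16 code units; each supplementary character needs two (a surrogate pair)."""
    return len(s.encode("utf-16-le")) // 2


if __name__ == "__main__":
    # Hypothetical identifier ending in U+1D11E MUSICAL SYMBOL G CLEF.
    pid = "test-\U0001D11E"
    print(supplementary_chars(pid))
    print(len(pid), utf16_units(pid))   # code points vs. UTF-16 units differ
    escaped = quote(pid, safe="")       # percent-encodes the 4 UTF-8 bytes
    print(escaped)
    assert unquote(escaped) == pid      # round-trips losslessly
```

<p>Comparing code-point length with UTF-16 length is a quick way to spot identifiers that may be mishandled by components built on UTF-16 strings.</p>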
Requirements - Requirement #581 (New): (Requirement) System supports multiple types of science me... | https://redmine.dataone.org/issues/581 | 2010-04-20T20:28:56Z | Dave Vieglais (dave.vieglais@gmail.com)
<p>The <a class="wiki-page new" href="https://redmine.dataone.org/projects/d1req/wiki/DataONE">DataONE</a> cyberinfrastructure should be able to support the range of science metadata types in common use.</p>
<p>Rationale</p>
<p>Science metadata describes the data in ways that support discovery, understanding, and re-use of the data. There are multiple types of science metadata in common use, and so the system should support a range of metadata formats commensurate with the needs of the community.</p>
<p>Fit Criteria</p>
<ul>
<li>Multiple formats of metadata can be stored in the <a class="wiki-page new" href="https://redmine.dataone.org/projects/d1req/wiki/DataONE">DataONE</a> system.</li>
<li>Multiple metadata formats are supported in <a class="wiki-page new" href="https://redmine.dataone.org/projects/d1req/wiki/DataONE">DataONE</a> operations (e.g., search).</li>
<li>New metadata formats can be supported by the system.</li>
</ul>
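<p>One common way to satisfy the "new formats can be supported" criterion is a registry keyed by format identifier, so that adding a format means registering one more handler rather than changing search code. Below is a minimal Python sketch. The format identifiers and title-extraction rules are placeholders, not DataONE's actual format list or indexing logic.</p>

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict

# Format identifier -> function extracting a searchable title from a parsed document.
EXTRACTORS: Dict[str, Callable[[ET.Element], str]] = {}


def register(format_id: str):
    """Decorator that registers a title extractor for one metadata format."""
    def wrap(fn: Callable[[ET.Element], str]) -> Callable[[ET.Element], str]:
        EXTRACTORS[format_id] = fn
        return fn
    return wrap


@register("example:format-a")  # hypothetical format: <doc><title>...</title></doc>
def _title_a(root: ET.Element) -> str:
    return root.findtext("title", default="")


@register("example:format-b")  # hypothetical format: <rec><info><name>...</name></info></rec>
def _title_b(root: ET.Element) -> str:
    return root.findtext("./info/name", default="")


def index_title(format_id: str, xml_text: str) -> str:
    """Parse a metadata document and extract its title with the registered handler."""
    try:
        extractor = EXTRACTORS[format_id]
    except KeyError:
        raise ValueError(f"unsupported metadata format: {format_id}") from None
    return extractor(ET.fromstring(xml_text))
```

<p>Search operations only call <code>index_title</code>, so they work unchanged across every registered format.</p>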
Requirements - Requirement #410 (New): (Requirement) The infrastructure must support long term pr... | https://redmine.dataone.org/issues/410 | 2010-03-24T15:23:06Z | Dave Vieglais (dave.vieglais@gmail.com)
<p>The infrastructure developed by DataONE (and by DataNets in general) must promote the preservation of data. This is a requirement of the RFP.</p>
<p>There are many aspects to preservation. These are addressed by the requirements on which this entry depends (i.e., its "Blocked By" relations).</p>
<p>Rationale</p>
<p>Preservation of data encourages re-use of existing content.</p>
Requirements - Requirement #385 (New): (Requirement) Support arbitrary unique identifiers | https://redmine.dataone.org/issues/385 | 2010-03-16T22:09:36Z | Dave Vieglais (dave.vieglais@gmail.com)
<p><a class="wiki-page new" href="https://redmine.dataone.org/projects/d1req/wiki/DataONE">DataONE</a> cyber-infrastructure will need to support a wide range of identifiers, including but not limited to: unique strings, LSIDs, Handles, DOIs, and URIs.</p>
<p>Rationale</p>
<p><a class="wiki-page new" href="https://redmine.dataone.org/projects/d1req/wiki/DataONE">DataONE</a> will be integrated with many types of data providers that use a variety of identifier schemes. It is not feasible to expect all data providers to handle arbitrary object identifiers when replicating content between Member Nodes. Hence, the <a class="wiki-page new" href="https://redmine.dataone.org/projects/d1req/wiki/DataONE">DataONE</a> core cyber-infrastructure should be identifier agnostic.</p>
<p>Fit Criteria</p>
<ul>
<li>Any type of unique identifier can be supported by Coordinating Nodes</li>
<li>Some Member Nodes can be adapted to support any type of identifier</li>
</ul>
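<p>In practice, being identifier agnostic means the core treats every identifier as an opaque, exact-match string. It does not parse DOI, LSID, or Handle syntax; it only enforces uniqueness. Below is a minimal sketch. The class name is hypothetical, and the in-memory dict stands in for real storage.</p>

```python
class ObjectRegistry:
    """Maps opaque identifiers to objects; never interprets identifier syntax."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def register(self, pid: str, content: bytes) -> None:
        # Exact string comparison only: no case folding or scheme parsing, so
        # DOIs, LSIDs, Handles, URIs, and plain strings are all treated alike.
        if not pid:
            raise ValueError("identifier must be a non-empty string")
        if pid in self._objects:
            raise ValueError(f"identifier already in use: {pid!r}")
        self._objects[pid] = content

    def get(self, pid: str) -> bytes:
        """Return the object for pid; raises KeyError if it is unknown."""
        return self._objects[pid]
```

<p>Because no scheme-specific rules are applied, two identifiers that a resolver might consider equivalent (for example, differing only in case) are distinct here. That is the conservative choice when the core cannot know every provider's conventions.</p>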
Requirements - Requirement #383 (New): (Requirement) System supports data storage | https://redmine.dataone.org/issues/383 | 2010-03-16T21:35:55Z | Dave Vieglais (dave.vieglais@gmail.com)
<p>The <a class="wiki-page new" href="https://redmine.dataone.org/projects/d1req/wiki/DataONE">DataONE</a> cyberinfrastructure must provide a mechanism supporting the storage of data and metadata (seems kind of obvious, but it is a requirement).</p>
<p>Rationale</p>
<p>This is a fundamental requirement of the infrastructure addressing the RFP.</p>
<p>Fit Criteria</p>
<ul>
<li><p>data and metadata can be stored into the <a class="wiki-page new" href="https://redmine.dataone.org/projects/d1req/wiki/DataONE">DataONE</a> cyberinfrastructure and be retrieved at a later time</p></li>
<li><p>Existing data with associated metadata can be added to the <a class="wiki-page new" href="https://redmine.dataone.org/projects/d1req/wiki/DataONE">DataONE</a> infrastructure</p></li>
</ul>
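<p>The "stored and retrieved at a later time" criterion is usually made checkable by recording a checksum at store time and verifying it on retrieval, so that what comes back is provably what went in. Below is a minimal sketch. The class is a hypothetical in-memory stand-in for real storage, and SHA-256 is an arbitrary choice for the example.</p>

```python
import hashlib


class ChecksummedStore:
    """In-memory store that records a SHA-256 checksum for each object."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}
        self._checksums: dict[str, str] = {}

    def store(self, pid: str, content: bytes) -> str:
        """Store content under pid and return its hex checksum."""
        digest = hashlib.sha256(content).hexdigest()
        self._data[pid] = content
        self._checksums[pid] = digest
        return digest

    def retrieve(self, pid: str) -> bytes:
        """Return content for pid after re-verifying its checksum."""
        content = self._data[pid]
        # A mismatch means the stored bytes were altered or corrupted.
        if hashlib.sha256(content).hexdigest() != self._checksums[pid]:
            raise IOError(f"checksum mismatch for {pid!r}")
        return content
```

<p>Returning the checksum from <code>store</code> lets the depositor keep an independent copy for later comparison, which is useful for the "existing data can be added" case as well.</p>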
Requirements - Requirement #339 (New): (Requirement) Web presence to provide publicly accessible ... | https://redmine.dataone.org/issues/339 | 2010-03-12T15:25:30Z | Dave Vieglais (dave.vieglais@gmail.com)
<p><a class="wiki-page new" href="https://redmine.dataone.org/projects/d1req/wiki/DataONE">DataONE</a> is a high profile project that has the potential to influence a very large audience. As such, it is beneficial to provide a resource where the public (i.e. non-authenticated users) can discover information about the project, see updated news items and generally keep informed about project progress and activities.</p>
<p>Rationale</p>
<p><a class="wiki-page new" href="https://redmine.dataone.org/projects/d1req/wiki/DataONE">DataONE</a> provides infrastructure for the community so it is useful to help the community keep informed about the cool stuff we're doing.</p>
<p>Fit Criteria</p>
<ul>
<li>events and news are advertised on the web site</li>
<li>mechanisms are provided for subscribing to notifications (feeds, email)</li>
<li>some measurement of community awareness increases over the life of the project. </li>
</ul>