DataONE Tasks: Issueshttps://redmine.dataone.org/https://redmine.dataone.org/favicon.ico2018-10-02T16:51:19ZDataONE Tasks
Redmine Member Nodes - Task #8723 (New): tDAR: Implemented IP Whitelisting on tDAR systems / Impacts to r...https://redmine.dataone.org/issues/87232018-10-02T16:51:19ZMonica Ihliemail@monicaihli.com
<p>tDAR has implemented IP whitelisting on their member node. Requests from IP addresses that are not whitelisted will be refused. tDAR was provided with the CN IPs for production. The reports that ping the tDAR MN for up status will fail unless they happen to be executed from an IP that can be provided for whitelisting. This may or may not be considered a big enough issue to do something about. No harm is actually being done as long as the CNs can get in and do their business.</p>
Infrastructure - Decision #8693 (In Progress): Support Google Dataset Search on search.dataone.or...https://redmine.dataone.org/issues/86932018-09-07T00:16:59ZBryce Mecummecum@nceas.ucsb.edu
<a name="Background"></a>
<h2 >Background<a href="#Background" class="wiki-anchor">¶</a></h2>
<p>Yesterday, <a href="https://toolbox.google.com/datasetsearch" class="external">Google Dataset Search</a> launched. We previously attempted to make MetacatUI (and by extension, DataONE Search) compatible with it by <a href="https://github.com/NCEAS/metacatui/issues/482" class="external">injecting Schema.org JSON-LD into appropriate pages</a>. During development and testing, we checked our compatibility with the upcoming Google Dataset Search using Google's <a href="https://search.google.com/structured-data/testing-tool" class="external">Structured Data Testing Tool</a>. During development, this was all working fine and the feature appeared to be compatible but, after launching the feature on search.dataone.org, behavior changed on Google's end making it so Google no longer saw this JSON-LD. The reason for this is likely that, because MetacatUI follows a single page application architecture and we inject the JSON-LD on the client side, Google's JSON-LD crawler only saw what was sent from the server (a nearly empty index.html) and not our full page (with JSON-LD). I was able to test this theory and, while Google's crawler does execute JavaScript, it limits execution to about or exactly five seconds and MetacatUI <em>usually</em> doesn't finish injecting JSON-LD and rendering all content until after that timeout.</p>
<p>Potential paths forward to get DataONE Search compatible with Google's Dataset Search include (none of which are mutually exclusive):</p>
<ol>
<li>The assets that make up MetacatUI and the asset loading strategies could be optimized: <a href="https://github.com/NCEAS/metacatui/issues/224">https://github.com/NCEAS/metacatui/issues/224</a></li>
<li>Move the code (and any dependencies) that injects JSON-LD further up in the app boot so that Google sees it</li>
<li>Inject the appropriate JSON-LD on the server side to guarantee that Google sees it (originally Matt Jones' idea!)</li>
</ol>
<p>(1) is being worked on for sure, and (2) may not be needed if (1) is successful. I want to talk about option (3) because:</p>
<ul>
<li>It's a quicker solution (I already have something working) which would help get us involved in the project faster</li>
<li>It paves the way for future features and/or improvements to MetacatUI (we could be rendering more on the server side than just JSON-LD, like other metadata, more page content, etc)</li>
</ul>
<a name="What-I-did"></a>
<h2 >What I did<a href="#What-I-did" class="wiki-anchor">¶</a></h2>
<p>To test this idea, I modified a <a href="https://github.com/amoeba/backbone-pushstate-example" class="external">previous project</a> which is just a simple Node (Express.js) app that hosts MetacatUI by intercepting every request and serving the appropriate asset. It injects Schema.org JSON-LD, when appropriate, by querying the CN Solr index before sending MetacatUI's index.html to the client. <a href="https://github.com/amoeba/metacatui-ssr" class="external">Code is here</a> and it's deployed <a href="http://neutral-cat.nceas.ucsb.edu/" class="external">here</a>. View source on any /view/... pages and you'll see a minimal Schema.org/Dataset description in the head. More properties can be added later. I did it quick and dirty: The app pre-loads MetacatUI's index.html as a <code>String</code> at app boot and injects the JSON-LD into it. No templating language or other magic.</p>
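<p>The string-injection approach described above can be sketched roughly as follows. This is an illustrative Python sketch, not the Node code from the linked repository: the function names, the stand-in <code>INDEX_HTML</code>, and the record field names (<code>id</code>, <code>title</code>, <code>abstract</code>) are all assumptions made for the example.</p>

```python
import json

# Stand-in for MetacatUI's index.html, read once at app boot (assumed content).
INDEX_HTML = "<!DOCTYPE html><html><head><title>MetacatUI</title></head><body></body></html>"


def build_dataset_jsonld(doc):
    """Build a minimal Schema.org Dataset description from a Solr-like record.

    The keys read from `doc` are assumptions for this sketch.
    """
    return {
        "@context": "http://schema.org",
        "@type": "Dataset",
        "identifier": doc["id"],
        "name": doc.get("title", ""),
        "description": doc.get("abstract", ""),
    }


def inject_jsonld(index_html, jsonld):
    """Insert a JSON-LD script tag just before </head> with plain string replacement."""
    # Escape "</" so a value containing "</script>" cannot close the tag early;
    # "\/" is a valid JSON escape for "/", so the payload still parses.
    payload = json.dumps(jsonld).replace("</", "<\\/")
    tag = '<script type="application/ld+json">' + payload + "</script>"
    return index_html.replace("</head>", tag + "</head>", 1)
```

<p>A request handler would call <code>inject_jsonld(INDEX_HTML, build_dataset_jsonld(doc))</code> for /view/... pages and return <code>INDEX_HTML</code> unchanged everywhere else.</p>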
<a name="Things-to-address"></a>
<h2 >Things to address<a href="#Things-to-address" class="wiki-anchor">¶</a></h2>
<ul>
<li>How do we feel about switching from hosting MetacatUI via Apache (simple, bulletproof) to a Node-based deployment just to support this feature (new territory, at least for me)?</li>
<li>If we do switch, we'd want to make really sure the Node app doesn't have weird failure cases where it doesn't return index.html (e.g., when Solr is down or slow). The app needs to return index.html (and every other static asset) on every request, and do it very fast. We should decide what the cutoff is so that a slow or unavailable Solr doesn't hold up app boot.</li>
<li>Can this type of deployment easily be integrated with CN buildouts? I've deployed Node apps before by fronting them with Apache/nginx (via reverse proxy) and then keeping the node process up with Upstart</li>
<li>Is this performant enough for DataONE? I think my implementation is non-blocking but I'm not a Node expert so we'd want to code review and probably benchmark </li>
<li>We could wait on (1) and stick with our current deployment strategy</li>
</ul>
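<p>One way to think about the cutoff concern raised in the list above is sketched below, in Python rather than Node and not taken from the linked code. The cutoff value, the function names, and the stand-in lookups are hypothetical. The idea is that the metadata lookup runs with a deadline, and any timeout or error falls back to the plain index.html.</p>

```python
import concurrent.futures
import time

# Stand-in for the pre-loaded index.html (assumed content).
INDEX_HTML = "<html><head></head><body></body></html>"
# Hypothetical cutoff; the real value is the open question in the list above.
SOLR_CUTOFF_SECONDS = 0.2

_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def render_index(lookup, pid, cutoff=SOLR_CUTOFF_SECONDS):
    """Return index.html with the snippet from lookup(pid), or plain index.html.

    `lookup` stands in for the Solr query; it returns an HTML snippet for <head>.
    """
    future = _executor.submit(lookup, pid)
    try:
        snippet = future.result(timeout=cutoff)
    except Exception:
        # Timed out or failed: never block or break the page because of Solr.
        return INDEX_HTML
    return INDEX_HTML.replace("</head>", snippet + "</head>", 1)


def slow_lookup(pid):
    """Stand-in for a slow Solr response."""
    time.sleep(0.5)
    return '<script type="application/ld+json">{}</script>'


def failing_lookup(pid):
    """Stand-in for Solr being down."""
    raise RuntimeError("Solr is down")
```

<p>With these stand-ins, both <code>render_index(slow_lookup, "some-pid", cutoff=0.05)</code> and <code>render_index(failing_lookup, "some-pid")</code> return the unmodified page. Note that a timed-out lookup keeps running in its worker thread, so a real implementation would also need a timeout on the Solr request itself.</p>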
<a name="Other-notes"></a>
<h2 >Other notes<a href="#Other-notes" class="wiki-anchor">¶</a></h2>
<p>Unrelated to the Google Dataset Search issue but related to Google's crawling for Google Search, we've also identified:</p>
<ul>
<li>That the Metacat View Service is often unreasonably slow, and we are planning to figure out why: <a href="https://github.com/NCEAS/metacat/issues/1234">https://github.com/NCEAS/metacat/issues/1234</a></li>
<li>That we can and should make use of sitemaps to help Google crawl our pages: <a href="https://github.com/NCEAS/metacat/issues/1263">https://github.com/NCEAS/metacat/issues/1263</a></li>
</ul>
Member Nodes - MNDeployment #7973 (New): Alaska Energy Data Gatewayhttps://redmine.dataone.org/issues/79732017-01-27T15:19:28ZLaura Moyerslmoyers1@utk.edu
<p>At a meeting 26 January 2017, it was mentioned that GINA is working with the Alaska Energy Data Gateway on a proposal where there is a component for implementing a DataONE Member Node for AEDG. This is still very early in the proposal stage, but the possibility exists for an AEDG MN in future.</p>
Member Nodes - MNDeployment #7962 (Deferred): Alaska EPSCoRhttps://redmine.dataone.org/issues/79622017-01-09T19:00:08ZRebecca Koskelarkoskela@unm.edu
<p>GINA also runs the Alaska EPSCoR repository and because it's a separate project, it should also be a separate MN</p>
<p><a href="http://www.alaska.edu/epscor/">http://www.alaska.edu/epscor/</a><br>
Current portal that is hosted by GINA: <a href="http://epscor.alaska.edu/">http://epscor.alaska.edu/</a></p>
Member Nodes - MNDeployment #7163 (New): UN Environmental Programme World Conservation Monitoring...https://redmine.dataone.org/issues/71632015-06-07T17:14:16ZLaura Moyerslmoyers1@utk.edu
<p>Key holding: World database of protected lands<br>
Dependencies: They are in the process of implementing CSW. So, we need the slender node and we need them to get their CSW up.<br>
Location: Cambridge, England</p>
Member Nodes - MNDeployment #7048 (Deferred): USGS Regional Climate Centershttps://redmine.dataone.org/issues/70482015-04-17T17:24:19ZBruce Wilsonbwilso27@utk.edu
Java Client - Story #3666 (In Progress): D1Client.listUpdateHistory() needs to handle changing ac...https://redmine.dataone.org/issues/36662013-03-15T22:51:23ZRob Nahfrnahf@epscor.unm.edu
<p>The current D1Client.listUpdateHistory() method needs to gracefully handle the situation where a NotAuthorized exception is returned. The ObsoletesChain client class may need to be refactored to allow this exception to be held so it can notify the user where appropriate.</p>
<p>Ostensibly, with a NotAuthorized, the user will not have access to either the tail or the head of the chain, so the method can't return the head or the tail, depending on how access changes.</p>
Member Nodes - MNDeployment #3520 (New): ARCTOS Collaborative Collection Management Solutionhttps://redmine.dataone.org/issues/35202013-01-25T19:33:16ZRebecca Koskelarkoskela@unm.edu
<p>Arctos is a collaborative Collection Management Information System that currently serves over 1.7 million records from multiple natural history collections. The data reside in an Oracle database administered by the Texas Advanced Computing Center. We are looking for off-site options for data backup and disaster recovery, and would like to discuss the possibility of partnering with DataONE for this purpose. Perhaps we can set up a conference call to discuss this?</p>
<p>Contacts: Carla Cicero <a href="mailto:ccicero@berkeley.edu">ccicero@berkeley.edu</a>,<br>
Dusty McDonald <a href="mailto:dustymc@gmail.com">dustymc@gmail.com</a>,<br>
Chris Jordan <a href="mailto:ctjordan@tacc.utexas.edu">ctjordan@tacc.utexas.edu</a>,<br>
Link Olson <a href="mailto:leolson@alaska.edu">leolson@alaska.edu</a>,<br>
Gordon Jarrell <a href="mailto:gordon.jarrell@gmail.com">gordon.jarrell@gmail.com</a></p>
<p>Conference Call on 24 January 2013</p>
Member Nodes - MNDeployment #3519 (New): Prairie Research Institutehttps://redmine.dataone.org/issues/35192013-01-25T19:03:58ZJohn Cobbjohnw.cobb+dataoneRM@gmail.com
<p>On the 2013-01-25 LT call, Dave V. mentioned a conversation with Randy Butler about the Prairie Research Institute.</p>
<p>The following is a snippet from the ePad <a href="http://epad.dataone.org/2013Jan25-LT-VTC">http://epad.dataone.org/2013Jan25-LT-VTC</a></p>
<p>Discussion with Randy Butler also touched on interest by the various Illinois Surveys (Prairie Research Institute) (e.g. Natural History Survey, Geological Survey, Hydrological Survey) on how they can participate in DataONE, perhaps as a set of Member Nodes. NH Survey likely to be the first candidate according to Randy ("endangered data").</p>
Infrastructure - Task #3333 (New): Generalize mk_* scripts for host namehttps://redmine.dataone.org/issues/33332012-10-11T18:50:14ZDave Vieglaisdave.vieglais@gmail.com
<p>The various custom check_mk plugins currently depend on "-1" appearing in the hostname.</p>
<p>This should be generalized at some point.</p>
<p>This didn't work:</p>
<p><code>testhost=$(echo $thishost | sed s/-[a-z]*-([0-9])/-${host}-\1/)</code></p>
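<p>A likely reason the sed attempt above fails is that, in sed's default basic regular expression syntax, unescaped parentheses are literal characters, so <code>\1</code> refers to a group that was never defined. The group needs <code>\(</code> and <code>\)</code> or the <code>-E</code> flag, and the unquoted expression is also exposed to shell expansion. The intended substitution, as far as it can be inferred from the pattern, is sketched here in Python. The function name and the example hostnames are hypothetical.</p>

```python
import re


def retarget_hostname(thishost, host):
    """Replace the lowercase token before the trailing digit, keeping the digit.

    Mirrors the intent of s/-[a-z]*-([0-9])/-${host}-\\1/ with a real capture group.
    """
    # A replacement function avoids any backslash interpretation in `host`.
    return re.sub(
        r"-[a-z]*-([0-9])",
        lambda match: "-" + host + "-" + match.group(1),
        thishost,
        count=1,
    )
```

<p>For example, <code>retarget_hostname("cn-dev-unm-1", "ucsb")</code> yields <code>cn-dev-ucsb-1</code>, and a hostname that does not match the pattern is returned unchanged.</p>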
Member Nodes - MNDeployment #3258 (New): CitSci.org MNhttps://redmine.dataone.org/issues/32582012-09-20T20:44:30ZBruce Wilsonbwilso27@utk.edu
<p>CitSci.org provides a toolset, including the hosting of data, for citizen science (PPSR -- public participation in science and research) projects. CitSci.org currently hosts data from approximately 50 such projects. Greg Newman (Colorado State University) is the lead on CitSci.org and is a member of the PPSR working group. </p>
<p>Handling of streaming (mutable) data is a key need. </p>
Infrastructure - Bug #3246 (New): Metacat returns 500 instead of 404 in some caseshttps://redmine.dataone.org/issues/32462012-09-11T02:00:24ZDave Vieglaisdave.vieglais@gmail.com
<p>For example:</p>
<p><a href="https://knb.ecoinformatics.org/knb/d1/mn/v1/bogus">https://knb.ecoinformatics.org/knb/d1/mn/v1/bogus</a></p>
<p>should return a 404 NotFound error, but instead returns a 500, ServiceFailure. </p>
<p>This is not an urgent issue, but should probably be cleaned up.</p>
Infrastructure - Task #2286 (New): Change Exceptions.InvalidToken to Exceptions.InvalidSessionhttps://redmine.dataone.org/issues/22862012-02-03T02:57:36ZRoger Dahldahl@unm.edu
Requirements - Requirement #580 (New): (Requirement) All software developed on the project should...https://redmine.dataone.org/issues/5802010-04-15T22:52:38ZDave Vieglaisdave.vieglais@gmail.com
<p>It is necessary (desired and indicated in the proposal) that all products developed on the project be released with an appropriate, recognized open source license.</p>
<h2>Rationale</h2>
<p>Open source is a good thing. See <a href="http://www.opensource.org/docs/definition.php">http://www.opensource.org/docs/definition.php</a> for more details.</p>
<h2>Fit Criteria</h2>
<ul>
<li>All software developed on the project includes an approved open source license</li>
<li>Selected licenses do not conflict with the licenses used by components that are extended or utilized by custom software developed on the project</li>
<li>A software license audit can be successfully passed. </li>
</ul>
Requirements - Requirement #317 (In Progress): (Requirement) Identifiers for all objectshttps://redmine.dataone.org/issues/3172010-03-10T16:57:35ZDave Vieglaisdave.vieglais@gmail.com
<p>All objects in the <a class="wiki-page new" href="https://redmine.dataone.org/projects/d1req/wiki/DataONE">DataONE</a> system have an identifier that is unique within all of the <a class="wiki-page new" href="https://redmine.dataone.org/projects/d1req/wiki/DataONE">DataONE</a> infrastructure and can be used to retrieve the object from the system.</p>
<p>Rationale:</p>
<p>Unique identifiers provide the fundamental, low-level mechanism for interacting with content stored in the <a class="wiki-page new" href="https://redmine.dataone.org/projects/d1req/wiki/DataONE">DataONE</a> system. Unique identifiers help prevent duplication and improve re-usability of information.</p>
<p>Fit Criteria:</p>
<ul>
<li><p>Any object stored within <a class="wiki-page new" href="https://redmine.dataone.org/projects/d1req/wiki/DataONE">DataONE</a> receives an identifier that is unique within the <a class="wiki-page new" href="https://redmine.dataone.org/projects/d1req/wiki/DataONE">DataONE</a> infrastructure.</p></li>
<li><p>Any object with a <a class="wiki-page new" href="https://redmine.dataone.org/projects/d1req/wiki/DataONE">DataONE</a> identifier can be retrieved from the infrastructure.</p></li>
<li><p>Attempts to create a duplicate identifier are prevented.</p></li>
</ul>