DataONE Tasks: Issueshttps://redmine.dataone.org/https://redmine.dataone.org/favicon.ico2019-07-22T16:35:20ZDataONE Tasks
Redmine Member Nodes - MNDeployment #8828 (In Review): SESARhttps://redmine.dataone.org/issues/88282019-07-22T16:35:20ZAmy Forresteraforres4@utk.edu
<p>The Single European Sky ATM Research (SESAR) project was launched in 2004 as the technological pillar of the Single European Sky (SES). Its role is to define, develop and deploy what is needed to increase ATM performance and build Europe’s intelligent air transport system.</p>
<p>GOAL<br>
* linking datasets to sample metadata by having projects register samples for IGSN identifiers = SESAR data integration </p>
Member Nodes - Story #8727 (In Progress): NCAR Discovery & Assessmenthttps://redmine.dataone.org/issues/87272018-10-04T16:48:23ZAmy Forresteraforres4@utk.edu
<p>Establish contact and build relationship with a potential new member node. Determine if DataONE and the repository are a good fit for one another and if the repository generally meets the requirements of DataONE member nodes. </p>
<p>This story is complete when a determination is made to either proceed with a new deployment, or that joining DataONE is not an option for the repository at this time.</p>
Member Nodes - MNDeployment #8725 (In Review): NCAR (National Center for Atmospheric Research)https://redmine.dataone.org/issues/87252018-10-03T19:27:07ZAmy Forresteraforres4@utk.edu
<p>NCAR-wide initiative: the Data Stewardship Engineering Team (DSET) is to lead the organization’s efforts to provide enhanced, comprehensive digital data discovery and access. This community-driven infrastructure will support a user-focused, integrated system for the discovery and access of digital scientific assets that include datasets and supporting metadata, publications (text, images, audio, video), software applications, and model codes. <a href="https://internal-ncar.ucar.edu/data-stewardship-engineering-team-dset">https://internal-ncar.ucar.edu/data-stewardship-engineering-team-dset</a></p>
Infrastructure - Decision #8693 (In Progress): Support Google Dataset Search on search.dataone.or...https://redmine.dataone.org/issues/86932018-09-07T00:16:59ZBryce Mecummecum@nceas.ucsb.edu
<a name="Background"></a>
<h2 >Background<a href="#Background" class="wiki-anchor">¶</a></h2>
<p>Yesterday, <a href="https://toolbox.google.com/datasetsearch" class="external">Google Dataset Search</a> launched. We previously attempted to make MetacatUI (and by extension, DataONE Search) compatible with it by <a href="https://github.com/NCEAS/metacatui/issues/482" class="external">injecting Schema.org JSON-LD into appropriate pages</a>. During development and testing, we checked our compatibility with the upcoming Google Dataset Search using Google's <a href="https://search.google.com/structured-data/testing-tool" class="external">Structured Data Testing Tool</a>. At that stage everything worked and the feature appeared to be compatible but, after we launched the feature on search.dataone.org, behavior changed on Google's end and Google no longer saw this JSON-LD. The likely reason is that, because MetacatUI follows a single-page application architecture and we inject the JSON-LD on the client side, Google's JSON-LD crawler only saw what was sent from the server (a nearly empty index.html) and not our full page (with JSON-LD). I was able to test this theory: while Google's crawler does execute JavaScript, it limits execution to about (or exactly) five seconds, and MetacatUI <em>usually</em> doesn't finish injecting JSON-LD and rendering all content until after that timeout.</p>
<p>Potential paths forward to get DataONE Search compatible with Google's Dataset Search include (none of which are mutually exclusive):</p>
<ol>
<li>The assets that make up MetacatUI and the asset loading strategies could be optimized: <a href="https://github.com/NCEAS/metacatui/issues/224">https://github.com/NCEAS/metacatui/issues/224</a></li>
<li>Move the code (and any dependencies) that injects JSON-LD further up in the app boot so that Google sees it</li>
<li>Inject the appropriate JSON-LD on the server side to guarantee that Google sees it (originally Matt Jones' idea!)</li>
</ol>
<p>(1) is being worked on for sure, and (2) may not be needed if (1) is successful. I want to talk about option (3) because:</p>
<ul>
<li>It's a quicker solution (I already have something working) which would help get us involved in the project faster</li>
<li>It paves the way for future features and/or improvements to MetacatUI (we could be rendering more on the server side than just JSON-LD, like other metadata, more page content, etc)</li>
</ul>
<a name="What-I-did"></a>
<h2 >What I did<a href="#What-I-did" class="wiki-anchor">¶</a></h2>
<p>To test this idea, I modified a <a href="https://github.com/amoeba/backbone-pushstate-example" class="external">previous project</a>, which is just a simple Node (Express.js) app that hosts MetacatUI by intercepting every request and serving the appropriate asset. It injects Schema.org JSON-LD, when appropriate, by querying the CN Solr index before sending MetacatUI's index.html to the client. <a href="https://github.com/amoeba/metacatui-ssr" class="external">Code is here</a> and it's deployed <a href="http://neutral-cat.nceas.ucsb.edu/" class="external">here</a>. View source on any /view/... page and you'll see a minimal Schema.org/Dataset description in the head. More properties can be added later. I did it quick and dirty: the app pre-loads MetacatUI's index.html as a <code>String</code> at app boot and injects the JSON-LD into it. No templating language or other magic.</p>
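<p>The string-injection approach described above can be sketched as follows. This is a minimal illustration in Python, not the code of the Node app linked above; the function names and the <code>id</code>, <code>title</code>, and <code>abstract</code> keys of the input record are assumptions made for the example.</p>

```python
import json


def build_dataset_jsonld(doc: dict) -> str:
    """Render a minimal Schema.org Dataset description as a JSON-LD script tag.

    `doc` is a hypothetical metadata record; its keys are assumptions.
    """
    dataset = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "@id": doc["id"],
        "name": doc.get("title", doc["id"]),
    }
    if doc.get("abstract"):
        dataset["description"] = doc["abstract"]
    # Escape "</" so metadata text cannot close the script element early.
    payload = json.dumps(dataset).replace("</", "<\\/")
    return f'<script type="application/ld+json">{payload}</script>'


def inject_jsonld(index_html: str, doc: dict) -> str:
    """Insert the JSON-LD tag just before </head>.

    Returns the page unchanged when no </head> marker is present.
    """
    marker = "</head>"
    position = index_html.find(marker)
    if position == -1:
        return index_html
    return index_html[:position] + build_dataset_jsonld(doc) + index_html[position:]
```

<p>Because the page is held as a plain string, the only per-request work is one search and one concatenation; the metadata lookup itself stays outside these functions.</p>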
<a name="Things-to-address"></a>
<h2 >Things to address<a href="#Things-to-address" class="wiki-anchor">¶</a></h2>
<ul>
<li>How do we feel about switching from hosting MetacatUI via Apache (simple, bulletproof) to a Node-based deployment just to support this feature (new territory, at least for me)?</li>
<li>If we do switch, we'd want to make sure the Node app has no failure cases where it doesn't return index.html (e.g., when Solr is down or slow). The app needs to return index.html (and every other static asset) on every request, and do it very fast. We should decide on a cutoff so that a slow or unavailable Solr doesn't hold up app boot.</li>
<li>Can this type of deployment easily be integrated with CN buildouts? I've deployed Node apps before by fronting them with Apache/nginx (via reverse proxy) and then keeping the node process up with Upstart</li>
<li>Is this performant enough for DataONE? I think my implementation is non-blocking but I'm not a Node expert so we'd want to code review and probably benchmark </li>
<li>We could wait on (1) and stick with our current deployment strategy</li>
</ul>
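<p>One way to get the "always return index.html" behavior raised above is to put a deadline on the metadata lookup and fall back to the unmodified page. The sketch below is in Python rather than Node, and <code>render_with_deadline</code>, <code>lookup</code>, and the 0.5 second default are hypothetical choices for illustration, not a proposed cutoff.</p>

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# Shared worker pool for metadata lookups (size chosen arbitrarily here).
_executor = ThreadPoolExecutor(max_workers=4)


def render_with_deadline(index_html, lookup, pid, deadline_seconds=0.5):
    """Return index.html with extra <head> markup if `lookup` finishes in time.

    `lookup(pid)` is a caller-supplied function returning an HTML fragment
    (or None). On timeout or any lookup error the plain page is returned.
    """
    future = _executor.submit(lookup, pid)
    try:
        extra_head = future.result(timeout=deadline_seconds)
    except FutureTimeout:
        # Too slow: give up on enrichment rather than delay the response.
        future.cancel()
        return index_html
    except Exception:
        # Lookup failed (e.g., the index is down): still serve the page.
        return index_html
    if not extra_head:
        return index_html
    return index_html.replace("</head>", extra_head + "</head>", 1)
```

<p>The design choice is that the enrichment is strictly optional: every failure path returns the same bytes Apache would have served.</p>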
<a name="Other-notes"></a>
<h2 >Other notes<a href="#Other-notes" class="wiki-anchor">¶</a></h2>
<p>Unrelated to the Google Dataset Search issue but related to Google's crawling for Google Search, we've also identified:</p>
<ul>
<li>That the Metacat View Service is often unreasonably slow (<a href="https://github.com/NCEAS/metacat/issues/1234">https://github.com/NCEAS/metacat/issues/1234</a>), and we are planning to figure out why</li>
<li>That we can and should make use of sitemaps to help Google crawl our pages: <a href="https://github.com/NCEAS/metacat/issues/1263">https://github.com/NCEAS/metacat/issues/1263</a></li>
</ul>
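<p>For the sitemap idea, a minimal generator might look like the following. It is a sketch only: <code>build_sitemap</code> and its input shape are assumptions, and the <code>/view/</code> URL pattern is taken from the pages mentioned earlier in this issue.</p>

```python
from urllib.parse import quote
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url, entries):
    """Build a sitemap document for dataset landing pages.

    `entries` is an iterable of (identifier, last_modified) pairs, where
    last_modified is a W3C date string such as "2018-09-07".
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for identifier, last_modified in entries:
        url = ET.SubElement(urlset, "url")
        # Percent-encode the whole identifier so ":" and "/" stay inside one path segment.
        ET.SubElement(url, "loc").text = f"{base_url}/view/{quote(identifier, safe='')}"
        ET.SubElement(url, "lastmod").text = last_modified
    return ET.tostring(urlset, encoding="unicode")
```

<p>The sitemap protocol caps a single file at 50,000 URLs, so a real deployment would need to split the output and publish a sitemap index.</p>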
Infrastructure - Bug #8655 (New): Synchronization died with OOMhttps://redmine.dataone.org/issues/86552018-07-13T11:24:16ZDave Vieglaisdave.vieglais@gmail.com
<p>d1-processing became unresponsive. cn-synchronization log showed:<br>
<code><br>
[ERROR] 2018-07-12 18:28:26,875 [ProcessDaemonTask1] (SyncObjectTaskManager:run:84) java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space<br>
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: Java heap space<br>
at java.util.concurrent.FutureTask.report(FutureTask.java:122)<br>
at java.util.concurrent.FutureTask.get(FutureTask.java:192)<br>
at org.dataone.cn.batch.synchronization.SyncObjectTaskManager.run(SyncObjectTaskManager.java:76)<br>
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)<br>
at java.util.concurrent.FutureTask.run(FutureTask.java:266)<br>
at java.lang.Thread.run(Thread.java:748)<br>
Caused by: java.lang.OutOfMemoryError: Java heap space<br>
[ INFO] 2018-07-12 18:28:49,862 [ProcessDaemonTask1] (SyncObjectTaskManager:run:110) SyncObjectTaskManager Complete<br>
[ WARN] 2018-07-12 20:41:15,788 [hz.client.2.Listener] (NodeTopicListener:onMessage:68) urn:node:OTS_NDC- NodeTopicListener Disabl<br>
</code></p>
<p>d1-processing is running with:<br>
<code><br>
-Djava.awt.headless=true -XX:UseParallelGC -Xmx4096M -Xms1024M -Xss1280k -XX:MaxPermSize=512M<br>
</code></p>
CN REST - Story #8364 (In Progress): Ensure portal uses correct X509 certificateshttps://redmine.dataone.org/issues/83642018-02-13T20:17:25ZChris Jonescjones@nceas.ucsb.edu
<p>We've run into issues where after an upgrade of the <code>dataone-cn-portal</code> package on the CNs, the properties pointing to the public certificate and private key are incorrectly pointing to the old GeoTrust wildcard files rather than the new Lets Encrypt files:<br>
<br>
cn.server.publiccert.filename=/etc/ssl/certs/_.test.dataone.org.crt<br>
cn.server.privatekey.filename=/etc/ssl/private/_.test.dataone.org.key</p>
<p>These should be (in STAGE):</p>
<p>/etc/letsencrypt/live/cn-stage.test.dataone.org/cert.pem<br>
/etc/letsencrypt/live/cn-stage.test.dataone.org/privkey.pem</p>
<p>The issue might be that these are not being set correctly during the <code>postinst</code> script run. Jing pointed out that these values are taken from the debconf database settings that get set when <code>dataone-cn-os-core</code> is installed. So although the <code>postinst</code> script might be setting the correct values, the old values might still be cached in the debconf database. If so, we'll need to clear those values during installations and upgrades.</p>
<p>Also, knowing where to look for these configuration settings can be challenging. These are referenced from <code>/var/lib/tomcat7/webapps/portal/WEB-INF/portal.properties</code>. These settings should be consolidated into <code>/etc/dataone/portal/portal.properties</code> so they also don't get blown away on war file upgrades in Tomcat.</p>
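<p>A quick check for the stale settings described above could be scripted along these lines. This is a hedged sketch: the two property keys and the Let's Encrypt prefix come from this issue, while <code>parse_properties</code> and <code>stale_cert_settings</code> are hypothetical helpers that handle only simple <code>key=value</code> lines.</p>

```python
CERT_KEYS = (
    "cn.server.publiccert.filename",
    "cn.server.privatekey.filename",
)


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!" or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def stale_cert_settings(props, expected_prefix="/etc/letsencrypt/live/"):
    """Return certificate settings that do not point under expected_prefix."""
    return {
        key: props[key]
        for key in CERT_KEYS
        if key in props and not props[key].startswith(expected_prefix)
    }
```

<p>Running this against the contents of <code>portal.properties</code> after a package upgrade would return an empty dict when both settings point at the Let's Encrypt files.</p>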
Infrastructure - Story #8049 (In Progress): Support synchronization of system metadata for unhost...https://redmine.dataone.org/issues/80492017-03-21T05:34:38ZRob Nahfrnahf@epscor.unm.edu
<p>As part of mutable-content MN support, allow the Member Node to keep the system metadata records for all resultant versions of its changeable entities. This allows the MN to keep accurate system metadata for every version even though it no longer has the object bytes for that version.</p>
<p>Benefits:<br>
1. MN does not orphan any objects<br>
2. MN can administer objects from past versions on their own MN. Adjust the access policy of all versions, for example.<br>
3. don't need to call cn.setObsoletedBy or leave that field empty.</p>
<p>Costs:<br>
1. requires new logic for indexing (possibly)<br>
2. requires new logic for registerSystemMetadata (possibly)<br>
3. requires new logic for synchronization </p>
<p>This is very similar to how we synchronize DATA objects, except that it does not trigger MN replication.</p>
Member Nodes - Task #7929 (In Progress): Archive content for SEADhttps://redmine.dataone.org/issues/79292016-11-09T17:11:19ZDave Vieglaisdave.vieglais@gmail.com
<p>Charitha has requested some content to be deleted and some to be archived for the production instance of SEAD.</p>
<p>The following entries are to be archived. The reason for archive is that the entries are from the prior installation of the SEAD MN and have been replaced by different content. The entries will remain available from the CN but are to be removed from the index:</p>
<p>seadva-HsuLeslie029090a9-11b8-4fc1-bf76-bb5a8153363f<br>
seadva-nonee903e476-9bdd-4332-823a-aabea162acd6<br>
seadva-EssawyBakinam066de0b8-a0c9-4724-913a-9060f82148f1<br>
seadva-EssawyBakinamc8e53366-8745-4009-bb38-786ee49cd6fe<br>
seadva-EssawyBakinam70c6e869-5518-4c75-8d09-6a808bb41fb3<br>
seadva-ZhouQuane9e0f510-1599-4311-a6ed-ecf803f3481f</p>
<p>The following entries were deleted using the cn.delete operation and the CN certificate for authentication. The reason for deletion is that these were test objects that should never have entered the production environment:</p>
<p>seadva-a75a8b4f-7d35-4c9a-b94a-6e0f8194e6f9<br>
seadva-080d1744-c0d3-455b-baad-6a78b8af3481<br>
seadva-a4f1f73e-a1ee-4246-8d18-ce373b4cbfaa<br>
seadva-ae11f4cc-76b2-4be5-8e03-b32d95f8bfc4<br>
seadva-0f3e936c-1129-44bd-a417-c700fe0cd1a2<br>
seadva-e056e535-da57-40c0-90c1-c2e55d2ec573</p>
Member Nodes - Story #5833 (In Progress): GBIF: Developinghttps://redmine.dataone.org/issues/58332014-07-17T21:05:32ZRoger Dahldahl@unm.edu
<p>Determine which software stack to use, etc.</p>
Java Client - Task #4672 (In Progress): mvn dependencies:analyse shows transitive dependencies:https://redmine.dataone.org/issues/46722014-03-31T16:06:07ZDave Vieglaisdave.vieglais@gmail.com
<p>mvn dependency:analyze shows transitive dependencies:<br>
[WARNING] Used undeclared dependencies found:<br>
[WARNING] xml-apis:xml-apis:jar:1.0.b2:runtime<br>
[WARNING] log4j:log4j:jar:1.2.16:compile<br>
[WARNING] backport-util-concurrent:backport-util-concurrent:jar:3.1:compile<br>
[WARNING] commons-logging:commons-logging:jar:1.1.1:compile<br>
[WARNING] commons-lang:commons-lang:jar:2.6:compile<br>
[WARNING] commons-collections:commons-collections:jar:3.2.1:compile<br>
[WARNING] org.apache.httpcomponents:httpcore:jar:4.1.4:compile<br>
- suggest fixing these</p>
Java Client - Task #4671 (In Progress): suggest updating the following versions:https://redmine.dataone.org/issues/46712014-03-31T16:05:39ZDave Vieglaisdave.vieglais@gmail.com
<p>suggest updating the following versions:<br>
[INFO] The following dependencies in Dependencies have newer versions:<br>
[INFO] commons-fileupload:commons-fileupload ................... 1.2.2 -> 1.3<br>
[INFO] commons-io:commons-io ................................... 2.0.1 -> 2.4<br>
[INFO] joda-time:joda-time ....................................... 2.1 -> 2.3<br>
[INFO] junit:junit ............................................ 4.8.2 -> 4.11<br>
[INFO] org.apache.httpcomponents:httpclient ............... (,4.1.3] -> 4.3.3<br>
[INFO] org.apache.httpcomponents:httpmime ................. (,4.1.3] -> 4.3.3</p>
Member Nodes - MNDeployment #4115 (Deferred): UTK Institutional Repositoryhttps://redmine.dataone.org/issues/41152013-10-25T00:28:16ZBruce Wilsonbwilso27@utk.edu
<p>UTK is in the process of replacing TRACE with a system to manage both data and metadata. Currently, they expect that DSpace is the likely technology for an institutional repository, and they intend for this to be a DataONE MN.</p>
Infrastructure - Story #4052 (In Progress): OPeNDAP MN Storyhttps://redmine.dataone.org/issues/40522013-10-06T20:07:50ZBruce Wilsonbwilso27@utk.edu
<p>There are a large number of possible DataONE MNs running Data Access Protocol (DAP) compliant servers, particularly servers based on the OPeNDAP Hyrax and UCAR THREDDS servers. The objective of this work is to develop an MN stack that can be used with at least one of these DAP-compliant software stacks as a low-barrier route to becoming a Tier 1 MN.</p>
Member Nodes - MNDeployment #3689 (Deferred): University of South Carolina, Computer Science Depa...https://redmine.dataone.org/issues/36892013-03-22T17:02:41ZLaura Moyerslmoyers1@utk.edu
<p>From DataONE website "contact us" – Considering, someone to talk to</p>
<p>Website: <a href="http://www.sc.edu">http://www.sc.edu</a>, <a href="https://www.cse.sc.edu/">https://www.cse.sc.edu/</a> (comp sci dept)<br>
Entity: University of South Carolina<br>
POC: <a href="mailto:okeefe@cse.sc.edu">okeefe@cse.sc.edu</a><br>
Date of inquiry: 12/18/12<br>
Responder:<br><br>
Date of response:<br><br>
Response: </p>
Infrastructure - Task #1556 (In Progress): Interns mailing listhttps://redmine.dataone.org/issues/15562011-05-12T20:19:01ZAmber Buddenaebudden@gmail.com
<p>Can you please clear the current 'interns' mailing list in preparation for use in 2011? It currently sends out to last year's interns. Once the new interns have set up Plone accounts, I will submit a new request to have them added.</p>