DataONE Tasks: Issues | https://redmine.dataone.org/ | 2020-09-16T20:42:07Z | DataONE Tasks
Redmine Infrastructure - Story #8869 (New): Equivalent identities show owning different amount of package... | https://redmine.dataone.org/issues/8869 | 2020-09-16T20:42:07Z | Jing Tao (tao@nceas.ucsb.edu)
<p>Matt reported that when he logged in with his LDAP account, he could see 188 packages listed under <code>My Dataset</code> on his KNB profile page, including two packages he had created under his ORCID. However, when he logged in with his ORCID, he could see only those two packages, even though the two identities are mapped as equivalent:<br>
<a href="https://cn.dataone.org/cn/v2/accounts/http%3A%2F%2Forcid.org%2F0000-0003-0077-4738">https://cn.dataone.org/cn/v2/accounts/http%3A%2F%2Forcid.org%2F0000-0003-0077-4738</a></p>
<p>It seems the identity equivalence is not being applied symmetrically (in both directions).</p>
<p>Matt also noticed that the equivalent LDAP ID for his ORCID is <code>UID=jones,O=NCEAS,DC=ecoinformatics,DC=org</code>, which uses <code>UID</code> rather than <code>uid</code>. He suspects this case difference is causing the problem.</p>
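<p>One way to make the lookup independent of which identity the user logs in with is sketched below in Python. It is a hypothetical illustration and not the CN identity-service code. It records each equivalence mapping in both directions and follows mappings transitively. It also lower-cases LDAP attribute type names, which LDAP treats as case-insensitive, so that <code>UID=jones,...</code> and <code>uid=jones,...</code> compare equal. The helper names are ours, and the sketch assumes simple DNs with no escaped commas or <code>=</code> inside values.</p>

```python
from collections import defaultdict, deque


def normalize_subject(subject):
    """Lower-case the attribute types of a simple LDAP DN; leave other subjects alone.

    Hypothetical helper: assumes DNs contain no escaped commas and no '=' in values.
    """
    if "=" not in subject:
        return subject  # e.g. an ORCID URI
    rdns = []
    for rdn in subject.split(","):
        attr, _, value = rdn.partition("=")
        rdns.append(attr.strip().lower() + "=" + value.strip())
    return ",".join(rdns)


def equivalent_subjects(start, mappings):
    """Return every subject reachable from start through equivalence mappings.

    Each (a, b) pair is stored in both directions, so equivalence is symmetric.
    The breadth-first search also makes it transitive.
    """
    graph = defaultdict(set)
    for a, b in mappings:
        a, b = normalize_subject(a), normalize_subject(b)
        graph[a].add(b)
        graph[b].add(a)
    start = normalize_subject(start)
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for other in graph[current]:
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return seen
```

With this approach, the package query for either identity would run over the same set of subjects.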
CN REST - Bug #8867 (New): CNCore.listChecksumAlgorithms() returns incorrect list | https://redmine.dataone.org/issues/8867 | 2020-08-06T00:06:07Z | Matthew Jones (jones@nceas.ucsb.edu)
<p>The definition of the ChecksumAlgorithm type in SystemMetadata allows any checksum algorithm listed in the Library of Congress (LoC) vocabulary, but the current CNCore.listChecksumAlgorithms() implementation returns only two: MD5 and SHA-1. This needs to be corrected to include the full list of supported algorithms (see <a href="http://id.loc.gov/vocabulary/preservation/cryptographicHashFunctions.html">http://id.loc.gov/vocabulary/preservation/cryptographicHashFunctions.html</a>).</p>
<p>The list is defined in a properties file, which needs to be updated with the correct values. The file (d1_cn_rest/src/test/resources/org/dataone/configuration/node.properties) currently contains:</p>
<p><code>cn.checksumAlgorithmList=SHA-1;MD5</code></p>
<p>It should also contain the other valid algorithms from the LoC vocabulary.</p>
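<p>Below is a small Python sketch that assumes the single-line, semicolon-separated format shown above. It parses the property and reports which listed names Python's standard <code>hashlib</code> cannot compute out of the box. The mapping from LoC labels such as <code>SHA-256</code> to <code>hashlib</code> names such as <code>sha256</code> is our own heuristic. The <code>hashlib</code> check only shows what this sketch can compute, not what the CN should advertise.</p>

```python
import hashlib
from typing import List


def parse_algorithm_list(line: str) -> List[str]:
    """Split a 'cn.checksumAlgorithmList=A;B;C' property line into algorithm names."""
    key, sep, value = line.partition("=")
    if not sep or key.strip() != "cn.checksumAlgorithmList":
        raise ValueError("not a cn.checksumAlgorithmList property: " + line)
    return [name.strip() for name in value.split(";") if name.strip()]


def to_hashlib_name(loc_name: str) -> str:
    """Heuristically map an LoC-style label such as 'SHA-256' to hashlib's 'sha256'."""
    return loc_name.lower().replace("-", "")


def not_computable_locally(names: List[str]) -> List[str]:
    """Return the names that Python's guaranteed hashlib algorithms cannot compute."""
    return [n for n in names if to_hashlib_name(n) not in hashlib.algorithms_guaranteed]
```

For example, an extended value such as <code>cn.checksumAlgorithmList=SHA-1;MD5;SHA-256;SHA-384;SHA-512</code> passes this check. That value is illustrative and is not the complete LoC list.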
Infrastructure - Bug #8866 (New): Java client tools should set a custom user agent string | https://redmine.dataone.org/issues/8866 | 2020-07-15T18:03:40Z | Bryce Mecum (mecum@nceas.ucsb.edu)
<p>Related to <a href="https://redmine.dataone.org/issues/7047">https://redmine.dataone.org/issues/7047</a></p>
<p>It looks like <code>d1_libclient_java</code> never sets a user agent string. Setting one is best practice, and without it we can't tailor our infrastructure to these clients. For example, OPC is running into HTTP 413 errors because their uploads overrun the TLS renegotiation buffer. Their requests come from our Java client tools without a distinguishing user agent, so we can't effectively whitelist them to allow large file uploads.</p>
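<p>The fix is to send a distinctive <code>User-Agent</code> header on every request. The sketch below shows this with Python's standard <code>urllib.request</code>, using the product/version format from the HTTP specification. The client name and version are hypothetical placeholders. The Java client would need the equivalent change. For example, if it builds its connections with Apache HttpClient 4.3 or later, <code>HttpClientBuilder.setUserAgent(String)</code> is one place the header could be set.</p>

```python
import platform
import urllib.request


def build_user_agent(client_name, client_version):
    """Compose a product/version style User-Agent string (hypothetical client name)."""
    return f"{client_name}/{client_version} Python/{platform.python_version()}"


def make_request(url, user_agent):
    """Create (but do not send) a request that carries the custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})
```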
CN REST - Story #8864 (New): Synchronization does not register authoritative replica entry correctly | https://redmine.dataone.org/issues/8864 | 2020-06-17T21:49:55Z | Chris Jones (cjones@nceas.ucsb.edu)
<p>When objects are synchronized to the CN, the <code>d1_synchronization</code> component fetches the system metadata for each object and adds a <code>&lt;replica&gt;</code> entry for the origin node (like <code>urn:node:ESS_DIVE</code>). It also adds entries for any other copies; for instance, for science metadata copied to the CN, a <code>&lt;replica&gt;urn:node:CN&lt;/replica&gt;</code> entry is added.</p>
<p>In some instances, the origin node's replica entry is not added to the replica list.<br><br>
This causes downstream problems for the <code>d1_replication</code> component, which relies on the origin node's replica entry being present in order to set up a replication request to a target node. I'm seeing errors like:</p>
<pre>/var/log/dataone/replicate/cn-replication.log.90:[ERROR] 2020-06-04 05:18:30,179 [pool-15-thread-1] (MNCommunication:requestReplication:34) Could not determine replication source node for replication request for pid: ess-dive-eb6cbb22c605506-20200122T170607966. Replication request failed.
</pre>
<p>Looking back in the logs, this is the case for the following objects:</p>
<pre>ess-dive-3947e68e9825233-20180621T213650539
ess-dive-3b8d9f4513e02f9-20180621T214221437
ess-dive-467a6c3dda4dc88-20180621T211148554
ess-dive-51f345daca126f7-20180328T160350610716
ess-dive-53b37ae5d8c0f51-20200219T211634419654
ess-dive-6b688fab5524c46-20200121T210154766
ess-dive-7a31346c154f02b-20200127T155012488
ess-dive-a1fb05cbd903309-20200130T190835651
ess-dive-b420b097851c716-20180523T161714606
ess-dive-ba81a8a8e0bef31-20180727T200828345
ess-dive-bfaf3d6d6fd038c-20180716T154005175903
ess-dive-c2ef5f3af108c9c-20180621T220020545
ess-dive-eb6cbb22c605506-20200122T170607966
ess-dive-f3238db16593de5-20180621T215956950
</pre>
<p>We need to fix this issue in <code>d1_synchronization</code> so that replication runs correctly and the monthly replica audit (run by ESS_DIVE) doesn't flag these objects.</p>
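<p>The Python sketch below illustrates the invariant the fix should restore: every system metadata document has a replica entry for its origin node. It is not the <code>d1_synchronization</code> code. It assumes unqualified child element names (<code>originMemberNode</code>, <code>replica</code>, <code>replicaMemberNode</code>, <code>replicationStatus</code>, <code>replicaVerified</code>) as in DataONE system metadata. It inserts the new entry after the existing replica entries, so that later fields such as <code>seriesId</code> keep their schema position. The function names are hypothetical.</p>

```python
import xml.etree.ElementTree as ET

# Elements after which a replica entry may appear, following the SystemMetadata
# sequence order (assumption: unqualified child element names).
_ANCHOR_TAGS = ("originMemberNode", "authoritativeMemberNode", "replica")


def replica_nodes(sysmeta):
    """Return the replicaMemberNode value of every replica entry, in document order."""
    return [r.findtext("replicaMemberNode") for r in sysmeta.findall("replica")]


def ensure_origin_replica(sysmeta, verified):
    """Add a replica entry for originMemberNode if it is missing.

    Returns True if an entry was added. `verified` is the replicaVerified timestamp.
    """
    origin = sysmeta.findtext("originMemberNode")
    if not origin or origin in replica_nodes(sysmeta):
        return False
    # Insert after the last anchor element so that later v2 fields (e.g. seriesId)
    # stay after the replica list; append if no anchor element exists.
    position = len(sysmeta)
    for index, child in enumerate(sysmeta):
        if child.tag in _ANCHOR_TAGS:
            position = index + 1
    replica = ET.Element("replica")
    ET.SubElement(replica, "replicaMemberNode").text = origin
    ET.SubElement(replica, "replicationStatus").text = "completed"
    ET.SubElement(replica, "replicaVerified").text = verified
    sysmeta.insert(position, replica)
    return True
```

A check like <code>replica_nodes</code> could also be run over the identifiers listed above to confirm which objects are still missing the entry.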
Infrastructure - Story #8862 (New): Deploy a new dataone-cn-rest release | https://redmine.dataone.org/issues/8862 | 2020-04-23T16:24:46Z | Jing Tao (tao@nceas.ucsb.edu)
<p>We have a new d1_portal jar release that removes the need to restart Tomcat on the CNs when their Let's Encrypt (LE) certificates are renewed. The new d1_portal jar file has been deployed to dataone-cn-portal, but the dataone-cn-rest component was overlooked, so we need to deploy it there as well.<br>
Yesterday, when we restarted Tomcat on the CNs, we applied a temporary fix by dropping the d1_portal-2.3.2.jar file in place, so it should work now. We still need a formal release.</p>
CN REST - Bug #8860 (New): /token endpoint doesn't set a content-type and character encoding | https://redmine.dataone.org/issues/8860 | 2020-02-29T01:00:11Z | Bryce Mecum (mecum@nceas.ucsb.edu)
<p>On Firefox only, requests to the /portal/token endpoint (i.e., the one MetacatUI and other clients use to fetch their auth tokens, like <a href="https://cn.dataone.org/portal/token">https://cn.dataone.org/portal/token</a>) result in errors in the browser console.</p>
<p>When you access the URL via an XHR request, you see:</p>
<blockquote>
<p>XML Parsing Error: syntax error<br>
Location: <a href="https://cn-stage.test.dataone.org/portal/token">https://cn-stage.test.dataone.org/portal/token</a><br>
Line Number 1, Column 1:</p>
</blockquote>
<p>When you access the URL directly in Firefox:</p>
<blockquote>
<p>The character encoding of the plain text document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the file needs to be declared in the transfer protocol or file needs to use a byte order mark as an encoding signature.</p>
</blockquote>
<p>I had a hunch that this error would go away if the response simply had the <code>Content-Type</code> header set to <code>text/plain; charset=utf-8</code> so I spun up <code>mitmproxy</code>, made that edit to the intercepted response, and saw that the error does go away.</p>
<p>I think we should modify the portal code to set the <code>Content-Type</code> header like above so the error goes away.</p>
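<p>The portal is a Java web application. In a servlet, the presumable equivalent is calling <code>HttpServletResponse.setContentType("text/plain")</code> and <code>setCharacterEncoding("UTF-8")</code> before writing the body. To show the intended response headers in a self-contained, testable form, the sketch below uses a minimal Python WSGI app. <code>make_token_app</code> and its <code>get_token</code> callable are hypothetical stand-ins for however the portal looks up the token.</p>

```python
def make_token_app(get_token):
    """Return a WSGI app serving a token as UTF-8 plain text.

    get_token is a hypothetical callable: environ -> token string.
    """
    def app(environ, start_response):
        body = get_token(environ).encode("utf-8")
        headers = [
            # Declaring both the media type and the charset is what avoids the Firefox errors.
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ]
        start_response("200 OK", headers)
        return [body]

    return app
```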
Infrastructure - Task #8858 (New): Update CN Apache configs in version control with directives to... | https://redmine.dataone.org/issues/8858 | 2020-02-05T20:02:12Z | Bryce Mecum (mecum@nceas.ucsb.edu)
<p>Sitemaps are located on disk in <code>${tomcat_webapps_dir}/${context}/sitemaps</code> as <code>sitemap_index.xml</code> and <code>sitemap%d.xml</code> (one per sub-sitemap).</p>
<p>The rule we've come up with is:</p>
<p><code>RewriteRule ^/(sitemap.+) /metacat/sitemaps/$1 [R=303]</code></p>
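<p>The Python sketch below reproduces the rule's pattern with <code>re</code>, to check which request paths it redirects. It assumes the rule sits in server or virtual-host context. In a per-directory context, Apache strips the leading slash before matching, so <code>^/</code> would not match there. Because <code>.+</code> needs at least one character, a bare <code>/sitemap</code> is not redirected. Any path that merely starts with <code>/sitemap</code> (for example <code>/sitemaps/foo</code>) is redirected.</p>

```python
import re

# Python equivalent of: RewriteRule ^/(sitemap.+) /metacat/sitemaps/$1 [R=303]
SITEMAP_RULE = re.compile(r"^/(sitemap.+)")


def rewrite(path):
    """Return (target_path, status) if the rule matches the request path, else None."""
    match = SITEMAP_RULE.match(path)
    if match is None:
        return None
    return "/metacat/sitemaps/" + match.group(1), 303  # [R=303] is an external redirect
```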
Infrastructure - Story #8857 (New): D1Client.getCN() always get the production cn on the CN Tomca... | https://redmine.dataone.org/issues/8857 | 2019-12-13T00:34:03Z | Jing Tao (tao@nceas.ucsb.edu)
<p>Today Val from ESS-DIVE reported that in the cn-stage environment, the REST call <code>cn/v2/diag/subject</code> didn't return any group information for a user, even though the <code>cn/v2/accounts</code> call shows that the user belongs to some groups.</p>
<p>Looking at the code, one suspicious detail is that <code>cn/v2/diag/subject</code> uses the method <code>D1Client.getCN().getSubjectInfo(subject)</code> to get the user information. I suspect it always uses the production CN rather than cn-stage. After I added the property <code>D1Client.CN_URL=https://cn-stage.test.dataone.org/cn</code> to the file <code>/var/lib/tomcat8/webapps/cn/WEB-INF/classes/org/dataone/configuration/portal.properties</code>, it worked.</p>
<p>So we need to set this property during our package build process.</p>
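<p>Below is a minimal Python sketch of the kind of step a package build could run to set the property for the target environment. It replaces an existing <code>D1Client.CN_URL</code> line or appends one. <code>set_property</code> is a hypothetical helper. It handles only simple <code>key=value</code> lines and ignores Java properties features such as line continuations and <code>:</code> or whitespace separators. How the build would choose the URL for each environment is left open.</p>

```python
def set_property(text, key, value):
    """Set key=value in Java-properties text, replacing an existing entry or appending one.

    Simplified: handles only 'key=value' lines; comments start with '#' or '!'.
    """
    lines = text.splitlines()
    new_line = f"{key}={value}"
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith(("#", "!")):
            continue  # skip comment lines
        if stripped.split("=", 1)[0].strip() == key:
            lines[i] = new_line
            return "\n".join(lines) + "\n"
    lines.append(new_line)
    return "\n".join(lines) + "\n"
```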
Infrastructure - Story #8856 (New): Put the system metadata part ahead of the object part when d1... | https://redmine.dataone.org/issues/8856 | 2019-11-22T18:31:19Z | Jing Tao (tao@nceas.ucsb.edu)
<p>When a client calls the mn.create/update methods, it usually constructs a multipart request containing a system metadata part, an object part (the object itself), and possibly other parts. There is currently no requirement on the order of those parts.<br>
Metacat will use a new streaming multipart handler that calculates the checksum while it writes the object part to a file. This requires knowing the checksum algorithm before the object is read, so for best performance Metacat needs to parse the system metadata first.<br>
To take advantage of this, we recommend that clients put the system metadata part ahead of the object part when constructing the multipart request sent to the server.<br>
Note: even if a client doesn't use the recommended order, the process still works, but performance will be poor.</p>
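<p>Below is a client-side Python sketch of the recommended ordering. It encodes a <code>multipart/form-data</code> body whose system metadata part comes before the object part. The part names <code>pid</code>, <code>sysmeta</code>, and <code>object</code> are assumed to match the create parameters, and the filenames are placeholders. For brevity the sketch buffers the whole object in memory. A real client would stream it.</p>

```python
import uuid


def build_multipart(pid, sysmeta_xml, obj, boundary=None):
    """Encode a multipart/form-data body with the system metadata before the object.

    Part names pid/sysmeta/object are an assumption. Returns (body, content_type).
    """
    boundary = boundary or uuid.uuid4().hex
    parts = [
        ("pid", None, "text/plain", pid.encode("utf-8")),
        ("sysmeta", "sysmeta.xml", "text/xml", sysmeta_xml.encode("utf-8")),
        ("object", "object", "application/octet-stream", obj),  # object goes last
    ]
    body = bytearray()
    for name, filename, ctype, data in parts:
        disposition = f'form-data; name="{name}"'
        if filename:
            disposition += f'; filename="{filename}"'
        body += f"--{boundary}\r\n".encode("ascii")
        body += f"Content-Disposition: {disposition}\r\n".encode("ascii")
        body += f"Content-Type: {ctype}\r\n\r\n".encode("ascii")
        body += data + b"\r\n"
    body += f"--{boundary}--\r\n".encode("ascii")
    return bytes(body), f"multipart/form-data; boundary={boundary}"
```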
Infrastructure - Story #8855 (New): Put the system metadata part ahead of the object part when d1... | https://redmine.dataone.org/issues/8855 | 2019-11-22T18:29:39Z | Jing Tao (tao@nceas.ucsb.edu)
<p>When a client calls the mn(cn).create/update methods, it usually constructs a multipart request containing a system metadata part, an object part (the object itself), and possibly other parts. There is currently no requirement on the order of those parts.<br>
Metacat will use a new streaming multipart handler that calculates the checksum while it writes the object part to a file. This requires knowing the checksum algorithm before the object is read, so for best performance Metacat needs to parse the system metadata first.<br>
To take advantage of this, we recommend that clients put the system metadata part ahead of the object part when constructing the multipart request sent to the server.<br>
Note: even if a client doesn't use the recommended order, the process still works, but performance will be poor.</p>
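<p>The server-side benefit can be sketched in Python. Once the system metadata has been parsed, the checksum algorithm is known, so the object can be written and hashed in a single pass over its chunks. This is a hypothetical illustration, not Metacat's streaming handler. It assumes the algorithm is carried in the <code>algorithm</code> attribute of an unqualified <code>checksum</code> element. The LoC-label-to-<code>hashlib</code> name mapping is a heuristic.</p>

```python
import hashlib
import xml.etree.ElementTree as ET


def algorithm_from_sysmeta(sysmeta_xml):
    """Read the algorithm attribute of the checksum element (unqualified names assumed)."""
    checksum = ET.fromstring(sysmeta_xml).find("checksum")
    if checksum is None or not checksum.get("algorithm"):
        raise ValueError("system metadata has no checksum algorithm")
    return checksum.get("algorithm")


def store_and_digest(chunks, algorithm, write):
    """Write each chunk via `write` while updating the digest in the same pass."""
    digest = hashlib.new(algorithm.lower().replace("-", ""))  # e.g. 'SHA-1' -> 'sha1'
    for chunk in chunks:
        write(chunk)
        digest.update(chunk)
    return digest.hexdigest()
```

If the object part arrives first, the algorithm is not yet known when the bytes stream past. That is why the recommended ordering matters.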
Infrastructure - Story #8854 (New): Put the system metadata part ahead of the object part when d... | https://redmine.dataone.org/issues/8854 | 2019-11-22T18:25:20Z | Jing Tao (tao@nceas.ucsb.edu)
<p>When a client calls the mn.create/update methods, it usually constructs a multipart request containing a system metadata part, an object part (the object itself), and possibly other parts. There is currently no requirement on the order of those parts.<br>
Metacat will use a new streaming multipart handler that calculates the checksum while it writes the object part to a file. This requires knowing the checksum algorithm before the object is read, so for best performance Metacat needs to parse the system metadata first.<br>
To take advantage of this, we recommend that clients put the system metadata part ahead of the object part when constructing the multipart request sent to the server.<br>
Note: even if a client doesn't use the recommended order, the process still works, but performance will be poor.</p>
Infrastructure - Task #8817 (New): Configure sitemaps on the CN | https://redmine.dataone.org/issues/8817 | 2019-06-06T23:52:23Z | Bryce Mecum (mecum@nceas.ucsb.edu)
<p>Support for sitemaps landed in Metacat last fall: <a href="https://github.com/NCEAS/metacat/pull/1283">https://github.com/NCEAS/metacat/pull/1283</a>. Sitemaps are useful for users and especially for search engines, and DataONE's Search Catalog could benefit from having them enabled. A sitemap could help crawlers discover all of the datasets in DataONE. The CNs already run Metacat and should use its sitemap support to generate sitemaps for all content.</p>
<p>To enable sitemaps on the CNs, a few things seem to be needed. I'm not very familiar with how the CNs get built, so I may be wrong or missing things:</p>
<ul>
<li>Sitemaps rely on two properties in Metacat's <code>metacat.properties</code> file, which should have values: <code>sitemap.location.base=https://search.dataone.org/</code> and <code>sitemap.entry.base=https://search.dataone.org/view</code></li>
<li>The Apache config the CNs are built with needs to serve the <code>sitemap_index.xml</code> and individual sitemaps from the Tomcat webapps dir. Metacat generates sitemaps in the <code>sitemaps</code> subfolder (e.g., <code>/usr/lib/tomcat8/webapps/metacat/sitemaps</code>). A <code>Directory</code> directive should work as long as filesystem permissions allow Apache to read the files.</li>
<li>We need a robots.txt on search.dataone.org that points to the sitemap index file, <code>sitemap_index.xml</code>, which is the entry point to the individual sitemaps.</li>
<li>Metacat generates sitemaps with a recurring job mechanism internal to Metacat. AFAIK this job isn't turned on when Tomcat loads Metacat; a request has to be sent to the admin API, which turns the job on as a side effect. We might want to change this to reduce the maintenance burden and the chance of having stale sitemaps.</li>
</ul>
<p>Dave nominated Jing for this work and has targeted it for the next CCI release. I'm not sure which release that is, so please select the appropriate one.</p>
<p>Note this relates to <a href="https://redmine.dataone.org/issues/8693">https://redmine.dataone.org/issues/8693</a>, which we've delayed because Google's crawler infrastructure has changed and DataONE is now visible to Google. Google staff have indicated we only need to send them a robots.txt that points to our sitemaps for them to begin crawling.</p>
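<p>Below is a minimal robots.txt of the kind described above, generated and then read back with Python's standard <code>urllib.robotparser</code> (whose <code>site_maps()</code> method needs Python 3.8+). This confirms that the <code>Sitemap</code> line is picked up. The index URL <code>https://search.dataone.org/sitemap_index.xml</code> is an assumption derived from <code>sitemap.location.base</code> plus the index file name. The empty <code>Disallow:</code> allows all paths.</p>

```python
from urllib.robotparser import RobotFileParser


def build_robots_txt(sitemap_index_url):
    """Allow all crawlers and advertise the sitemap index (URL is an assumption)."""
    return "\n".join([
        "User-agent: *",
        "Disallow:",
        f"Sitemap: {sitemap_index_url}",
        "",
    ])


def advertised_sitemaps(robots_txt):
    """Return the Sitemap URLs that Python's standard robots.txt parser finds."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.site_maps() or []
```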
Member Nodes - MNDeployment #6562 (Operational): BCO-DMO | https://redmine.dataone.org/issues/6562 | 2014-11-11T20:13:41Z | Matthew Jones (jones@nceas.ucsb.edu)
<p>The Biological and Chemical Oceanography Data Management Office (BCO-DMO) (<a href="http://www.bco-dmo.org/">http://www.bco-dmo.org/</a>)</p>
<p>Main contact: Adam Shepherd <a href="mailto:ashepherd@whoi.edu">ashepherd@whoi.edu</a><br>
Additional Contact: Cyndy Chandler <a href="mailto:cchandler@whoi.edu">cchandler@whoi.edu</a></p>
<p>As of today, the BCO-DMO repository holds over 7142 data sets from 419 projects.</p>
<p>Implementation: Their system is based on Drupal.</p>
<p>Associated with the GeoLink project, which Matt Jones and Mark Schildhauer are working on.</p>
Member Nodes - MNDeployment #3230 (Planning): ARM - Atmospheric Radiation Measurement member node | https://redmine.dataone.org/issues/3230 | 2012-09-07T00:21:59Z | Dave Vieglais (dave.vieglais@gmail.com)
<p>This issue captures the activity associated with deployment or otherwise of the "Atmospheric Radiation Measurement" member node.</p>
Member Nodes - MNDeployment #3213 (Operational): University of Illinois, Chicago member node | https://redmine.dataone.org/issues/3213 | 2012-09-05T03:22:28Z | Dave Vieglais (dave.vieglais@gmail.com)
<p>This issue is to track the deployment of the UIC member node.</p>
<p>This MN is slated to be, at least initially, a replication target member node.</p>