DataONE Tasks: Issues | https://redmine.dataone.org/ | 2020-08-06T00:06:07Z
CN REST - Bug #8867 (New): CNCore.listChecksumAlgorithms() returns incorrect list | https://redmine.dataone.org/issues/8867 | 2020-08-06T00:06:07Z | Matthew Jones | jones@nceas.ucsb.edu
<p>The definition of the ChecksumAlgorithm type in SystemMetadata allows any checksum algorithm listed in the Library of Congress vocabulary. However, the current CNCore.listChecksumAlgorithms() implementation returns only two, MD5 and SHA-1. This needs to be corrected to include the full list of supported algorithms (see <a href="http://id.loc.gov/vocabulary/preservation/cryptographicHashFunctions.html">http://id.loc.gov/vocabulary/preservation/cryptographicHashFunctions.html</a>).</p>
<p>The list is defined in a properties file, which needs to be updated with the correct values. The file (d1_cn_rest/src/test/resources/org/dataone/configuration/node.properties) currently contains:</p>
<p><code>cn.checksumAlgorithmList=SHA-1;MD5</code></p>
<p>It should also contain all of the other valid algorithms from the LoC vocabulary.</p>
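<p>As a rough illustration, the gap between the configured list and a reference vocabulary could be checked with a short script. This is only a sketch: <code>REFERENCE_ALGORITHMS</code> is a small placeholder set, not the authoritative LoC list, and the helper names are hypothetical.</p>

```python
# Sketch: compare the configured checksum list against a reference vocabulary.
# REFERENCE_ALGORITHMS is an illustrative placeholder, NOT the authoritative
# Library of Congress list; substitute the real vocabulary terms.
REFERENCE_ALGORITHMS = {"MD5", "SHA-1", "SHA-256", "SHA-512"}


def parse_algorithm_list(line: str) -> list[str]:
    """Parse a 'cn.checksumAlgorithmList=A;B' properties line into names."""
    key, _, value = line.partition("=")
    if key.strip() != "cn.checksumAlgorithmList":
        raise ValueError(f"unexpected property key: {key!r}")
    return [name.strip() for name in value.split(";") if name.strip()]


def missing_algorithms(line: str, reference: set[str]) -> list[str]:
    """Return reference algorithms absent from the configured list, sorted."""
    configured = set(parse_algorithm_list(line))
    return sorted(reference - configured)


current = "cn.checksumAlgorithmList=SHA-1;MD5"
print(parse_algorithm_list(current))
print(missing_algorithms(current, REFERENCE_ALGORITHMS))
```

<p>Running the check against the real vocabulary terms would give the exact set of names to append to the semicolon-separated property value.</p>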
CN REST - Task #8809 (New): Adjust portal.properties for certificate configuration | https://redmine.dataone.org/issues/8809 | 2019-05-21T12:57:41Z | Dave Vieglais | dave.vieglais@gmail.com
<p>Portal certificates currently appear to be configured in <code>/var/lib/tomcat7/webapps/portal/WEB-INF/portal.properties</code></p>
<p>This should be changed to <code>/etc/dataone/portal/portal.properties</code> to ensure persistence between .war deployments.</p>
CN REST - Task #8778 (New): Ensure SystemMetadata replica auditing updates are saved and broadcast | https://redmine.dataone.org/issues/8778 | 2019-03-12T16:54:08Z | Chris Jones | cjones@nceas.ucsb.edu
<p>In <code>MNAuditTask.call()</code>, we process a batch of pids in need of auditing per Member Node. For each <code>pid</code> in the <code>auditPIDs</code> list, we call <code>MN.getChecksum()</code>. Regardless of success or failure, we set the <code>Replica.replicaVerified</code> date in the <code>SystemMetadata</code>. However, the task works on a copy of the system metadata from <code>hzSystemMetadata.get()</code>, and it doesn't subsequently call <code>hzSystemMetadata.put(pid, sysmeta)</code>. This means that while we are auditing content, we may not be recording the results! I need to look at the code more to see if we make an API call to <code>CN.updateSystemMetadata()</code> elsewhere, but I would expect the <code>MNAuditTask</code> to do this. If this happens in the task, we also need to broadcast the system metadata change to the authoritative MN and all replica MNs. Lastly, we need to update the <code>serialVersion</code> field to show the other CNs what the most recent replica list is.</p>
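<p>The missing write-back step can be sketched as follows. This is a simplified model rather than the actual task code: a plain dict stands in for the <code>hzSystemMetadata</code> map, the record layout and the <code>audit_replica</code> helper are hypothetical, and the broadcast to Member Nodes is not shown.</p>

```python
import copy
from datetime import datetime, timezone

# Stand-in for the shared hzSystemMetadata map: pid -> system metadata dict.
# The dict layout below is hypothetical and only loosely mirrors SystemMetadata.
hz_system_metadata = {
    "pid-1": {
        "serialVersion": 3,
        "replicas": [{"memberNode": "urn:node:MN_A", "replicaVerified": None}],
    }
}


def audit_replica(store, pid, member_node, now):
    """Record an audit result and write it back to the shared store."""
    # get() is modelled as returning a detached copy, as described above.
    sysmeta = copy.deepcopy(store[pid])
    for replica in sysmeta["replicas"]:
        if replica["memberNode"] == member_node:
            replica["replicaVerified"] = now
    sysmeta["serialVersion"] += 1  # signal a newer replica list to other CNs
    store[pid] = sysmeta           # the put() step that appears to be missing
    return sysmeta


now = datetime(2019, 3, 12, tzinfo=timezone.utc)
audit_replica(hz_system_metadata, "pid-1", "urn:node:MN_A", now)
print(hz_system_metadata["pid-1"])
```

<p>Without the final assignment back into the store, the updated <code>replicaVerified</code> date would exist only on the local copy and be lost when the task returns.</p>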
CN REST - Task #8777 (New): Configure CN to audit objects greater than 1GB | https://redmine.dataone.org/issues/8777 | 2019-03-12T16:47:42Z | Chris Jones | cjones@nceas.ucsb.edu
<p>The replication auditor currently does not audit objects larger than 1GB. There are currently 4 objects greater than 1TB in size and 3,588 objects greater than 1GB in size, both very small counts compared to the 2,769,111 objects less than 1GB in size in the network. Nonetheless, they should still be audited if feasible. The limiting factor is likely the HTTP timeout during the call to <code>MN.getChecksum()</code>. For reference, I'm seeing the following approximate times for calculating MD5 and SHA-1 checksums:</p>
<pre>Size   MD5       SHA-1
-----  --------  --------
1GB    00m02.5s  00m02.6s
10GB   00m25.9s  00m30.0s
100GB  03m28.0s  04m01.8s
1TB    50m14.2s  67m38.6s
</pre>
<p>10GB and 100GB objects seem pretty feasible if we set the HTTP client timeout to > 5 minutes, whereas the few > 1TB files may be challenging just due to the timeouts. The other factor is that the <code>AbstractReplicationAuditor</code> sets a default timeout to 60 seconds, and if the task future doesn't return in that time frame, the future gets cancelled. So the HTTP timeout and this timeout need to be increased and coordinated in order to handle larger object auditing.</p>
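<p>To make the timeout arithmetic concrete, the timings above can be converted to seconds and compared against candidate timeouts. This sketch considers only the checksum computation times from the table, so network latency and transfer overhead are not accounted for; the helper names are illustrative.</p>

```python
import re

# Checksum timings copied from the table above (MD5, SHA-1).
TIMINGS = {
    "1GB": ("00m02.5s", "00m02.6s"),
    "10GB": ("00m25.9s", "00m30.0s"),
    "100GB": ("03m28.0s", "04m01.8s"),
    "1TB": ("50m14.2s", "67m38.6s"),
}


def to_seconds(text: str) -> float:
    """Convert a 'MMmSS.Ss' duration such as '03m28.0s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", text)
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def sizes_within(timeout_seconds: float) -> list[str]:
    """List sizes whose slower checksum finishes inside the timeout."""
    return [
        size
        for size, timings in TIMINGS.items()
        if max(to_seconds(t) for t in timings) < timeout_seconds
    ]


print(sizes_within(60))   # the 60-second default mentioned above
print(sizes_within(300))  # a 5-minute timeout
```

<p>Under these numbers, a 60-second limit covers objects up to 10GB, a 5-minute limit also covers 100GB, and the 1TB case would need well over an hour for SHA-1.</p>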
CN REST - Story #8756 (New): Ensure replica auditor is effective | https://redmine.dataone.org/issues/8756 | 2019-01-12T20:25:18Z | Chris Jones | cjones@nceas.ucsb.edu
<p>The replication auditor service is currently configured to audit all objects every 90 days. As documented in <a class="issue tracker-4 status-1 priority-4 priority-default child" title="Story: Replica Auditing service is throwing errors (New)" href="https://redmine.dataone.org/issues/8582">#8582</a>, the auditor is not working correctly. While the errors being thrown that are described in that ticket seem to be limited to <code>pid</code>s with certain characters in them, I think the whole auditor process is not keeping up with our content.</p>
<p>Looking at the number of objects on each member node that haven't been audited in the last 90 days, auditing is well behind (if we consider it working at all):</p>
<pre>SELECT sm.authoritive_member_node, count(smr.guid) AS count
FROM systemmetadata sm INNER JOIN smreplicationstatus smr
    ON sm.guid = smr.guid
WHERE
    smr.member_node != 'urn:node:CN' AND
    sm.date_uploaded < (SELECT CURRENT_DATE - interval '90 days') AND
    smr.date_verified < (SELECT CURRENT_DATE - interval '90 days')
GROUP BY sm.authoritive_member_node
ORDER BY count DESC;

authoritive_member_node | count
-------------------------+--------
urn:node:ARCTIC | 771872
urn:node:PANGAEA | 507456
urn:node:LTER | 416339
urn:node:DRYAD | 374439
urn:node:CDL | 242115
urn:node:PISCO | 235791
urn:node:KNB | 86075
urn:node:TDAR | 75639
urn:node:NCEI | 50974
urn:node:USGS_SDC | 40290
urn:node:TERN | 31671
urn:node:ESS_DIVE | 28830
urn:node:NMEPSCOR | 16042
urn:node:GOA | 9266
urn:node:IARC | 7677
urn:node:NRDC | 6673
urn:node:TFRI | 6478
urn:node:PPBIO | 3464
urn:node:ORNLDAAC | 3328
urn:node:FEMC | 2430
urn:node:EDI | 2098
urn:node:GRIIDC | 2065
urn:node:mnTestKNB | 2010
urn:node:SANPARKS | 2008
urn:node:ONEShare | 1874
urn:node:R2R | 1787
urn:node:USGSCSAS | 1151
urn:node:EDACGSTORE | 1075
urn:node:US_MPC | 1032
urn:node:RW | 970
urn:node:KUBI | 516
urn:node:NEON | 487
urn:node:LTER_EUROPE | 343
urn:node:IOE | 279
urn:node:RGD | 273
urn:node:ESA | 272
urn:node:NKN | 218
urn:node:OTS_NDC | 126
urn:node:BCODMO | 115
urn:node:SEAD | 90
urn:node:mnTestNKN | 50
urn:node:EDORA | 28
urn:node:ONEShare.pem | 22
urn:node:CLOEBIRD | 17
urn:node:mnTestBCODMO | 11
urn:node:USANPN | 10
urn:node:mnTestTDAR | 10
urn:node:MyMemberNode | 1
</pre>
<p>The table above shows the number of objects that have not been audited in the last 90 days, but I get the feeling that the auditor isn't able to audit any of the content it is charged to audit given 1) the frequency, 2) the number of threads allotted, and 3) the configured batch count (which seems way too low). <del>Note that this query excludes replicated content - this is just the original objects</del> (After looking at my query again, I think the join is including all replicas: the total is 2,935,787, which is greater than the total number of objects in the system (2,751,136), so this query needs to be refined.)</p>
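<p>The over-counting can be reproduced on toy data. The sketch below uses an in-memory SQLite database with made-up rows and only the columns needed for the join (the date filters are omitted), and compares counting joined rows with counting distinct objects. Whether <code>count(DISTINCT ...)</code> is the right refinement depends on whether an object should count once if any of its replicas is stale, which is an assumption here.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE systemmetadata (guid TEXT PRIMARY KEY, authoritive_member_node TEXT);
    CREATE TABLE smreplicationstatus (guid TEXT, member_node TEXT);
    -- Toy rows: one object held by two replica nodes plus the CN.
    INSERT INTO systemmetadata VALUES ('obj-1', 'urn:node:MN_A');
    INSERT INTO smreplicationstatus VALUES
        ('obj-1', 'urn:node:MN_A'),
        ('obj-1', 'urn:node:MN_B'),
        ('obj-1', 'urn:node:CN');
    """
)

QUERY = """
    SELECT sm.authoritive_member_node, count({expr}) AS count
    FROM systemmetadata sm INNER JOIN smreplicationstatus smr
      ON sm.guid = smr.guid
    WHERE smr.member_node != 'urn:node:CN'
    GROUP BY sm.authoritive_member_node
"""

# Counting joined rows yields one count per replica of each object.
per_replica = conn.execute(QUERY.format(expr="smr.guid")).fetchall()
# Counting distinct object identifiers yields one count per object.
per_object = conn.execute(QUERY.format(expr="DISTINCT sm.guid")).fetchall()
print(per_replica)
print(per_object)
```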
<p>We need to evaluate the true effectiveness of the auditor. Some strategies may include: 1) looking to see if we may be in an infinite loop on processing a few <code>pid</code>s due to the issues in <a class="issue tracker-4 status-1 priority-4 priority-default child" title="Story: Replica Auditing service is throwing errors (New)" href="https://redmine.dataone.org/issues/8582">#8582</a>, 2) seeing if we can increase the batch size by increasing the total threads allocated in the executor, and 3) deciding whether we need to offload the process from the CNs and distribute the workload across a cluster of workers that can do the auditing faster. This needs some thought and discussion.</p>
CN REST - Bug #8740 (New): CN resolve service returning 404 for some pids | https://redmine.dataone.org/issues/8740 | 2018-11-08T00:21:45Z | Peter Slaughter | slaughter@nceas.ucsb.edu
<p>A user has reported that certain pids (all for metadata objects) return an HTTP 404 status from the resolve service. Here is the complete list of URLs the user tried that didn't resolve:</p>
<p><a href="https://cn.dataone.org/cn/v2/resolve/knb-lter-arc.1343.2">https://cn.dataone.org/cn/v2/resolve/knb-lter-arc.1343.2</a><br>
<a href="https://cn.dataone.org/cn/v2/resolve/knb-lter-arc.1212.2">https://cn.dataone.org/cn/v2/resolve/knb-lter-arc.1212.2</a><br>
<a href="https://cn.dataone.org/cn/v2/resolve/knb-lter-arc.1477.2">https://cn.dataone.org/cn/v2/resolve/knb-lter-arc.1477.2</a><br>
<a href="https://cn.dataone.org/cn/v2/resolve/knb-lter-bnz.442.8">https://cn.dataone.org/cn/v2/resolve/knb-lter-bnz.442.8</a><br>
<a href="https://cn.dataone.org/cn/v2/resolve/knb-lter-pie.15.5">https://cn.dataone.org/cn/v2/resolve/knb-lter-pie.15.5</a><br>
<a href="https://cn.dataone.org/cn/v2/resolve/knb-lter-pie.248.1">https://cn.dataone.org/cn/v2/resolve/knb-lter-pie.248.1</a><br>
<a href="https://cn.dataone.org/cn/v2/resolve/knb-lter-sbc.6.13">https://cn.dataone.org/cn/v2/resolve/knb-lter-sbc.6.13</a><br>
<a href="https://cn.dataone.org/cn/v2/resolve/knb-lter-sbc.15.23">https://cn.dataone.org/cn/v2/resolve/knb-lter-sbc.15.23</a></p>
<p>The CNRead.getSystemMetadata() service returns metadata for each of these pids.</p>
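<p>A small script could compare the two services for each pid. The sketch below builds the URLs and includes a helper that reports the HTTP status without following redirects. It assumes that <code>CNRead.getSystemMetadata()</code> is served at <code>/meta/{pid}</code> under the same base URL as <code>/resolve/{pid}</code>; the function names are hypothetical, and no request is made unless <code>status_of</code> is called.</p>

```python
import urllib.error
import urllib.request
from urllib.parse import quote

CN_BASE = "https://cn.dataone.org/cn/v2"


def endpoint_urls(pid: str) -> dict[str, str]:
    """Build the resolve and system-metadata URLs for one identifier."""
    encoded = quote(pid, safe="")  # percent-encode every reserved character
    return {
        "resolve": f"{CN_BASE}/resolve/{encoded}",
        "meta": f"{CN_BASE}/meta/{encoded}",  # assumed path for getSystemMetadata
    }


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Decline to follow redirects so the original status code is reported."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def status_of(url: str, timeout: float = 30.0) -> int:
    """Return the HTTP status for a GET of url, without following redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # covers 3xx (not followed), 4xx and 5xx


urls = endpoint_urls("knb-lter-arc.1343.2")
print(urls["resolve"])
print(urls["meta"])
```

<p>Calling <code>status_of</code> on both URLs for each pid in the list would show directly which identifiers return 404 from resolve while still returning system metadata.</p>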
CN REST - Bug #8698 (New): CN Performance degradation | https://redmine.dataone.org/issues/8698 | 2018-09-13T21:05:40Z | Peter Slaughter | slaughter@nceas.ucsb.edu
<p>Calling simple services on the CN can take a long time to execute. For example, calling </p>
<p><a href="https://cn.dataone.org/cn/v2/resolve/aekos.org.au/collection/nsw.gov.au/nsw_atlas/vis_flora_module/KM_CUDM.20160202">https://cn.dataone.org/cn/v2/resolve/aekos.org.au/collection/nsw.gov.au/nsw_atlas/vis_flora_module/KM_CUDM.20160202</a></p>
<p>can take about 8-10 seconds, while calling the MN directly (where the resolve service redirects to) takes about 1-2 seconds:</p>
<p><a href="https://dataone.tern.org.au/mn/v2/object/aekos.org.au%2Fcollection%2Fnsw.gov.au%2Fnsw_atlas%2Fvis_flora_module%2FKM_CUDM.20160202">https://dataone.tern.org.au/mn/v2/object/aekos.org.au%2Fcollection%2Fnsw.gov.au%2Fnsw_atlas%2Fvis_flora_module%2FKM_CUDM.20160202</a></p>
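<p>To put numbers on the difference, repeated calls can be timed and summarized by their median. The helper below is only a sketch: <code>time_calls</code> is a hypothetical name, and <code>fake_fetch</code> is an offline stand-in so the example runs without a network. In practice it would be replaced by a real HTTP GET against each of the two URLs above.</p>

```python
import time
from statistics import median


def time_calls(fetch, url, repeats=5):
    """Return the median wall-clock seconds of repeated fetch(url) calls."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fetch(url)
        samples.append(time.perf_counter() - start)
    return median(samples)


# Offline stand-in for an HTTP client so the sketch runs without a network.
def fake_fetch(url):
    time.sleep(0.01)


elapsed = time_calls(fake_fetch, "https://example.org/", repeats=3)
print(f"median: {elapsed:.3f}s")
```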
CN REST - Bug #8630 (New): equivalentIdentity values use uppercase letters when the same person s... | https://redmine.dataone.org/issues/8630 | 2018-06-26T14:15:00Z | Lauren Walker | walker@nceas.ucsb.edu
<p>I would expect a username to appear exactly the same everywhere in the identity service. </p>
<p>Here is an example where the <code>equivalentIdentity</code> uses uppercase letters for the subject (UID=corinalogan,O=unaffiliated,DC=ecoinformatics,DC=org) when the <code>person</code>><code>subject</code> uses lowercase letters (uid=corinalogan,o=unaffiliated,dc=ecoinformatics,dc=org):</p>
<p><a href="https://cn.dataone.org/cn/v2/accounts/http%3A%2F%2Forcid.org%2F0000-0002-5944-906X">https://cn.dataone.org/cn/v2/accounts/http%3A%2F%2Forcid.org%2F0000-0002-5944-906X</a></p>
<pre><ns2:subjectInfo xmlns:ns2="http://ns.dataone.org/service/types/v1">
  <person>
    <subject>http://orcid.org/0000-0002-5944-906X</subject>
    <givenName>Corina</givenName>
    <familyName>Logan</familyName>
    <equivalentIdentity>
      UID=corinalogan,O=unaffiliated,DC=ecoinformatics,DC=org
    </equivalentIdentity>
    <verified>false</verified>
  </person>
  <person>
    <subject>
      uid=corinalogan,o=unaffiliated,dc=ecoinformatics,dc=org
    </subject>
    <givenName>Corina</givenName>
    <familyName>Logan</familyName>
    <equivalentIdentity>http://orcid.org/0000-0002-5944-906X</equivalentIdentity>
    <verified>false</verified>
  </person>
</ns2:subjectInfo>
</pre>
CN REST - Story #8364 (In Progress): Ensure portal uses correct X509 certificates | https://redmine.dataone.org/issues/8364 | 2018-02-13T20:17:25Z | Chris Jones | cjones@nceas.ucsb.edu
<p>We've run into issues where, after an upgrade of the <code>dataone-cn-portal</code> package on the CNs, the properties pointing to the public certificate and private key incorrectly point to the old GeoTrust wildcard files rather than the new Let's Encrypt files:<br>
<br>
cn.server.publiccert.filename=/etc/ssl/certs/*.test.dataone.org.crt<br>
cn.server.privatekey.filename=/etc/ssl/private/*.test.dataone.org.key</p>
<p>These should be (in STAGE):</p>
<p>/etc/letsencrypt/live/cn-stage.test.dataone.org/cert.pem<br>
/etc/letsencrypt/live/cn-stage.test.dataone.org/privkey.pem</p>
<p>The issue might be that these are not being set correctly during the <code>postinst</code> script run. Jing pointed out that these values are taken from the debconf database settings that get set when <code>dataone-cn-os-core</code> is installed. So although the <code>postinst</code> script might be setting the correct values, the old cached values might still be stored in the debconf database. If so, we'll need to clear those values during installations and upgrades.</p>
<p>Also, knowing where to look for these configuration settings can be challenging. These are referenced from <code>/var/lib/tomcat7/webapps/portal/WEB-INF/portal.properties</code>. These settings should be consolidated into <code>/etc/dataone/portal/portal.properties</code> so they also don't get blown away on war file upgrades in Tomcat.</p>
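<p>A quick check for stale certificate settings could look like the following sketch. It assumes a simple <code>key=value</code> file (not the full Java properties syntax) and treats any certificate path outside <code>/etc/letsencrypt/live/</code> as stale; the helper names and the sample text are illustrative.</p>

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


CERT_KEYS = ("cn.server.publiccert.filename", "cn.server.privatekey.filename")


def stale_cert_settings(props, expected_prefix="/etc/letsencrypt/live/"):
    """Return certificate settings that do not point under expected_prefix."""
    return {
        key: props[key]
        for key in CERT_KEYS
        if key in props and not props[key].startswith(expected_prefix)
    }


# Illustrative sample: one stale wildcard path and one Let's Encrypt path.
sample = """
cn.server.publiccert.filename=/etc/ssl/certs/*.test.dataone.org.crt
cn.server.privatekey.filename=/etc/letsencrypt/live/cn-stage.test.dataone.org/privkey.pem
"""
print(stale_cert_settings(parse_properties(sample)))
```

<p>Run after a package upgrade against whichever <code>portal.properties</code> file is in effect, a non-empty result would indicate that old values were carried over.</p>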
CN REST - Bug #8010 (New): CN.archive fails with 401 Unauthorized when using either MN or CN clie... | https://redmine.dataone.org/issues/8010 | 2017-02-01T20:04:46Z | Mark Servilla | mark.servilla@gmail.com
<p>CN.archive fails with 401 Unauthorized when using either the MN or CN client certificate for a PID whose authoritative MN is urn:node:LTER.</p>
<p>MN attempt:<br>
<br>
curl -i -E ./urn_node_LTER-1.pem -X PUT <a href="https://cn.dataone.org/cn/v2/archive/doi:10.6073/AA/knb-lter-bes.437.35">https://cn.dataone.org/cn/v2/archive/doi:10.6073/AA/knb-lter-bes.437.35</a><br>
HTTP/1.1 401 Unauthorized<br>
Date: Wed, 01 Feb 2017 19:23:20 GMT<br>
Server: Apache/2.4.7 (Ubuntu)<br>
Content-Type: text/xml<br>
Content-Length: 291<br>
Access-Control-Allow-Origin: <br>
Access-Control-Allow-Credentials: true<br>
Access-Control-Allow-Headers: Authorization, Content-Type, Location, Content-Length, x-annotator-auth-token<br>
Access-Control-Expose-Headers: Content-Length, Content-Type, Location<br>
Access-Control-Allow-Methods: POST, GET, OPTIONS, PUT, DELETE<br>
<?xml version="1.0" encoding="UTF-8"?><br>
The Coordinating Node is not authorized to make systemMetadata changes on this object. Please make changes directly on the authoritative Member Node.<br>
</p>
<p>CN attempt:<br>
<br>
curl -i -E ./urn_node_CNUCSB1.pem -X PUT <a href="https://cn.dataone.org/cn/v2/archive/doi:10.6073/AA/knb-lter-bes.437.35">https://cn.dataone.org/cn/v2/archive/doi:10.6073/AA/knb-lter-bes.437.35</a><br>
HTTP/1.1 401 Unauthorized<br>
Date: Wed, 01 Feb 2017 19:29:29 GMT<br>
Server: Apache/2.4.7 (Ubuntu)<br>
Content-Type: text/xml<br>
Content-Length: 291<br>
Access-Control-Allow-Origin: <br>
Access-Control-Allow-Credentials: true<br>
Access-Control-Allow-Headers: Authorization, Content-Type, Location, Content-Length, x-annotator-auth-token<br>
Access-Control-Expose-Headers: Content-Length, Content-Type, Location<br>
Access-Control-Allow-Methods: POST, GET, OPTIONS, PUT, DELETE<br>
<?xml version="1.0" encoding="UTF-8"?><br>
The Coordinating Node is not authorized to make systemMetadata changes on this object. Please make changes directly on the authoritative Member Node.<br>
</p>
<p>Object System Metadata:<br>
<br>
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><br>
serialVersion: 3<br>
identifier: doi:10.6073/AA/knb-lter-bes.437.35<br>
formatId: eml://ecoinformatics.org/eml-2.0.1<br>
size: 10881<br>
checksum: ecfee77e0d5297ddb74a796fb4c36f02<br>
submitter: uid=BES,o=LTER,dc=ecoinformatics,dc=org<br>
rightsHolder: uid=BES,o=LTER,dc=ecoinformatics,dc=org<br>
accessPolicy:<br>
allow: subject uid="BES",o=lter,dc=ecoinformatics,dc=org; permissions read, write, changePermission<br>
allow: subject public; permission read<br>
obsoletes: doi:10.6073/AA/knb-lter-bes.437.34<br>
obsoletedBy: doi:10.6073/AA/knb-lter-bes.437.36<br>
archived: false<br>
dateUploaded: 2010-01-08T00:00:00.000+00:00<br>
dateSysMetadataModified: 2015-08-14T21:03:34.343+00:00<br>
originMemberNode: urn:node:LTER<br>
authoritativeMemberNode: urn:node:LTER<br>
replica: urn:node:CN, completed, verified 2015-08-14T21:03:31.858+00:00<br>
replica: urn:node:LTER, completed, verified 2015-12-25T09:37:20.314+00:00</p>
CN REST - Task #7750 (New): apply business rules on the CN that Subject strings will be stripped ... | https://redmine.dataone.org/issues/7750 | 2016-04-26T17:30:35Z | Rob Nahf | rnahf@epscor.unm.edu
<p>Make sure that the business rule is documented.</p>
<p>Apply it to cn_identity_manager and other places in the CN service layer.</p>
CN REST - Task #7571 (New): Description for error code 401, detail code 4957 is misleading in som... | https://redmine.dataone.org/issues/7571 | 2016-01-05T16:25:01Z | Peter Slaughter | slaughter@nceas.ucsb.edu
<p>When using CNDiagnostic.echoCredentials() with an expired authentication token, the returned message is:</p>
<p><?xml version="1.0" encoding="UTF-8"?><br>
<br>
No credentials were received in the request. (Session was null)</p>
<p>The description text in this case is misleading, because from the caller's point of view, credentials<br>
were provided, albeit invalid. Is it possible to determine the validity of the credentials, i.e. expiration<br>
status, and indicate that in the description?</p>
<p>BTW, this same token did work correctly before it expired.</p>
<p>A bash script is attached that recreates the error.</p>
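<p>One way the expiration status could be determined is sketched below. It assumes the authentication token is a JSON Web Token carrying a standard <code>exp</code> claim, which the report above does not state. The function names and the locally built toy token are hypothetical, and the signature is deliberately not verified.</p>

```python
import base64
import json
import time


def jwt_expiry(token: str):
    """Return the 'exp' claim (seconds since the epoch) of a JWT, or None.

    The signature is NOT verified; this only inspects the payload.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a three-part JWT")
    payload = parts[1] + "=" * (-len(parts[1]) % 4)  # restore base64 padding
    claims = json.loads(base64.urlsafe_b64decode(payload))
    return claims.get("exp")


def describe(token: str, now: float) -> str:
    """Produce a description that distinguishes expired from missing credentials."""
    exp = jwt_expiry(token)
    if exp is not None and exp <= now:
        return "Credentials were received but the token has expired."
    return "Token is not expired (signature not checked)."


def _b64(obj) -> str:
    """Encode a JSON object as unpadded base64url, as JWT segments are."""
    raw = json.dumps(obj).encode()
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()


# Toy unsigned token built locally for illustration only.
toy = ".".join([_b64({"alg": "none"}), _b64({"sub": "someone", "exp": 1000}), "sig"])
print(describe(toy, now=time.time()))
```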
CN REST - Bug #7301 (New): CN-stage allows connections to a MN that is operating a self-signed SS... | https://redmine.dataone.org/issues/7301 | 2015-08-18T18:12:14Z | Mark Servilla | mark.servilla@gmail.com
<p>CN-stage supports connections to a MN that is operating a self-signed SSL server certificate - this should not be allowed since the connection could occur with a rogue non-verified server.</p>
<p>This instance occurred with dataone-dev.ecoinformatics.org.au:443 on 18 August 2015:</p>
<p>Certificate chain<br>
0 s:/CN=dataone-dev.ecoinformatics.org.au<br>
i:/CN=dataone-dev.ecoinformatics.org.au</p>
<p>Issuer: CN=dataone-dev.ecoinformatics.org.au<br>
Validity<br>
Not Before: Aug 11 04:56:19 2015 GMT<br>
Not After : Aug 8 04:56:19 2025 GMT<br>
Subject: CN=dataone-dev.ecoinformatics.org.au</p>
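<p>For illustration, here is what a verifying client looks like in Python. It is a sketch of the expected behavior, not the CN's actual client code: the default TLS context requires a trusted chain and a matching hostname, so a self-signed server is rejected unless its certificate has been explicitly added to the trust store. The <code>is_self_signed</code> heuristic and <code>verified_connect</code> helper are hypothetical names.</p>

```python
import socket
import ssl


def is_self_signed(subject: str, issuer: str) -> bool:
    """Heuristic: a certificate whose subject equals its issuer is self-signed."""
    return subject.strip() == issuer.strip()


def verified_connect(host: str, port: int = 443, timeout: float = 10.0) -> str:
    """Open a TLS connection that verifies the chain and hostname.

    Raises ssl.SSLCertVerificationError for untrusted (e.g. self-signed)
    servers. Returns the negotiated TLS version on success.
    """
    context = ssl.create_default_context()  # CERT_REQUIRED + hostname checking
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version()


context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED, context.check_hostname)
print(is_self_signed("CN=dataone-dev.ecoinformatics.org.au",
                     "CN=dataone-dev.ecoinformatics.org.au"))
```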
CN REST - Task #2168 (New): Design UI for identity validation | https://redmine.dataone.org/issues/2168 | 2012-01-06T03:18:46Z | Dave Vieglais | dave.vieglais@gmail.com
CN REST - Task #1479 (In Progress): Design web UI for validating newly created identities | https://redmine.dataone.org/issues/1479 | 2011-04-06T18:01:34Z | Matthew Jones | jones@nceas.ucsb.edu