DataONE Tasks: Issueshttps://redmine.dataone.org/https://redmine.dataone.org/favicon.ico2020-06-17T21:49:55ZDataONE Tasks
Redmine CN REST - Story #8864 (New): Synchronization does not register authoritative replica entry correctlyhttps://redmine.dataone.org/issues/88642020-06-17T21:49:55ZChris Jonescjones@nceas.ucsb.edu
<p>When objects are synchronized to the CN, the <code>d1_synchronization</code> component will fetch the system metadata <br>
for each object and will add a <code>&lt;replica&gt;</code> entry for the origin node (like <code>urn:node:ESS_DIVE</code>), <br>
as well as entries for other copies (for instance, for science metadata copied to the CN, <br>
a <code>&lt;replica&gt;urn:node:CN&lt;/replica&gt;</code> entry will be added).</p>
<p>In some instances, the origin replica entry is not added to the replica list.<br>
This causes downstream problems for the <code>d1_replication</code> component because it relies on the origin node <br>
replica entry being present in order to set up a replication request to a target node. I'm seeing errors like:</p>
<pre>/var/log/dataone/replicate/cn-replication.log.90:[ERROR] 2020-06-04 05:18:30,179 [pool-15-thread-1] (MNCommunication:requestReplication:34) Could not determine replication source node for replication request for pid: ess-dive-eb6cbb22c605506-20200122T170607966. Replication request failed.
</pre>
<p>Looking back in the logs, this is the case for the following objects:</p>
<pre>ess-dive-3947e68e9825233-20180621T213650539
ess-dive-3b8d9f4513e02f9-20180621T214221437
ess-dive-467a6c3dda4dc88-20180621T211148554
ess-dive-51f345daca126f7-20180328T160350610716
ess-dive-53b37ae5d8c0f51-20200219T211634419654
ess-dive-6b688fab5524c46-20200121T210154766
ess-dive-7a31346c154f02b-20200127T155012488
ess-dive-a1fb05cbd903309-20200130T190835651
ess-dive-b420b097851c716-20180523T161714606
ess-dive-ba81a8a8e0bef31-20180727T200828345
ess-dive-bfaf3d6d6fd038c-20180716T154005175903
ess-dive-c2ef5f3af108c9c-20180621T220020545
ess-dive-eb6cbb22c605506-20200122T170607966
ess-dive-f3238db16593de5-20180621T215956950
</pre>
<p>We need to fix this issue in <code>d1_synchronization</code> so replication runs correctly and monthly <br>
replica auditing (done by ESS_DIVE) doesn't flag these issues.</p>
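<p>As a rough illustration of the condition described above, the following Python sketch checks whether a system metadata document lists a <code>&lt;replica&gt;</code> entry for its origin node. The sample document and the function name <code>missing_origin_replica</code> are assumptions made for illustration; this is not the <code>d1_synchronization</code> code.</p>

```python
import xml.etree.ElementTree as ET

# Illustrative system metadata (hypothetical PID and dates). The origin node
# is ESS_DIVE, but only the CN appears in the replica list.
SAMPLE_SYSMETA = """<d1:systemMetadata xmlns:d1="http://ns.dataone.org/service/types/v2.0">
  <identifier>example-pid-1</identifier>
  <originMemberNode>urn:node:ESS_DIVE</originMemberNode>
  <authoritativeMemberNode>urn:node:ESS_DIVE</authoritativeMemberNode>
  <replica>
    <replicaMemberNode>urn:node:CN</replicaMemberNode>
    <replicationStatus>completed</replicationStatus>
    <replicaVerified>2020-01-22T17:06:07.966Z</replicaVerified>
  </replica>
</d1:systemMetadata>
"""


def missing_origin_replica(sysmeta_xml: str) -> bool:
    """Return True when the origin node has no entry in the replica list."""
    root = ET.fromstring(sysmeta_xml)
    origin = (root.findtext("originMemberNode") or "").strip()
    replica_nodes = {
        (node.text or "").strip()
        for node in root.findall("replica/replicaMemberNode")
    }
    return bool(origin) and origin not in replica_nodes


if __name__ == "__main__":
    print(missing_origin_replica(SAMPLE_SYSMETA))
```

<p>A check like this could be run over the affected identifiers to confirm which ones lack the origin entry before and after a fix.</p>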
CN REST - Bug #8860 (New): /token endpoint doesn't set a content-type and character encodinghttps://redmine.dataone.org/issues/88602020-02-29T01:00:11ZBryce Mecummecum@nceas.ucsb.edu
<p>On Firefox only, requests to the /portal/token endpoint (i.e., the one MetacatUI and other clients use to fetch their auth tokens, like <a href="https://cn.dataone.org/portal/token">https://cn.dataone.org/portal/token</a>) result in errors in the browser console.</p>
<p>When you access the URL via an XHR request, you see:</p>
<blockquote>
<p>XML Parsing Error: syntax error<br>
Location: <a href="https://cn-stage.test.dataone.org/portal/token">https://cn-stage.test.dataone.org/portal/token</a><br>
Line Number 1, Column 1:</p>
</blockquote>
<p>When you access the URL directly in Firefox:</p>
<blockquote>
<p>The character encoding of the plain text document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the file needs to be declared in the transfer protocol or file needs to use a byte order mark as an encoding signature.</p>
</blockquote>
<p>I had a hunch that this error would go away if the response simply had the <code>Content-Type</code> header set to <code>text/plain; charset=utf-8</code> so I spun up <code>mitmproxy</code>, made that edit to the intercepted response, and saw that the error does go away.</p>
<p>I think we should modify the portal code to set the <code>Content-Type</code> header like above so the error goes away.</p>
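<p>The portal itself is a Java web application, so the following is only a Python stand-in showing the shape of the proposed change: a minimal WSGI handler that declares both the media type and the character encoding on a plain-text token response. The name <code>token_app</code> and the placeholder token are hypothetical.</p>

```python
def token_app(environ, start_response):
    """Minimal WSGI app returning a token as explicitly typed plain text."""
    token = "example.token.value"  # placeholder, not a real token
    body = token.encode("utf-8")
    headers = [
        # Declaring the charset is what avoids the Firefox warnings.
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


if __name__ == "__main__":
    from wsgiref.simple_server import make_server

    # Serve locally for manual inspection, e.g. with curl -i.
    with make_server("127.0.0.1", 8000, token_app) as server:
        server.serve_forever()
```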
CN REST - Task #8810 (New): Verify configuration of portal certificateshttps://redmine.dataone.org/issues/88102019-05-21T13:00:12ZDave Vieglaisdave.vieglais@gmail.com
<p>Verify that the postinst scripts for dataone-cn-os-core and dataone-cn-portal are correctly setting the locations of the certificates for token signing.</p>
CN REST - Task #8776 (New): Set valid replica status to completedhttps://redmine.dataone.org/issues/87762019-03-12T15:57:36ZChris Jonescjones@nceas.ucsb.edu
<p>In <code>MNAuditTask.call()</code> we audit replica checksums, but on success, we only set the <code>Replica.replicaVerified</code> field to the current date. We don't set the <code>Replica.replicationStatus</code> field to <code>COMPLETED</code>. This is an issue because the <code>Replica</code> entry in the <code>SystemMetadata</code> may have been set to <code>FAILED</code> or <code>INVALIDATED</code>, but may now be valid, and so would need to be updated.</p>
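<p>A sketch of the intended behavior, written in Python rather than the Java of <code>MNAuditTask</code>: on a successful checksum audit, both the verification date and the status are updated. The <code>Replica</code> dataclass and <code>record_audit_success</code> are hypothetical stand-ins, and the lowercase status strings are an assumption about how the values are serialized.</p>

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Replica:
    """Hypothetical stand-in for a system metadata replica entry."""
    replica_member_node: str
    replication_status: str
    replica_verified: Optional[datetime] = None


def record_audit_success(replica: Replica, now: Optional[datetime] = None) -> Replica:
    """Mark a replica as verified and reset its status to completed."""
    replica.replica_verified = now or datetime.now(timezone.utc)
    # Without this line, a replica previously marked failed or invalidated
    # would keep that status even though its checksum now matches.
    replica.replication_status = "completed"
    return replica


if __name__ == "__main__":
    stale = Replica("urn:node:mnExample", "invalidated")
    print(record_audit_success(stale))
```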
CN REST - Story #8582 (New): Replica Auditing service is throwing errorshttps://redmine.dataone.org/issues/85822018-05-01T19:15:35ZChris Jonescjones@nceas.ucsb.edu
<p>Replica auditing should be auditing objects every 90 days for fixity, and setting the <code>replicaStatus</code> appropriately. The <code>/var/log/dataone/cn-replication-audit.log*</code> files are showing many errors:</p>
<pre>cjones@cn-ucsb-1:replicate$ grep ERROR cn-replication-audit.log* | grep "Cannot update replica status" | wc -l
437601
</pre>
<p>Determine if this is a configuration issue or a code issue and fix it as needed. Also, fix the code to call <code>Identifier.getValue()</code> when logging these errors to avoid printing the memory location of the object like <code>org.dataone.service.types.v1.Identifier@7e90f2e8</code>. There are multiple places where <code>getValue()</code> needs to be added.</p>
CN REST - Bug #7918 (New): SEAD object only partially synchronized - missing autogen.201609291601...https://redmine.dataone.org/issues/79182016-10-21T17:20:29ZDave Vieglaisdave.vieglais@gmail.com
<p>This came up while updating <a class="issue tracker-1 status-5 priority-4 priority-default closed" title="Bug: SEAD object only partially synchronized - missing autogen.2015061616000265251 document from /var/... (Closed)" href="https://redmine.dataone.org/issues/7222">#7222</a> and <a class="issue tracker-5 status-5 priority-4 priority-default closed" title="Task: SEAD object only partially synchronized in CN production - missing autogen.2016092916013111425 do... (Closed)" href="https://redmine.dataone.org/issues/7914">#7914</a>. The content for those issues was checked and is accessible through the CN. There may still be an issue, however, because the SEAD MN reports 69 objects while the search index reports only 68.</p>
<p>Counting on the MN:</p>
<p>curl "<a href="http://seadva.d2i.indiana.edu:8081/sead/rest/mn/v1/object">http://seadva.d2i.indiana.edu:8081/sead/rest/mn/v1/object</a>" | xml fo</p>
<p>returns 69 objects.</p>
<p>Counting on the CN:</p>
<p>curl "<a href="https://cn.dataone.org/cn/v2/object?nodeId=urn:node:SEAD">https://cn.dataone.org/cn/v2/object?nodeId=urn:node:SEAD</a>" | xml fo</p>
<p>returns 69 objects.</p>
<p>Counting on the CN using the search index:</p>
<p><a href="https://cn.dataone.org/cn/v1/query/solr/?start=0&rows=10&fl=id%2Ctitle%2CformatId&q=datasource%3A%22urn%5C%3Anode%5C%3ASEAD%22">https://cn.dataone.org/cn/v1/query/solr/?start=0&rows=10&fl=id%2Ctitle%2CformatId&q=datasource%3A%22urn%5C%3Anode%5C%3ASEAD%22</a></p>
<p>returns 68 objects.</p>
<p>List of identifiers in search index:</p>
<p>curl "<a href="https://cn.dataone.org/cn/v1/query/solr/?start=0&rows=100&fl=id&q=datasource%3A%22urn%5C%3Anode%5C%3ASEAD%22">https://cn.dataone.org/cn/v1/query/solr/?start=0&rows=100&fl=id&q=datasource%3A%22urn%5C%3Anode%5C%3ASEAD%22</a>" | xml sel -t -m "//doc/str[@name='id']" -v . -n | sort > SEAD_index_pids.txt</p>
<p>List of identifiers on CN:</p>
<p>curl "<a href="https://cn.dataone.org/cn/v2/object?nodeId=urn:node:SEAD">https://cn.dataone.org/cn/v2/object?nodeId=urn:node:SEAD</a>" | xml sel -t -m "//objectInfo/identifier" -v . -n | sort > SEAD_cn_pids.txt</p>
<p>Missing PID:</p>
<p>diff SEAD_cn_pids.txt SEAD_index_pids.txt<br>
60d59<br>
< seadva-c918e4ff-2861-496a-a907-d2cb382ddb30</p>
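<p>The same comparison can be sketched in Python as a set difference over the two PID files produced above. The helper names <code>read_pids</code> and <code>missing_from_index</code> are hypothetical; the file names are the ones used in the <code>curl</code> commands.</p>

```python
def read_pids(path):
    """Read one identifier per line, skipping blank lines."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def missing_from_index(cn_pids, index_pids):
    """Return identifiers known to the CN but absent from the search index."""
    return sorted(set(cn_pids) - set(index_pids))


if __name__ == "__main__":
    cn = read_pids("SEAD_cn_pids.txt")
    index = read_pids("SEAD_index_pids.txt")
    for pid in missing_from_index(cn, index):
        print(pid)
```

<p>Unlike <code>diff</code>, this does not depend on both files being sorted the same way.</p>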
<p>System Metadata for PID:</p>
<p><?xml version="1.0"?><br>
<br>
1<br>
seadva-c918e4ff-2861-496a-a907-d2cb382ddb30<br>
FGDC-STD-001-1998<br>
6553<br>
ff3d4641669b4e08c9f8b978b85d6113a4c1bab8<br>
SEAD<br>
CN=urn:node:SEAD,DC=dataone,DC=org<br>
<br>
<br>
public<br>
read<br>
<br>
<br>
<br>
sead-Martin-John-f1dbc3df-c27c-4647-b05a-4b1f05c99a24<br>
2016-08-17T12:41:34.03Z<br>
2016-08-17T14:43:58.677Z<br>
urn:node:SEAD<br>
urn:node:SEAD<br>
<br>
urn:node:SEAD<br>
completed<br>
2016-09-29T23:01:21.235Z<br>
<br>
<br>
urn:node:CN<br>
completed<br>
2016-09-29T23:01:21.241Z<br>
<br>
<a href="/ns1:systemMetadata">/ns1:systemMetadata</a></p>
<p>Confirm object can be retrieved:</p>
<p>curl "<a href="http://seadva.d2i.indiana.edu:8081/sead/rest/mn/v1/object/seadva-c918e4ff-2861-496a-a907-d2cb382ddb30">http://seadva.d2i.indiana.edu:8081/sead/rest/mn/v1/object/seadva-c918e4ff-2861-496a-a907-d2cb382ddb30</a>" | xml fo</p>
<p>System Metadata for obsoleted object:</p>
<p><?xml version="1.0"?><br>
<br>
1<br>
sead-Martin-John-f1dbc3df-c27c-4647-b05a-4b1f05c99a24<br>
FGDC-STD-001-1998<br>
7443<br>
ebd481ad8268f9714f23d772c15fb62e61384486<br>
CN=urn:node:SEAD,DC=dataone,DC=org<br>
CN=urn:node:SEAD,DC=dataone,DC=org<br>
<br>
<br>
public<br>
read<br>
<br>
<br>
<br>
seadva-c918e4ff-2861-496a-a907-d2cb382ddb30<br>
2013-10-24T18:41:31.213Z<br>
2016-08-17T14:43:58.677Z<br>
urn:node:SEAD<br>
urn:node:SEAD<br>
<br>
urn:node:CN<br>
completed<br>
2013-10-24T23:00:04.684Z<br>
<br>
<br>
urn:node:SEAD<br>
completed<br>
2013-10-24T23:00:04.579Z<br>
<br>
<a href="/ns1:systemMetadata">/ns1:systemMetadata</a><br></p>
<p>autogenid for <code>seadva-c918e4ff-2861-496a-a907-d2cb382ddb30</code>:</p>
<p>psql metacat<br>
select * from identifier where guid='seadva-c918e4ff-2861-496a-a907-d2cb382ddb30';</p>
<table><thead>
<tr>
<th>guid</th>
<th>docid</th>
<th>rev</th>
</tr>
</thead><tbody>
<tr>
<td>seadva-c918e4ff-2861-496a-a907-d2cb382ddb30</td>
<td>autogen.2016092916012224122</td>
<td>1</td>
</tr>
</tbody></table>
<p>$ ls /var/metacat/data/autogen.2016092916012224122*<br>
ls: cannot access /var/metacat/data/autogen.2016092916012224122: No such file or directory</p>
CN REST - Task #7911 (New): Synchronization allows invalid checksums, preventing corrected synchttps://redmine.dataone.org/issues/79112016-10-17T15:16:07ZChris Jonescjones@nceas.ucsb.edu
<p>Normally, d1_synchronization does checksum validation of objects before registering them in the CN. However, a CHECKSUM_VERIFICATION_SIZE_BYPASS_THRESHOLD flag was introduced into TransferObjectTask that defaults to 10MB. If an object size is greater than this threshold, the checksum won't be verified. In the cn-buildout, this default is not changed in the properties file, but it can be.</p>
<p>As a result, objects below this threshold will throw an exception during sync if the checksum is incorrect, whereas those above the threshold will successfully sync with incorrect system metadata. This becomes a problem later when trying to update the system metadata with the correct checksum because this field is immutable. For example, in the STAGE environment, the following object failed to process the system metadata update:</p>
<p>[ERROR] 2016-10-17 14:54:38,112 (V2TransferObjectTask:call:269) Task-urn:node:mnTestARCTIC-urn:uuid:a3bfef74-f6e9-4ecc-871e-0a3ea764b471 - UnrecoverableException: Failed to update cn with new valid SystemMetadata! - InvalidRequest - The request is trying to modify an immutable field in the SystemMeta: the new system meta's checksum dee03804421bac149371877d2d366abb7c941fba is different to the orginal one bef6df568ed1c713a8323434694319894f25a8b9dfa704f7fe2b7d52592b2b40</p>
<p>Since MN.getChecksum() is normally being called to do the heavy lifting of calculating the actual checksum, I'm not sure why this flag was introduced. Even for multi-gigabyte files, the checksum calculation is pretty quick. To prevent the CN from ingesting incorrect system metadata, I'd suggest we consider removing this threshold, or at a minimum, setting the property value to be multi-terabyte. Also, this is a case where the MN is authoritative for the system metadata, but the CN update fails because of the immutable status of the checksum. Ultimately, we shouldn't be sync'ing content with invalid checksums. Rejecting such content at sync time would allow the MN operator to correct the checksum and then retry the sync. Needs discussion.</p>
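<p>To make the consequence of the bypass concrete, here is a small Python sketch of the decision logic as described above. It is not the <code>TransferObjectTask</code> code: the function names are hypothetical, "10MB" is assumed to mean 10 * 1024 * 1024 bytes, and the checksum is computed locally with <code>hashlib</code> instead of calling MN.getChecksum().</p>

```python
import hashlib

# Assumed byte value for the "10MB" default described in the issue.
BYPASS_THRESHOLD_BYTES = 10 * 1024 * 1024


def should_verify(size_bytes, threshold=BYPASS_THRESHOLD_BYTES):
    """Objects larger than the threshold skip checksum verification."""
    return size_bytes <= threshold


def checksum_matches(data, declared_hex, algorithm="sha1"):
    """Compare the computed digest with the declared checksum."""
    return hashlib.new(algorithm, data).hexdigest() == declared_hex.lower()


def accept_object(data, declared_hex, threshold=BYPASS_THRESHOLD_BYTES):
    """Return True when the object would be accepted during sync."""
    if not should_verify(len(data), threshold):
        # Bypass: accepted even if the declared checksum is wrong.
        return True
    return checksum_matches(data, declared_hex)


if __name__ == "__main__":
    payload = b"hello"
    # With a tiny threshold the bad checksum slips through.
    print(accept_object(payload, "not-the-checksum", threshold=3))
    # With verification in effect the same object is rejected.
    print(accept_object(payload, "not-the-checksum", threshold=10))
```

<p>Raising the threshold far above any realistic object size, or removing the bypass branch, makes every object take the verification path.</p>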
CN REST - Task #7903 (New): Need to implement/support the default http methods - HEAD and GET ...https://redmine.dataone.org/issues/79032016-10-07T23:59:15ZMatthew Jonesjones@nceas.ucsb.edu
<p>Developers on the Whole Tale project at NCSA reported a bug in the HTTP HEAD request for our resolve service URIs. Example output below to reproduce the error. </p>
<p>Expected: a status code of 200</p>
<p>xarth@shakuras ~ $ curl --head <a href="https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3Ae9ff8bfe-f12d-4630-a6f1-f3eab740be6f">https://cn.dataone.org/cn/v2/resolve/urn%3Auuid%3Ae9ff8bfe-f12d-4630-a6f1-f3eab740be6f</a><br>
HTTP/1.1 500 Internal Server Error<br>
Date: Fri, 07 Oct 2016 23:00:14 GMT<br>
Server: Apache/2.2.22 (Ubuntu)<br>
Content-Length: 260<br>
Access-Control-Allow-Origin: <br>
Access-Control-Allow-Credentials: true<br>
Access-Control-Allow-Headers: Authorization, Content-Type, Location, Content-Length, x-annotator-auth-token<br>
Access-Control-Expose-Headers: Content-Length, Content-Type, Location<br>
Access-Control-Allow-Methods: POST, GET, OPTIONS, PUT, DELETE<br>
Vary: Accept-Encoding<br>
Connection: close<br>
Content-Type: text/xml;charset=UTF-8</p>
<p>xarth@shakuras ~ $ curl --head <a href="https://knb.ecoinformatics.org/knb/d1/mn/v2/object/urn%3Auuid%3Ae9ff8bfe-f12d-4630-a6f1-f3eab740be6f">https://knb.ecoinformatics.org/knb/d1/mn/v2/object/urn%3Auuid%3Ae9ff8bfe-f12d-4630-a6f1-f3eab740be6f</a><br>
HTTP/1.1 200 OK<br>
Date: Fri, 07 Oct 2016 23:00:48 GMT<br>
Server: Apache/2.4.7 (Ubuntu)<br>
Set-Cookie: JSESSIONID=7DC18368F71D5D9948371B3C33437E8B; Path=/knb/; Secure<br>
DataONE-Checksum: SHA-1,927a11b6e46b771c9922083814f6ee8e5b09f696<br>
Last-Modified: Thu, 01 Jan 1970 00:00:00 GMT<br>
DataONE-ObjectFormat: application/octet-stream<br>
DataONE-SerialVersion: 0<br>
Content-Length: 2809655736<br>
Access-Control-Allow-Origin: <br>
Access-Control-Allow-Credentials: true<br>
Content-Type: text/xml</p>
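<p>For clients that want to script the same check, a HEAD request can be issued from Python's standard library as sketched below. The helper names are hypothetical, and the URL is the one from the report above.</p>

```python
import urllib.error
import urllib.request


def build_head_request(url):
    """Create a request object that uses the HEAD method."""
    return urllib.request.Request(url, method="HEAD")


def head_status(url, timeout=10.0):
    """Return the HTTP status code for a HEAD request to the URL."""
    try:
        # urlopen follows redirects by default, so this is the status of
        # the final response.
        with urllib.request.urlopen(build_head_request(url), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Error statuses such as 500 are raised as exceptions.
        return err.code


if __name__ == "__main__":
    url = (
        "https://cn.dataone.org/cn/v2/resolve/"
        "urn%3Auuid%3Ae9ff8bfe-f12d-4630-a6f1-f3eab740be6f"
    )
    print(head_status(url))
```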
CN REST - Bug #7746 (In Progress): Node registration update fails when <contactSubject> spans mul...https://redmine.dataone.org/issues/77462016-04-19T23:55:09ZMark Servillamark.servilla@gmail.com
<p>Node registration update fails when the <code>&lt;contactSubject&gt;</code> value in the node registration document (attached) is on a separate line from its XML element tags. Apparently, parsing this field results in a subject string that does not match the LDAP record, and the result is a 401 Not Authorized exception:</p>
<p>CN=Lisa Stillwell A15851,O=University of North Carolina at Chapel Hill,C=US,DC=cilogon,DC=org<br>
</p>
<p>results in:<br>
<br>
<?xml version="1.0" encoding="UTF-8"?><br>
<br>
<br>
CN=Lisa Stillwell A15851,O=University of North Carolina at Chapel Hill,C=US,DC=cilogon,DC=org<br>
is not a Registered Subject<br>
</p>
<p>Correcting to:<br>
<br>
CN=Lisa Stillwell A15851,O=University of North Carolina at Chapel Hill,C=US,DC=cilogon,DC=org</p>
<p>succeeds.</p>
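<p>If the cause is indeed surrounding whitespace, as the behavior above suggests, the effect can be reproduced with a few lines of Python. The XML fragment, the subject string, and the function name <code>contact_subjects</code> are illustrative assumptions, not the CN's parsing code.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment with the subject on its own line between the tags.
NODE_FRAGMENT = """<node>
  <contactSubject>
    CN=Example Person A00000,O=Example University,C=US,DC=cilogon,DC=org
  </contactSubject>
</node>"""


def raw_contact_subjects(node_xml):
    """Return element text exactly as parsed, whitespace included."""
    root = ET.fromstring(node_xml)
    return [el.text or "" for el in root.iter("contactSubject")]


def contact_subjects(node_xml):
    """Return element text with leading and trailing whitespace removed."""
    return [text.strip() for text in raw_contact_subjects(node_xml)]


if __name__ == "__main__":
    print(repr(raw_contact_subjects(NODE_FRAGMENT)[0]))
    print(repr(contact_subjects(NODE_FRAGMENT)[0]))
```

<p>The raw value carries newlines and indentation, so an exact string comparison against a stored subject would fail, while the stripped value would compare equal.</p>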
CN REST - Bug #7687 (New): Synchronization unable to change BaseURLs upon updateNodeCapabilities ...https://redmine.dataone.org/issues/76872016-03-23T18:34:41ZRobert Waltz
<p>Synchronization does not change the baseurl of a membernode after the membernode has called updateNodeCapabilities in order to update its baseurl.</p>
<p>Synchronization keeps a pool of d1clients associated with a membernode Id. </p>
<p>NodeCommObjectListHarvestFactory and NodeCommSyncObjectFactory will each need to be notified if a membernode has changed its baseurl. Those objects can then purge and re-initialize the d1clients responsible for maintaining communication to the membernode.</p>
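<p>The purge-and-reinitialize idea can be sketched as a small cache keyed by node identifier. This is a Python illustration only: <code>NodeClientPool</code>, its methods, and the client factory are hypothetical and do not reflect the actual factory classes.</p>

```python
class NodeClientPool:
    """Cache of clients keyed by node id, rebuilt when the base URL changes."""

    def __init__(self, client_factory):
        self._factory = client_factory
        self._clients = {}  # node_id -> (base_url, client)

    def get(self, node_id, base_url):
        """Return a client for the node, creating a new one if needed."""
        cached = self._clients.get(node_id)
        if cached is None or cached[0] != base_url:
            # Missing entry or stale base URL: build a fresh client.
            cached = (base_url, self._factory(base_url))
            self._clients[node_id] = cached
        return cached[1]

    def on_base_url_changed(self, node_id):
        """Notification hook: drop the cached client for this node."""
        self._clients.pop(node_id, None)


if __name__ == "__main__":
    pool = NodeClientPool(lambda url: f"client for {url}")
    print(pool.get("urn:node:mnExample", "https://old.example.org/mn"))
    pool.on_base_url_changed("urn:node:mnExample")
    print(pool.get("urn:node:mnExample", "https://new.example.org/mn"))
```

<p>Comparing the base URL on every lookup is a fallback in case a notification is missed; the explicit hook matches the notification approach described above.</p>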
CN REST - Bug #7489 (New): processing daemon common-logging misconfiguredhttps://redmine.dataone.org/issues/74892015-11-16T16:32:08ZRobert Waltz
<p>The commons-logging.properties file appears misconfigured. It is missing the line:</p>
<p>org.apache.commons.logging.Log=org.apache.commons.logging.impl.Log4JLogger</p>
<p>Hopefully, adding this line will allow all the apache commons logging calls to go to the log4j loggers.</p>
CN REST - Bug #7440 (New): Non-discernable error during synchronization affecting (mostly) urn:no...https://redmine.dataone.org/issues/74402015-10-16T17:40:44ZMark Servillamark.servilla@gmail.com
<p>Multiple (366) non-discernible errors (see attachment) with a message stating only "Client is shutdown" occurred on 14 Oct 2015 on cn-stage.test.dataone.org (cn-stage-ucsb-1); a sampling of objects and system metadata indicates that all are retrievable from the MN <a href="https://dataone-dev.ecoinformatics.org.au/mn">https://dataone-dev.ecoinformatics.org.au/mn</a>. This error also occurred (24 events) for urn:node:mnTestLTER.</p>
<p>For example:</p>
<p>cn-synchronization.log.1-[ERROR] 2015-10-14 11:33:02,749 (TransferObjectTask:write:606) Task-urn:node:mnTestAEKOS-aekos.org.au/collection/nsw.gov.au/nsw_atlas/vis_flora_module/ABERBALDIE.20150515<br>
cn-synchronization.log.1:Client is shutdown.</p>
CN REST - Bug #7161 (New): TERN object fails to be indexed by Solr, but successfully synchronizedhttps://redmine.dataone.org/issues/71612015-06-05T18:02:10ZMark Servillamark.servilla@gmail.com
<p>TERN object aekos.org.au/collection/nsw.gov.au/nsw_atlas/vis_flora_module/V_ILLAWDB3.20150515 fails to be indexed by Solr on cn.dataone.org (cn-ucsb-1) even though it was successfully synchronized. Investigation indicates it was not added to the HazelCast ObjectPathMap structure, according to Skye Roseboom:</p>
<p>I'm not sure why this pid is not appearing in the hazelcast ObjectPathMap structure.</p>
<p>I looked into the metacat database schema a bit and noticed the pid does not seem to appear in the ‘identifier_mapping’ table:</p>
<p>select * from identifier_mapping where guid='aekos.org.au/collection/nsw.gov.au/nsw_atlas/vis_flora_module/V_ILLAWDB3.20150515';</p>
<table><thead>
<tr>
<th>guid</th>
<th>docid</th>
<th>rev</th>
</tr>
</thead><tbody>
</tbody></table>
<p>(0 rows)</p>
<p>Without the ‘docid’ or ‘localid’, as metacat calls them, I don’t think the pid could be added to the objectPathMap in hazelcast.</p>
CN REST - Task #7096 (New): Unexpectedly closed streams / disconnects on UNM networkhttps://redmine.dataone.org/issues/70962015-05-12T17:49:44ZAndrei Buiumandreib@epscor.unm.edu
<p>When testing, making any CN or MN API call would intermittently, but very often, yield an exception:</p>
<p>Could not resolve multipart files: Processing of multipart/form-data request failed. Stream ended unexpectedly</p>
<p>I was making CN.create() calls from UNM to the UCSB Dev CN. <br>
(It also happened for MN.create() calls from UNM to mnDemo6 when it was up. Not sure where mnDemo6 is physically located though.)<br>
This seems to happen when the connection is terminated while metacat is reading the object from the multipart files.</p>
CN REST - Task #1412 (New): MN health check performed by CNhttps://redmine.dataone.org/issues/14122011-03-08T17:04:39ZRoger Dahldahl@unm.edu
<ul>
<li>How will CN represent, in the NodeList, the results of what it finds by doing health checks on MN endpoints?</li>
<li>How will MN represent what it believes to be the state of its endpoints?</li>
</ul>