DataONE Tasks: Issues https://redmine.dataone.org/ 2019-01-14T16:46:33Z
CN REST - Story #8757 (New): Fix getChecksum() in MNAuditTask to use dynamic checksum algorithms https://redmine.dataone.org/issues/8757 2019-01-14T16:46:33Z Chris Jones cjones@nceas.ucsb.edu
<p>The <code>MNAuditTask.call()</code> method is hardcoded to use <code>MD5</code> checksums on line 277. It requests the Member Node to generate an <code>MD5</code> checksum, and then compares that checksum to the checksum stated in the Coordinating Node's cached <code>SystemMetadata.checksum</code> field for the object. This will fail for objects that were submitted using <code>SHA-1</code> or other algorithms.</p>
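<p>A minimal sketch of the intended behaviour, written in Python rather than the Java of <code>MNAuditTask</code>: the algorithm is taken from the cached system metadata instead of being fixed to <code>MD5</code>. The function names and the algorithm mapping are hypothetical, and the checksum is computed locally here for illustration, whereas the auditor asks the Member Node to compute it.</p>

```python
import hashlib

# Assumed mapping from checksum algorithm labels, as they might appear in
# system metadata, to hashlib names. Only a few labels are listed here.
HASHLIB_NAMES = {
    "MD5": "md5",
    "SHA-1": "sha1",
    "SHA-256": "sha256",
}


def compute_checksum(data: bytes, algorithm: str) -> str:
    """Return the hex digest of data using the named algorithm."""
    try:
        name = HASHLIB_NAMES[algorithm.upper()]
    except KeyError:
        raise ValueError(f"Unsupported checksum algorithm: {algorithm}")
    return hashlib.new(name, data).hexdigest()


def replica_matches(data: bytes, expected_algorithm: str, expected_value: str) -> bool:
    """Compare replica bytes with the checksum recorded in system metadata.

    The algorithm comes from the recorded checksum, not from a constant.
    """
    actual = compute_checksum(data, expected_algorithm)
    return actual.lower() == expected_value.lower()
```

<p>With this shape, an object registered with a <code>SHA-1</code> checksum is verified with <code>SHA-1</code>, and an unknown algorithm label raises an error instead of silently falling back to <code>MD5</code>.</p>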
CN REST - Story #8756 (New): Ensure replica auditor is effective https://redmine.dataone.org/issues/8756 2019-01-12T20:25:18Z Chris Jones cjones@nceas.ucsb.edu
<p>The replication auditor service is currently configured to audit all objects every 90 days. As documented in <a class="issue tracker-4 status-1 priority-4 priority-default child" title="Story: Replica Auditing service is throwing errors (New)" href="https://redmine.dataone.org/issues/8582">#8582</a>, the auditor is not working correctly. While the errors being thrown that are described in that ticket seem to be limited to <code>pid</code>s with certain characters in them, I think the whole auditor process is not keeping up with our content.</p>
<p>Looking at the number of objects on each member node that haven't been audited in the last 90 days, auditing is well behind (if we consider it working at all):</p>
<pre>SELECT sm.authoritive_member_node, count(smr.guid) AS count
FROM systemmetadata sm INNER JOIN smreplicationstatus smr
ON sm.guid = smr.guid
WHERE
smr.member_node != 'urn:node:CN' AND
sm.date_uploaded < (SELECT CURRENT_DATE - interval '90 days') AND
smr.date_verified < (SELECT CURRENT_DATE - interval '90 days')
GROUP BY sm.authoritive_member_node
ORDER BY count DESC;

authoritive_member_node | count
-------------------------+--------
urn:node:ARCTIC | 771872
urn:node:PANGAEA | 507456
urn:node:LTER | 416339
urn:node:DRYAD | 374439
urn:node:CDL | 242115
urn:node:PISCO | 235791
urn:node:KNB | 86075
urn:node:TDAR | 75639
urn:node:NCEI | 50974
urn:node:USGS_SDC | 40290
urn:node:TERN | 31671
urn:node:ESS_DIVE | 28830
urn:node:NMEPSCOR | 16042
urn:node:GOA | 9266
urn:node:IARC | 7677
urn:node:NRDC | 6673
urn:node:TFRI | 6478
urn:node:PPBIO | 3464
urn:node:ORNLDAAC | 3328
urn:node:FEMC | 2430
urn:node:EDI | 2098
urn:node:GRIIDC | 2065
urn:node:mnTestKNB | 2010
urn:node:SANPARKS | 2008
urn:node:ONEShare | 1874
urn:node:R2R | 1787
urn:node:USGSCSAS | 1151
urn:node:EDACGSTORE | 1075
urn:node:US_MPC | 1032
urn:node:RW | 970
urn:node:KUBI | 516
urn:node:NEON | 487
urn:node:LTER_EUROPE | 343
urn:node:IOE | 279
urn:node:RGD | 273
urn:node:ESA | 272
urn:node:NKN | 218
urn:node:OTS_NDC | 126
urn:node:BCODMO | 115
urn:node:SEAD | 90
urn:node:mnTestNKN | 50
urn:node:EDORA | 28
urn:node:ONEShare.pem | 22
urn:node:CLOEBIRD | 17
urn:node:mnTestBCODMO | 11
urn:node:USANPN | 10
urn:node:mnTestTDAR | 10
urn:node:MyMemberNode | 1
</pre>
<p>The table above shows the number of objects that have not been audited in the last 90 days, but I get the feeling that the auditor isn't able to audit any of the content it is charged with auditing, given 1) the frequency, 2) the number of threads allotted, and 3) the configured batch count (which seems far too low). <del>Note that this query excludes replicated content - this is just the original objects</del> (After looking at my query again, I think the join is including all replicas: the total is 2,935,787, which is greater than the total number of objects in the system (2,751,136), so this query needs to be refined.)</p>
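<p>One possible refinement, sketched in Python under the assumption that the overcount comes from one join row per replica: count each <code>guid</code> once per authoritative node. The function name and the sample rows in the usage note are hypothetical, and whether "any stale replica" is the right criterion still needs to be decided.</p>

```python
from collections import defaultdict


def count_unaudited_objects(rows):
    """Count distinct guids per authoritative node.

    rows: iterable of (authoritative_member_node, guid) tuples, one per
    stale replica row, as the join in the query would produce them.
    """
    guids_by_node = defaultdict(set)
    for node, guid in rows:
        # A set keeps each guid once, however many replicas it has.
        guids_by_node[node].add(guid)
    return {node: len(guids) for node, guids in guids_by_node.items()}
```

<p>For example, two stale replica rows for the same object on one node contribute a count of 1 rather than 2.</p>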
<p>We need to evaluate the true effectiveness of the auditor. Some strategies may include: 1) looking to see if we may be in an infinite loop on processing a few <code>pid</code>s due to the issues in <a class="issue tracker-4 status-1 priority-4 priority-default child" title="Story: Replica Auditing service is throwing errors (New)" href="https://redmine.dataone.org/issues/8582">#8582</a>, 2) seeing if we can increase the batch size by increasing the total threads allocated in the executor, and 3) deciding if we need to offload the process from the CNs and distribute the workload across a cluster of workers that can do the auditing faster. This needs some thought and discussion.</p>
Infrastructure - Story #8639 (New): Replication performance is too slow to service demand https://redmine.dataone.org/issues/8639 2018-07-04T11:17:55Z Dave Vieglais dave.vieglais@gmail.com
<p>The replication process is operating too slowly to service demand, resulting in lengthy delays in completing replication tasks for new and changed content.</p>
<p>This is particularly apparent in the stage environment where perhaps the number of orphaned objects and deprecated / defunct nodes is interfering with expected behaviors.</p>
<p>The goal of this story is to identify and address the immediate issues. Any significant refactoring of the replication process should be captured under another story / epic.</p>
Infrastructure - Task #7466 (In Progress): Some objects not accessible on the CN via REST API https://redmine.dataone.org/issues/7466 2015-11-04T18:41:38Z Bryce Mecum mecum@nceas.ucsb.edu
<p>While doing other work, I noticed that a good number (not sure how many) of objects listed on the CN's Solr index are not accessible via the REST API get() and resolve() methods. Instead of returning the object, they return a NotFound error. </p>
<p>To reproduce,</p>
<ol>
<li>Visit <a href="https://cn.dataone.org/cn/v1/query/solr/?fl=identifier,title,authoritativeMN,datasource&q=formatType:METADATA+AND+-obsoletedBy:*&rows=100&start=0">https://cn.dataone.org/cn/v1/query/solr/?fl=identifier,title,authoritativeMN,datasource&q=formatType:METADATA+AND+-obsoletedBy:*&rows=100&start=0</a></li>
<li>Pick a PID from the query result, e.g.
<ul>
<li>knb-lter-cap.148.9</li>
<li>CLOEBDMETADATA.10242013.1</li>
</ul>
</li>
<li>Attempt to resolve() or get() the object via the REST API like: <a href="https://cn.dataone.org/cn/v1/object/CLOEBDMETADATA.10242013.1">https://cn.dataone.org/cn/v1/object/CLOEBDMETADATA.10242013.1</a></li>
<li>Receive a NotFound error instead of the object.</li>
</ol>
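<p>The reproduction steps can be scripted roughly as follows. This is a sketch, assuming the production CN base URL from the links above; the helper names are hypothetical, and <code>http_status</code> makes a live request, so its result depends on the current state of the CN.</p>

```python
from urllib.error import HTTPError
from urllib.parse import quote
from urllib.request import urlopen

# Base URL taken from the links in this ticket.
CN_BASE_URL = "https://cn.dataone.org/cn/v1"


def object_url(pid, base_url=CN_BASE_URL):
    """URL for get(): the PID is percent-encoded as a single path segment."""
    return f"{base_url}/object/{quote(pid, safe='')}"


def resolve_url(pid, base_url=CN_BASE_URL):
    """URL for resolve(), encoded the same way."""
    return f"{base_url}/resolve/{quote(pid, safe='')}"


def http_status(url, timeout=30):
    """Return the HTTP status code for a GET request, including error codes."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code
```

<p>Calling <code>http_status(object_url(pid))</code> for each PID returned by the Solr query would give a rough count of how many indexed objects return an error status.</p>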
<p>Notes:</p>
<p>In IRC, Skye noticed that the objects can be retrieved via their respective MNs, so it appears this issue may be a Metacat replication issue.</p>
Java Client - Task #7389 (Testing): V2 D1Object fails to download V1 content https://redmine.dataone.org/issues/7389 2015-09-28T17:34:30Z Chris Jones cjones@nceas.ucsb.edu
<p>During testing of services in the mixed V1/V2 DEV2 environment, D1Object fails to download content listed in the ObjectLocationList from a V1-only Member Node. The symptom is a null pointer exception when trying to close a non-existent temporary file where the bytes of the object should have been located. Fix download() to call V1 endpoints on V1-only MNs.</p>
Java Client - Task #7120 (Testing): Fix DataPackage.insertRelationship() to handle any URI for ex... https://redmine.dataone.org/issues/7120 2015-05-21T16:49:34Z Chris Jones cjones@nceas.ucsb.edu
<p>DataPackage currently provides two insertRelationship() methods - one to add ORE relationships between metadata and data members of the aggregation, and a second to provide any relationship using predicates from other namespaces (such as PROV). The latter method assumes that all identifiers should be treated as objects using the CN Base URL when constructing the subject and object URIs. This isn't always the case. Change or override the method to accept any URI as subject and object components of the triple, and fix any tests that use this method.</p>
Infrastructure - Task #6843 (In Progress): Update the prov instance of the RdfXmlSubprocessor to ... https://redmine.dataone.org/issues/6843 2015-02-06T23:18:48Z Chris Jones cjones@nceas.ucsb.edu
<p>In <a href="https://github.com/DataONEorg/sem-prov-design/issues/66">sem-prov-design issue 66</a> we have renamed the provenance-based Solr fields to include 'prov_' as a prefix, and have added new fields. See also <a href="https://github.com/DataONEorg/sem-prov-design/issues/99">issue 99</a> and <a href="https://github.com/DataONEorg/sem-prov-design/issues/100">issue 100</a>.<br>
Modify the provRdfXmlSubprocessor bean to handle the renaming scheme, the new fields, and the inverse fields determined to be useful. Also, add these fields as static Solr fields so we can remove the '_sm' suffixes from the names.</p>
Infrastructure - Task #4716 (In Progress): refresh client certificate for urn:node:DRYAD https://redmine.dataone.org/issues/4716 2014-04-14T17:10:01Z Dave Vieglais dave.vieglais@gmail.com
<p>Contact is Ryan Scherle</p>
Infrastructure - Task #4714 (In Progress): Refresh client certificate for MN urn:node:TFRI https://redmine.dataone.org/issues/4714 2014-04-14T16:48:17Z Dave Vieglais dave.vieglais@gmail.com
<p>Contact is "Meei-ru Jeng" beerjeng at gmail com</p>
Infrastructure - Task #4210 (Testing): Metacat does not set serialVersion correctly in CNodeServi... https://redmine.dataone.org/issues/4210 2013-12-20T15:22:50Z Chris Jones cjones@nceas.ucsb.edu
<p>For DATA and METADATA, CNodeService.archive() and D1NodeService.archive(), respectively, don't increment the serialVersion field. Check this for delete() as well. D1NodeService delegates to DocumentImpl to call the HZ put() method, so the fix needs to be there, and in CNodeService.</p>
Infrastructure - Task #4136 (In Progress): Make cosmetic changes to the distribution map https://redmine.dataone.org/issues/4136 2013-10-29T15:00:56Z Chris Jones cjones@nceas.ucsb.edu
<p>From feedback at the AHM:</p>
<ul>
<li><p>The DataONE orange color appears too red on the map due to the transparency of the plum-colored data symbols. Use either a lighter shade of orange, the DataONE teal, etc.</p></li>
<li><p>Move the legend to the top right corner</p></li>
<li><p>Change the title to be more descriptive, like 'Data Set Distribution'</p></li>
</ul>
Infrastructure - Task #3978 (In Progress): Add a CN reporting script that summarizes spatial data... https://redmine.dataone.org/issues/3978 2013-09-13T16:12:08Z Chris Jones cjones@nceas.ucsb.edu
<p>Spatial data in the CN Solr search index includes per-object bounding box data. For client side mapping purposes, these data are too numerous to add to a vector map. Create a spatial summarization script that reduces the total points to summarized counts at a given cell resolution. Allow for the resolution to be configurable. Export the result as a JSON object, compatible with mapping libraries like heatmapjs and D3js.</p>
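<p>A minimal sketch of such a summarization, assuming each object has already been reduced to one (longitude, latitude) point, for example the centre of its bounding box. The function names and the output field names are hypothetical; the actual JSON layout would depend on the mapping library chosen.</p>

```python
import json
import math
from collections import Counter


def summarize_points(points, cell_size_deg=1.0):
    """Bin (lon, lat) points into square cells and count points per cell.

    Returns a list of dicts holding each occupied cell's centre and count.
    """
    if cell_size_deg <= 0:
        raise ValueError("cell_size_deg must be positive")
    counts = Counter()
    for lon, lat in points:
        # Shift to non-negative coordinates, then take the cell index.
        col = math.floor((lon + 180.0) / cell_size_deg)
        row = math.floor((lat + 90.0) / cell_size_deg)
        counts[(col, row)] += 1
    cells = []
    for (col, row), count in sorted(counts.items()):
        cells.append({
            "lon": -180.0 + (col + 0.5) * cell_size_deg,
            "lat": -90.0 + (row + 0.5) * cell_size_deg,
            "count": count,
        })
    return cells


def to_json(cells):
    """Serialize the summarized cells as a JSON array."""
    return json.dumps(cells)
```

<p>The <code>cell_size_deg</code> parameter is the configurable resolution: a larger value yields fewer, coarser cells for the client to draw.</p>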
Infrastructure - Task #3864 (In Progress): Release CCI 1.2.1 Features https://redmine.dataone.org/issues/3864 2013-07-12T15:43:49Z Chris Jones cjones@nceas.ucsb.edu
<p>In particular, the plan is to upgrade the cn-buildout system to roll in the Ansible deployment features so we build out the CN base operating system and D1 software stack in a standardized way.</p>
<p>Non-Ansible features moved to release 1.2.2 (redmine task 3866) to minimize change associated with the Ansible update and to allow some ORE parsing issues to shake out.</p>
Infrastructure - Task #3610 (In Progress): Include ESRI-specific FGDC metadata schema in object f... https://redmine.dataone.org/issues/3610 2013-02-26T16:14:08Z Ben Leinfelder leinfelder@nceas.ucsb.edu
<p>The SANParks node uses an ESRI/ArcGIS flavor of the FGDC metadata standard. Currently we have no ObjectFormatId for this metadata, so these objects are classified as text/plain data objects.<br>
Original (as it should be):<br>
<a href="http://dataknp.sanparks.org/sanparks/metacat/nikkis.180.1/sanparks">http://dataknp.sanparks.org/sanparks/metacat/nikkis.180.1/sanparks</a><br>
In DataONE MN:<br>
<a href="http://dataknp.sanparks.org/sanparks/d1/mn/v1/meta/nikkis.180.1">http://dataknp.sanparks.org/sanparks/d1/mn/v1/meta/nikkis.180.1</a><br>
In CN:<br>
<a href="https://cn.dataone.org/cn/v1/meta/nikkis.180.1">https://cn.dataone.org/cn/v1/meta/nikkis.180.1</a></p>
<p>We need to add an appropriate format to the list and reclassify these objects both on the CN and the MN.</p>
Infrastructure - Task #3419 (In Progress): CNRead.describe() does not return Content-Length header https://redmine.dataone.org/issues/3419 2012-12-12T16:33:41Z Roger Dahl dahl@unm.edu
<p>dahl@vm-dataone:~/d1/d1_python/d1_client_onedrive/src$ curl -v -X HEAD <a href="https://cn-stage.test.dataone.org/cn/v1/object/SEF001_024MTBD004R00_20060719.50.5">https://cn-stage.test.dataone.org/cn/v1/object/SEF001_024MTBD004R00_20060719.50.5</a><br>
...<br>
< HTTP/1.1 200 OK<br>
< Date: Wed, 12 Dec 2012 16:19:10 GMT<br>
< Server: Apache/2.2.14 (Ubuntu)<br>
< DataONE-Checksum: MD5,501f259043b9cbfdfaa4ed3944f93698<br>
< Last-Modified: Thu, 01 Jan 1970 00:00:00 GMT<br>
< DataONE-ObjectFormat: eml://ecoinformatics.org/eml-2.0.1<br>
< DataONE-SerialVersion: 1<br>
< Vary: Accept-Encoding<br>
< Content-Type: text/xml;charset=UTF-8<br>
* no chunk, no close, no size. Assume close to signal end</p>
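<p>A small check along these lines could flag the problem automatically. This is a sketch: the list of expected headers is an assumption based on the headers shown above plus <code>Content-Length</code>, and the function name is hypothetical.</p>

```python
# Headers assumed to be expected in a describe() response.
REQUIRED_DESCRIBE_HEADERS = (
    "Content-Length",
    "Last-Modified",
    "DataONE-ObjectFormat",
    "DataONE-Checksum",
    "DataONE-SerialVersion",
)


def missing_describe_headers(headers):
    """Return the expected header names that are absent from headers.

    headers: mapping of header name to value; names compare case-insensitively.
    """
    present = {name.lower() for name in headers}
    return [name for name in REQUIRED_DESCRIBE_HEADERS if name.lower() not in present]
```

<p>Applied to the response headers in the transcript above, this returns <code>['Content-Length']</code>.</p>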