DataONE Tasks: Issues | https://redmine.dataone.org/ | 2020-02-29T01:00:11Z | DataONE Tasks
Redmine CN REST - Bug #8860 (New): /token endpoint doesn't set a content-type and character encoding | https://redmine.dataone.org/issues/8860 | 2020-02-29T01:00:11Z | Bryce Mecum (mecum@nceas.ucsb.edu)
<p>On Firefox only, requests to the /portal/token endpoint (i.e., the one MetacatUI and other clients use to fetch their auth tokens, like <a href="https://cn.dataone.org/portal/token">https://cn.dataone.org/portal/token</a>) result in errors in the browser console.</p>
<p>When you access the URL via an XHR request, you see:</p>
<blockquote>
<p>XML Parsing Error: syntax error<br>
Location: <a href="https://cn-stage.test.dataone.org/portal/token">https://cn-stage.test.dataone.org/portal/token</a><br>
Line Number 1, Column 1:</p>
</blockquote>
<p>When you access the URL directly in Firefox:</p>
<blockquote>
<p>The character encoding of the plain text document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the file needs to be declared in the transfer protocol or file needs to use a byte order mark as an encoding signature.</p>
</blockquote>
<p>I had a hunch that this error would go away if the response simply had the <code>Content-Type</code> header set to <code>text/plain; charset=utf-8</code> so I spun up <code>mitmproxy</code>, made that edit to the intercepted response, and saw that the error does go away.</p>
<p>I think we should modify the portal code to set the <code>Content-Type</code> header like above so the error goes away.</p>
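<p>A minimal sketch of how a client-side check for this could look, assuming the only thing we care about is whether the response's <code>Content-Type</code> header names a charset. The helper name <code>declared_charset</code> and the constant are hypothetical and not part of the portal code; the real fix would be made in the portal itself.</p>

```python
from email.message import Message

# The header value proposed in this ticket.
EXPECTED_CONTENT_TYPE = "text/plain; charset=utf-8"


def declared_charset(content_type_header):
    """Return the charset named in a Content-Type header value, or None."""
    if not content_type_header:
        # A missing header declares nothing, which is the case reported here.
        return None
    message = Message()
    message["Content-Type"] = content_type_header
    return message.get_content_charset()


if __name__ == "__main__":
    print(declared_charset(EXPECTED_CONTENT_TYPE))
    print(declared_charset("text/plain"))
```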
Infrastructure - Bug #8815 (New): Investigate and fix failed sync/harvest of doi:10.18739/A2CH48 | https://redmine.dataone.org/issues/8815 | 2019-06-04T23:26:43Z | Bryce Mecum (mecum@nceas.ucsb.edu)
<p>I noticed this while doing something unrelated. There isn't a copy of the sysmeta for <a href="https://arcticdata.io/metacat/d1/mn/v2/meta/doi:10.18739/A2CH48">https://arcticdata.io/metacat/d1/mn/v2/meta/doi:10.18739/A2CH48</a> on the CN:</p>
<p>At: <a href="http://cn.dataone.org/cn/v2/meta/doi:10.18739/A2CH48">http://cn.dataone.org/cn/v2/meta/doi:10.18739/A2CH48</a></p>
<pre><code class="xml syntaxhl"><?xml version="1.0" encoding="UTF-8"?>
<error detailCode="1420" errorCode="404" name="NotFound">
<description>No system metadata could be found for given PID: doi:10.18739/A2CH48</description>
</error>
</code></pre>
<p>CNs are in read-only mode at the moment so I'm filing this here so someone can take a look later on.</p>
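<p>For anyone scripting a check across many PIDs, here is a rough sketch of parsing an error document like the one above. It assumes the response body has exactly the shape shown in this ticket; the function name <code>parse_d1_error</code> is made up for illustration and is not part of any DataONE client library.</p>

```python
import xml.etree.ElementTree as ET

# The error document quoted in this ticket, kept as bytes so the
# encoding declaration is honored by the parser.
ERROR_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<error detailCode="1420" errorCode="404" name="NotFound">
<description>No system metadata could be found for given PID: doi:10.18739/A2CH48</description>
</error>"""


def parse_d1_error(xml_bytes):
    """Return the error fields as a dict, or None if the root is not <error>."""
    root = ET.fromstring(xml_bytes)
    if root.tag != "error":
        return None
    return {
        "name": root.get("name"),
        "errorCode": int(root.get("errorCode")),
        "detailCode": root.get("detailCode"),
        "description": (root.findtext("description") or "").strip(),
    }


if __name__ == "__main__":
    print(parse_d1_error(ERROR_XML))
```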
Infrastructure - Decision #8774 (New): Add new CN format for MPEG-2 or update video/mpeg format t... | https://redmine.dataone.org/issues/8774 | 2019-03-08T02:04:18Z | Bryce Mecum (mecum@nceas.ucsb.edu)
<p>We already have this format:</p>
<pre><objectFormat>
<formatId>video/mpeg</formatId>
<formatName>MPEG-1 Video</formatName>
<formatType>DATA</formatType>
<mediaType name="video/mpeg"/>
<extension>mpg</extension>
</objectFormat>
</pre>
<p>It clearly says MPEG-1 Video but there is also an MPEG-2 format. It'd be useful to have a proper format ID to use for MPEG-2 video. The info for MPEG-2 is largely the same as for MPEG-1 (format type, mediaType, extension). Should we add a new format or just update the description of the existing one? We also have an MP4 format type so maybe that'd be a reason to just add another to cover MPEG-2.</p>
Infrastructure - Decision #8765 (Closed): Consider changing how BaseSolrFieldXPathTest works | https://redmine.dataone.org/issues/8765 | 2019-02-13T00:36:55Z | Bryce Mecum (mecum@nceas.ucsb.edu)
<p>Ran into a weird thing while expanding the <a href="https://repository.dataone.org/software/cicore/trunk/cn/d1_cn_index_processor/src/test/java/org/dataone/cn/index/SolrFieldXPathEmlTest.java">https://repository.dataone.org/software/cicore/trunk/cn/d1_cn_index_processor/src/test/java/org/dataone/cn/index/SolrFieldXPathEmlTest.java</a> to test an EML 2.2.0 doc.</p>
<p><code>SolrFieldXPathEmlTest</code> compares values extracted via various subprocessors to a set of expectations stored in a <code>HashMap<String, String>()</code> (of form <code><fieldName, expectedValue></code>). This data structure limits expectations to one value per field. Forever ago, in <code>r7985</code>,</p>
<pre>r7985 | sroseboo | 2012-03-23 12:12:53 -0800 (Fri, 23 Mar 2012) | 1 line
initial commit of search index support for parsing FGDC science metadata docs.
Index: src/test/java/org/dataone/cn/index/BaseSolrFieldXPathTest.java
</pre>
<p>support was added for testing multiple expectations for a single field by defining a convention of smushing multiple values into a single string, separated by two # characters (##). For example,</p>
<pre>eml210Expected.put("project", "Random Project Title##Another Random Project Title");
</pre>
<p>would test the <code>project</code> field for two values, "Random Project Title" and "Another Random Project Title", not a literal "Random Project Title##Another Random Project Title". I imagine this was picked because it's rare to see a ## in a metadata record, which seems reasonable.</p>
<p>I ran afoul of this today because I wanted to test an expectation for a field with a # in it and I couldn't because it was being split when I didn't want it to be. Why did a single # break things when the convention above is a double #? Because <code>BaseSolrFieldXPathTest.java</code> splits the expectation string using <code>StringUtils.split</code> like this:</p>
<pre>StringUtils.split(expectedForField, "##") // Where expectedForField might equal "Random Project Title##Another Random Project Title"
</pre>
<p>According to <a href="https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#split-java.lang.String-java.lang.String-">https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#split-java.lang.String-java.lang.String-</a>, the second arg to <code>StringUtils.split</code>, <code>separatorChars</code>, is "the characters used as the delimiters, null splits on whitespace" which tells me we're using it wrong. I think the method we needed to use was <code>StringUtils.splitByWholeSeparator</code> which correctly splits only on double #.</p>
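<p>To make the difference concrete, here is a small Python sketch that only emulates the two behaviors described above; it is not the Commons Lang implementation. The function names and the sample expectation string (which contains a lone <code>#</code>) are made up for illustration, and both helpers drop empty tokens as an assumption about how adjacent separators are handled.</p>

```python
import re


def split_on_any_char(text, separator_chars):
    """Emulate splitting where every character in separator_chars is a delimiter."""
    pattern = "[" + re.escape(separator_chars) + "]+"
    return [token for token in re.split(pattern, text) if token]


def split_on_whole_separator(text, separator):
    """Emulate splitting only where the full separator string occurs."""
    return [token for token in text.split(separator) if token]


if __name__ == "__main__":
    expected_for_field = "Issue #42 notes##Second title"
    # Character-set splitting also breaks on the lone '#'.
    print(split_on_any_char(expected_for_field, "##"))
    # Whole-separator splitting leaves the lone '#' alone.
    print(split_on_whole_separator(expected_for_field, "##"))
```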
<p>I could just change the line of code and move on but that breaks a ton (98) of tests. Before I did that, I wanted to ask what others thought. I see a few routes:</p>
<ol>
<li>Change the code to correctly split only on ## and not # <em>and</em> update all the tests I break</li>
<li>Do 1 <em>and</em> switch the separator to something more future proof. I suggest "&&" because that'd be invalid in XML outside a CDATA section.</li>
<li>Change how the expectations get tested so field expectations can have a one to many relationship. This'd take some time so I'd opt for (1) or (2) instead</li>
</ol>
<p>Any preferences out there?</p>
Infrastructure - Decision #8693 (In Progress): Support Google Dataset Search on search.dataone.or... | https://redmine.dataone.org/issues/8693 | 2018-09-07T00:16:59Z | Bryce Mecum (mecum@nceas.ucsb.edu)
<a name="Background"></a>
<h2 >Background<a href="#Background" class="wiki-anchor">¶</a></h2>
<p>Yesterday, <a href="https://toolbox.google.com/datasetsearch" class="external">Google Dataset Search</a> launched. We previously attempted to make MetacatUI (and by extension, DataONE Search) compatible with it by <a href="https://github.com/NCEAS/metacatui/issues/482" class="external">injecting Schema.org JSON-LD into appropriate pages</a>. During development and testing, we checked our compatibility with the upcoming Google Dataset Search using Google's <a href="https://search.google.com/structured-data/testing-tool" class="external">Structured Data Testing Tool</a>. At that stage this was all working fine and the feature appeared to be compatible but, after we launched the feature on search.dataone.org, behavior changed on Google's end so that Google no longer saw this JSON-LD. The likely reason is that, because MetacatUI follows a single page application architecture and we inject the JSON-LD on the client side, Google's JSON-LD crawler only saw what was sent from the server (a nearly empty index.html) and not our full page (with JSON-LD). I was able to test this theory: while Google's crawler does execute JavaScript, it limits execution to about or exactly five seconds, and MetacatUI <em>usually</em> doesn't finish injecting JSON-LD and rendering all content until after that timeout.</p>
<p>Potential paths forward to get DataONE Search compatible with Google's Dataset Search include (none of which are mutually exclusive):</p>
<ol>
<li>The assets that make up MetacatUI and the asset loading strategies could be optimized: <a href="https://github.com/NCEAS/metacatui/issues/224">https://github.com/NCEAS/metacatui/issues/224</a></li>
<li>Move the code (and any dependencies) that injects JSON-LD further up in the app boot so that Google sees it</li>
<li>Inject the appropriate JSON-LD on the server side to guarantee that Google sees it (originally Matt Jones' idea!)</li>
</ol>
<p>(1) is being worked on for sure, and (2) may not be needed if (1) is successful. I want to talk about option (3) because:</p>
<ul>
<li>It's a quicker solution (I already have something working) which would help get us involved in the project faster</li>
<li>It paves the way for future features and/or improvements to MetacatUI (we could be rendering more on the server side than just JSON-LD, like other metadata, more page content, etc)</li>
</ul>
<a name="What-I-did"></a>
<h2 >What I did<a href="#What-I-did" class="wiki-anchor">¶</a></h2>
<p>To test this idea, I modified a <a href="https://github.com/amoeba/backbone-pushstate-example" class="external">previous project</a> which is just a simple Node (Express.js) app that hosts MetacatUI by intercepting every request and serving the appropriate asset. It injects Schema.org JSON-LD, when appropriate, by querying the CN Solr index before sending MetacatUI's index.html to the client. <a href="https://github.com/amoeba/metacatui-ssr" class="external">Code is here</a> and it's deployed <a href="http://neutral-cat.nceas.ucsb.edu/" class="external">here</a>. View source on any /view/... pages and you'll see a minimal Schema.org/Dataset description in the head. More properties can be added later. I did it quick and dirty: The app pre-loads MetacatUI's index.html as a <code>String</code> at app boot and injects the JSON-LD into it. No templating language or other magic.</p>
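<p>The string-injection approach can be sketched roughly as follows. This is a Python illustration of the idea only, not the code in the linked Node app; the function name <code>inject_json_ld</code> and the sample dataset description are hypothetical.</p>

```python
import json


def inject_json_ld(index_html, dataset):
    """Insert a JSON-LD script tag just before </head> in a pre-loaded HTML string."""
    # Escape "</" so a value containing "</script>" cannot close the tag early.
    payload = json.dumps(dataset).replace("</", "<\\/")
    script = '<script type="application/ld+json">' + payload + "</script>"
    head_close = index_html.find("</head>")
    if head_close == -1:
        # No head to inject into: return the page untouched rather than fail.
        return index_html
    return index_html[:head_close] + script + index_html[head_close:]


if __name__ == "__main__":
    page = "<html><head><title>MetacatUI</title></head><body></body></html>"
    description = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": "Example dataset",  # placeholder value
    }
    print(inject_json_ld(page, description))
```

<p>Returning the original page when there is nothing to inject into keeps the static asset path working even if the template changes.</p>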
<a name="Things-to-address"></a>
<h2 >Things to address<a href="#Things-to-address" class="wiki-anchor">¶</a></h2>
<ul>
<li>How do we feel about switching from hosting MetacatUI via Apache (simple, bullet proof) to a Node based deployment just to support this feature (new territory, at least for me)?</li>
<li>If we do switch, we'd want to make really sure the Node app doesn't have weird failure cases where it doesn't return index.html (e.g., when Solr is down, or slow). The app needs to return index.html (and every other static asset) on every request and do it very fast and we should decide what the cutoff is so that it doesn't hold up app boot if Solr is slow/down.</li>
<li>Can this type of deployment easily be integrated with CN buildouts? I've deployed Node apps before by fronting them with Apache/nginx (via reverse proxy) and then keeping the node process up with Upstart</li>
<li>Is this performant enough for DataONE? I think my implementation is non-blocking but I'm not a Node expert so we'd want to code review and probably benchmark </li>
<li>We could wait on (1) and stick with our current deployment strategy</li>
</ul>
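<p>One way to think about the cutoff mentioned in the list above is sketched below in Python, under the assumption that the Solr lookup can be run in a worker thread and abandoned if it takes too long. The names <code>fetch_with_cutoff</code> and <code>slow_lookup</code> are hypothetical, and <code>slow_lookup</code> merely stands in for a slow Solr query.</p>

```python
import concurrent.futures
import time

# A shared pool so a slow lookup never blocks the request that gave up on it.
_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def fetch_with_cutoff(lookup, cutoff_seconds, fallback=None):
    """Run lookup() but return fallback if it is slow or raises."""
    future = _executor.submit(lookup)
    try:
        return future.result(timeout=cutoff_seconds)
    except Exception:
        # Covers both the timeout and any error raised by the lookup itself.
        return fallback


def slow_lookup():
    """Stand-in for a Solr query that responds slowly."""
    time.sleep(0.3)
    return {"name": "late"}


if __name__ == "__main__":
    # With a 50 ms cutoff the page would be served without JSON-LD.
    print(fetch_with_cutoff(slow_lookup, 0.05, fallback=None))
```

<p>With this shape, a missing or slow index only costs the JSON-LD, never index.html itself.</p>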
<a name="Other-notes"></a>
<h2 >Other notes<a href="#Other-notes" class="wiki-anchor">¶</a></h2>
<p>Unrelated to the Google Dataset Search issue but related to Google's crawling for Google Search, we've also identified:</p>
<ul>
<li>That the Metacat View Service is often unreasonably slow (<a href="https://github.com/NCEAS/metacat/issues/1234">https://github.com/NCEAS/metacat/issues/1234</a>); we are planning to figure out why</li>
<li>That we can and should make use of sitemaps to help Google crawl our pages: <a href="https://github.com/NCEAS/metacat/issues/1263">https://github.com/NCEAS/metacat/issues/1263</a></li>
</ul>
Infrastructure - Decision #8616 (New): Consider expanding isotc211's indexing component's keyword... | https://redmine.dataone.org/issues/8616 | 2018-06-15T00:13:04Z | Bryce Mecum (mecum@nceas.ucsb.edu)
<p>From </p>
<p><a href="https://repository.dataone.org/software/cicore/trunk/cn/d1_cn_index_processor/src/main/resources/application-context-isotc211-base.xml">https://repository.dataone.org/software/cicore/trunk/cn/d1_cn_index_processor/src/main/resources/application-context-isotc211-base.xml</a></p>
<p>The current XPath for the <code>keyword</code> field pulls out:</p>
<pre>//gmd:identificationInfo/gmd:MD_DataIdentification/gmd:descriptiveKeywords/gmd:MD_Keywords/gmd:keyword/gmx:Anchor/text() |
//gmd:identificationInfo/gmd:MD_DataIdentification/gmd:descriptiveKeywords/gmd:MD_Keywords/gmd:keyword/gco:CharacterString/text()
</pre>
<p>ISO also defines <code>MD_DataIdentification/gmd:topicCategory</code> which is defined as "The main theme(s) of the dataset." and is required (recommended) when describing a dataset. It's conditional, and repeatable. An example from a PANGAEA doc is</p>
<pre>...
<ns0:topicCategory>
<ns0:MD_TopicCategoryCode>geoscientificInformation</ns0:MD_TopicCategoryCode>
</ns0:topicCategory>
</ns0:MD_DataIdentification>
</pre>
<p>I think it'd improve recall to include it in our keywords list. It appears to be a controlled vocabulary so we could even make more direct use of it. The controlled vocabulary appears to be (from the MI_Metadata workbook):</p>
<p>Domain: <br>
- farming<br>
- biota<br>
- boundaries<br>
- climatologyMeteorologyAtmosphere<br>
- economy<br>
- elevation<br>
- environment<br>
- geoscientificInformation<br>
- health<br>
- imageryBaseMapsEarthCover<br>
- intelligenceMilitary<br>
- inlandWaters<br>
- location<br>
- oceans<br>
- planningCadastre<br>
- society<br>
- structure<br>
- transportation<br>
- utilitiesCommunication</p>
<p>Both NCEI and PANGAEA make use of this field in their ISO docs.</p>
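<p>As a rough illustration of the proposal, the sketch below collects keywords from the two existing paths plus <code>gmd:topicCategory</code>. It uses Python's ElementTree rather than the index processor's Spring configuration, the sample document is invented, and the function name <code>extract_keywords</code> is hypothetical.</p>

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
    "gmx": "http://www.isotc211.org/2005/gmx",
}

# Invented sample document with one keyword and one topic category.
SAMPLE = b"""<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:descriptiveKeywords>
        <gmd:MD_Keywords>
          <gmd:keyword><gco:CharacterString>sediment</gco:CharacterString></gmd:keyword>
        </gmd:MD_Keywords>
      </gmd:descriptiveKeywords>
      <gmd:topicCategory>
        <gmd:MD_TopicCategoryCode>geoscientificInformation</gmd:MD_TopicCategoryCode>
      </gmd:topicCategory>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>"""


def extract_keywords(xml_bytes):
    """Return keyword text from the existing paths plus topicCategory codes."""
    root = ET.fromstring(xml_bytes)
    base = ".//gmd:identificationInfo/gmd:MD_DataIdentification"
    keyword = base + "/gmd:descriptiveKeywords/gmd:MD_Keywords/gmd:keyword"
    paths = [
        keyword + "/gmx:Anchor",
        keyword + "/gco:CharacterString",
        # The addition proposed in this ticket.
        base + "/gmd:topicCategory/gmd:MD_TopicCategoryCode",
    ]
    values = []
    for path in paths:
        for element in root.findall(path, NS):
            text = (element.text or "").strip()
            if text:
                values.append(text)
    return values


if __name__ == "__main__":
    print(extract_keywords(SAMPLE))
```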
Infrastructure - Bug #8612 (Closed): Improperly formatted Alternate Data Access URLs | https://redmine.dataone.org/issues/8612 | 2018-06-12T22:37:39Z | Bryce Mecum (mecum@nceas.ucsb.edu)
<p>If I go to</p>
<p><a href="https://search.dataone.org/#view/urn:uuid:8d639e70-55eb-40aa-b1a4-29712fd31b63">https://search.dataone.org/#view/urn:uuid:8d639e70-55eb-40aa-b1a4-29712fd31b63</a></p>
<p>and look at the underlying URL in the Address field, I see:</p>
<p><a href="https://search.dataone.org/%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20http://data.eol.ucar.edu/codiac/dss/id=102.036%0A">https://search.dataone.org/%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20http://data.eol.ucar.edu/codiac/dss/id=102.036%0A</a></p>
<p>which should be </p>
<p><a href="http://data.eol.ucar.edu/codiac/dss/id=102.036%0A">http://data.eol.ucar.edu/codiac/dss/id=102.036%0A</a></p>
<p>Looks like the underlying XSLT isn't going deep enough down the path hierarchy</p>
Infrastructure - Decision #8601 (New): Decide on a URI space for DataONE resources | https://redmine.dataone.org/issues/8601 | 2018-06-05T00:39:42Z | Bryce Mecum (mecum@nceas.ucsb.edu)
<a name="Summary"></a>
<h2 >Summary<a href="#Summary" class="wiki-anchor">¶</a></h2>
<p>In the past, we've needed to represent DataONE resources (e.g., Objects, "datasets", etc.) in Linked Open Data contexts. Currently, these resources don't have canonical URIs.</p>
<p>For example, take a dataset on search.dataone.org with the following URL:</p>
<pre>https://search.dataone.org/#view/doi:10.18739/A28K74W2F
</pre>
<p>Candidate URIs for this resource include:</p>
<ol>
<li><a href="https://search.dataone.org/#view/doi:10.18739/A28K74W2F">https://search.dataone.org/#view/doi:10.18739/A28K74W2F</a>: Depends on implementation details in MetacatUI's router and uses a fragment in the URL, which we're deprecating soon anyhow</li>
<li><a href="https://cn.dataone.org/cn/v2/resolve/doi:10.18739/A28K74W2F">https://cn.dataone.org/cn/v2/resolve/doi:10.18739/A28K74W2F</a>: Depends on implementation details of the DataONE API and is tied to a specific version of the DataONE API. e.g., When API version 3 is released, will ../v2/resolve/doi:10.18739/A28K74W2F and ../v3/resolve/doi:10.18739/A28K74W2F refer to the same resource?</li>
</ol>
<a name="Proposal"></a>
<h2 >Proposal<a href="#Proposal" class="wiki-anchor">¶</a></h2>
<p>Create a URI space that all services can integrate against. This space follows the convention of:</p>
<p><a href="https://dataone.org/%7Bresource_type%7D/%7Bresource_identifier%7D">https://dataone.org/{resource_type}/{resource_identifier}</a></p>
<p>where <code>{resource_type}</code> is a singular name of a top level DataONE resource such as "dataset", "person", or "object" and <code>{resource_identifier}</code> is a type-appropriate identifier (e.g., PID of a science metadata Object for the dataset type, DN for the person type, etc.). The collection of top level resources (e.g., datasets), follows the form <a href="https://dataone.org/%7Bresource_type_plural%7D">https://dataone.org/{resource_type_plural}</a> (e.g., "datasets").</p>
<p>Examples:</p>
<ul>
<li>Dataset: <a href="https://dataone.org/dataset/doi%3A10.18739%2FA28K74W2F">https://dataone.org/dataset/doi%3A10.18739%2FA28K74W2F</a></li>
<li>Person: <a href="https://dataone.org/person/https%3A%2F%2Forcid.org%2F0000-0002-0381-3766">https://dataone.org/person/https%3A%2F%2Forcid.org%2F0000-0002-0381-3766</a></li>
<li>Object: <a href="https://dataone.org/object/urn%3Auuid%3A3c80e9d6-277c-4a32-bc7a-d85c499f370f">https://dataone.org/object/urn%3Auuid%3A3c80e9d6-277c-4a32-bc7a-d85c499f370f</a></li>
<li>All datasets on DataONE: <a href="https://dataone.org/datasets">https://dataone.org/datasets</a></li>
</ul>
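<p>The example URIs above percent-encode the whole identifier, including <code>:</code> and <code>/</code>. A minimal sketch of building them, assuming the proposed convention is adopted as written (the function name <code>resource_uri</code> is hypothetical):</p>

```python
from urllib.parse import quote, unquote

BASE = "https://dataone.org"


def resource_uri(resource_type, identifier):
    """Build a URI of the form {BASE}/{resource_type}/{encoded identifier}."""
    # safe="" makes quote() encode "/" and ":" too, so the identifier
    # stays a single path segment.
    return f"{BASE}/{resource_type}/{quote(identifier, safe='')}"


if __name__ == "__main__":
    uri = resource_uri("dataset", "doi:10.18739/A28K74W2F")
    print(uri)
    # Decoding the last path segment recovers the original identifier.
    print(unquote(uri.rsplit("/", 1)[1]))
```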
<a name="Expected-outcomes"></a>
<h2 >Expected outcomes<a href="#Expected-outcomes" class="wiki-anchor">¶</a></h2>
<ul>
<li>A normative document describing the URI space will be added to the DataONE documentation (<a href="https://releases.dataone.org/online/api-documentation-v2.0/">https://releases.dataone.org/online/api-documentation-v2.0/</a>)</li>
<li>Other projects will make use of these URIs</li>
</ul>
<a name="Affected-projects"></a>
<h2 >Affected projects<a href="#Affected-projects" class="wiki-anchor">¶</a></h2>
<ul>
<li>Metacat (sitemap functionality will make use of these URIs)</li>
<li>MetacatUI (Google Structured Data integration via JSON-LD will use these URIs for resources)</li>
<li>GeoLink (DataONE LOD graph will use these URIs for resources)</li>
</ul>
<a name="Future-work"></a>
<h2 >Future work<a href="#Future-work" class="wiki-anchor">¶</a></h2>
<p>With the URI space decided, we can start working on a unified, content-negotiating DataONE resolve service (different from the CN/MNStorage.resolve API method). See <a href="https://hpad.dataone.org/GYJgjAJgpgHMwFoBGBWALItBjGMEE4kBDAZgKzSWBSS2ADYoSg==#detail-object-content-negotiation-for-objects">https://hpad.dataone.org/GYJgjAJgpgHMwFoBGBWALItBjGMEE4kBDAZgKzSWBSS2ADYoSg==#detail-object-content-negotiation-for-objects</a>. How this works is not being decided in this Redmine ticket.</p>
<a name="Previous-work-discussions-chronological-order"></a>
<h2 >Previous work / discussions (chronological order):<a href="#Previous-work-discussions-chronological-order" class="wiki-anchor">¶</a></h2>
<ul>
<li><a href="https://docs.google.com/document/d/1yU-d-aFdtiSB91Wk0sFj1xthW8skBPof9qKwx00bjxE/edit?usp=sharing">https://docs.google.com/document/d/1yU-d-aFdtiSB91Wk0sFj1xthW8skBPof9qKwx00bjxE/edit?usp=sharing</a></li>
<li><a href="https://hpad.dataone.org/GYJgjAJgpgHMwFoBGBWALItBjGMEE4kBDAZgKzSWBSS2ADYoSg==">https://hpad.dataone.org/GYJgjAJgpgHMwFoBGBWALItBjGMEE4kBDAZgKzSWBSS2ADYoSg==</a></li>
</ul>
Member Nodes - Support #8209 (Closed): Five objects failed to sync to the CN from urn:node:ARCTIC | https://redmine.dataone.org/issues/8209 | 2017-10-25T18:28:11Z | Bryce Mecum (mecum@nceas.ucsb.edu)
<p>I noticed this while doing other work and wanted to report the Objects in question so they can get fixed.</p>
<p>doi:10.18739/A2T091<br>
doi:10.18739/A2FV7G<br>
doi:10.18739/A28Z56<br>
doi:10.18739/A2Z55G<br>
doi:10.18739/A2PP08</p>
<p>Calls to CNRead.get() and CNRead.resolve() are giving me an error that the System Metadata for each Object can't be found, e.g.,</p>
<p>No system metadata could be found for given PID: doi:10.18739/A2T091</p>
Infrastructure - Decision #8189 (New): Proposal to change the roles mapped to the origin Solr fie... | https://redmine.dataone.org/issues/8189 | 2017-10-02T18:04:37Z | Bryce Mecum (mecum@nceas.ucsb.edu)
<p>While discussing changing the behavior of the origin field in the ISO indexing component (<a href="https://redmine.dataone.org/issues/8165">https://redmine.dataone.org/issues/8165</a>) to make it more selective about where in the document originators are pulled from, Matt Jones (over email) suggested we revisit the set of roles as well. Let's do that in this Issue.</p>
<p>The current set of roles mapped to the origin field are:</p>
<ul>
<li><em>originator</em>: party who created the resource</li>
<li><em>author</em>: party who authored the resource</li>
<li><em>owner</em>: party that owns the resource</li>
<li><em>principalInvestigator</em>: key party responsible for gathering information and conducting research</li>
</ul>
<p>This current set of roles may be surprising to some/many users so a possible outcome of this Issue is to greatly improve the content in our search index. This would have impacts on the CN and MNs running Metacat.</p>
<p>Key points:</p>
<ul>
<li>Matt's proposal is to exclude principalInvestigator from this list</li>
<li>The Research Workspace Member Node appears to be using the principalInvestigator role for one or more persons they want in their citation so if we follow Matt's proposal we may need to discuss this with them</li>
<li>I would lobby for only including originator and author but my reading of the definitions is a naïve one</li>
</ul>
<p>I'd like us to have a discussion on this, make the relevant change to the codebase, and then bring the discussion back to the MN operators.</p>
<p>Relevant links:</p>
<ul>
<li><a href="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml">http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml</a> (official? definitions for CI_RoleCode)</li>
<li><a href="https://geo-ide.noaa.gov/wiki/index.php?title=ISO_19115_and_19115-2_CodeList_Dictionaries">https://geo-ide.noaa.gov/wiki/index.php?title=ISO_19115_and_19115-2_CodeList_Dictionaries</a> (NOAA wiki entry for the role codes)</li>
</ul>
Infrastructure - Bug #8052 (New): Geohashed value is incorrect | https://redmine.dataone.org/issues/8052 | 2017-03-27T20:43:47Z | Bryce Mecum (mecum@nceas.ucsb.edu)
<p>Adam Shepherd at BCO-DMO uploaded a test DCX doc here:</p>
<p><a href="https://search-sandbox.test.dataone.org/#view/http://lod.bco-dmo.org/id/dataset-file/682007">https://search-sandbox.test.dataone.org/#view/http://lod.bco-dmo.org/id/dataset-file/682007</a></p>
<p>which has bounding coordinates of:</p>
<p>North<br>
50.4907 degrees<br>
South<br>
20.4907 degrees<br>
East<br>
-120 degrees<br>
West<br>
120.826 degrees</p>
<p>and when we looked at the geohash values stored in the index for the record they appear to be incorrect. The bounding coordinates this DCX record is using are a bit weird but I'm not sure they're invalid. Google's JavaScript maps API calculates the centroid as 35.490700000000004,-179.58700000000002 which, according to this tool, <a href="http://www.movable-type.co.uk/scripts/geohash.html">http://www.movable-type.co.uk/scripts/geohash.html</a>, should have a geohash of 8n23ckusk but has a geohash of sn23ckusr instead, which is nearly 180 degrees longitude away from the expected location.</p>
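<p>One possible explanation, which is an assumption on my part and not confirmed against the indexing code, is that the center longitude is computed as a plain average of east and west, which lands on the wrong side of the globe when the box crosses the antimeridian. The sketch below uses a textbook geohash encoder and a hypothetical <code>bbox_center</code> helper to compare the two approaches; it checks only the leading characters of each hash.</p>

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=9):
    """Standard geohash: interleave longitude and latitude bits, 5 bits per character."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # geohash starts with a longitude bit
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid
        else:
            bits = bits << 1
            rng[1] = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


def bbox_center(north, south, east, west):
    """Center of a bounding box, measuring its width eastward from west to east."""
    lat = (north + south) / 2
    width = (east - west) % 360  # wraps correctly across the antimeridian
    lon = west + width / 2
    lon = (lon + 180) % 360 - 180  # normalize into [-180, 180)
    return lat, lon


if __name__ == "__main__":
    lat, lon = bbox_center(50.4907, 20.4907, -120, 120.826)
    print(lat, lon, geohash_encode(lat, lon))
    # Plain averaging of east and west ignores the wrap-around.
    naive_lon = (120.826 + -120) / 2
    print(naive_lon, geohash_encode(lat, naive_lon))
```

<p>The wrap-aware center falls in the cell starting with <code>8n</code>, while the plain average falls in the cell starting with <code>sn</code>, which is at least consistent with the values reported above.</p>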
Infrastructure - Bug #8043 (New): The origin field for EML documents isn't properly extracted whe... | https://redmine.dataone.org/issues/8043 | 2017-03-10T21:36:25Z | Bryce Mecum (mecum@nceas.ucsb.edu)
<p>We just ran into this with the following EML record: <a href="https://knb.ecoinformatics.org/#view/doi:10.5063/F15B00CC">https://knb.ecoinformatics.org/#view/doi:10.5063/F15B00CC</a></p>
<p>The EML has six creators (Kiesecker, Fargione, Baruch-Mordo, Trainor, Ryan, Patterson) but the origin field in the Solr index has two (Ryan, Patterson). After some digging, we realized this was likely because the indexing component responsible for EML doesn't respect EML references. The XML for the relevant section is:</p>
<pre><creator scope="document">
  <references>1484778487589</references>
</creator>
<creator scope="document">
  <references>1484778426939</references>
</creator>
<creator scope="document">
  <references>1484778028081</references>
</creator>
<creator scope="document">
  <references>1484778171131</references>
</creator>
<creator id="1485385283277" scope="document">
  <individualName>
    <salutation>Dr.</salutation>
    <givenName>Joe</givenName>
    <surName>Ryan</surName>
  </individualName>
  <organizationName>University of Colorado Boulder</organizationName>
  <positionName>Professor</positionName>
  <electronicMailAddress>joseph.ryan@colorado.edu</electronicMailAddress>
</creator>
<creator id="1484777776976" scope="document">
  <individualName>
    <salutation>Dr.</salutation>
    <givenName>Lauren</givenName>
    <surName>Patterson</surName>
  </individualName>
  <organizationName>Duke University</organizationName>
  <positionName>Water Policy Associate</positionName>
  <address scope="document">
    <deliveryPoint>Nicholas Institute for Environmental Policy Solutions, Duke University</deliveryPoint>
    <city>Durham</city>
    <administrativeArea>NC</administrativeArea>
    <postalCode>27708</postalCode>
    <country>USA</country>
  </address>
  <electronicMailAddress>lauren.patterson@duke.edu</electronicMailAddress>
</creator>
</pre>
<p>It would be really nice if the origin field got populated with all those referenced creators.</p>
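<p>Resolving the references could look something like the sketch below. It is a Python illustration, not the index processor's code; the sample document, its ids, and the assumption that the referenced party is defined elsewhere in the same document (here under <code>contact</code>) are all invented for the example.</p>

```python
import xml.etree.ElementTree as ET

# Invented sample: one creator by reference, one defined inline.
SAMPLE_EML = b"""<dataset>
  <creator scope="document"><references>c1</references></creator>
  <creator id="c2" scope="document">
    <individualName><surName>Ryan</surName></individualName>
  </creator>
  <contact id="c1" scope="document">
    <individualName><surName>Kiesecker</surName></individualName>
  </contact>
</dataset>"""


def creator_surnames(xml_bytes):
    """Return creator surnames, following <references> to the element with that id."""
    root = ET.fromstring(xml_bytes)
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    names = []
    for creator in root.iter("creator"):
        ref = creator.findtext("references")
        # A creator either points at another element or carries its own content.
        target = by_id.get(ref.strip()) if ref else creator
        if target is None:
            continue  # dangling reference: skip rather than guess
        surname = target.findtext("individualName/surName")
        if surname:
            names.append(surname.strip())
    return names


if __name__ == "__main__":
    print(creator_surnames(SAMPLE_EML))
```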
Infrastructure - Bug #7858 (New): Obsoleting a resource map clears the resourceMap field for the ... | https://redmine.dataone.org/issues/7858 | 2016-08-03T21:34:03Z | Bryce Mecum (mecum@nceas.ucsb.edu)
<p>This is long-standing behavior that I consider a bug. That said, there are likely plenty of design conversations that predate me and that I'm unaware of.</p>
<p>When I update a Data Package by updating the metadata object and its resource map with new ones, the resourceMap field in the Solr index for the obsoleted metadata object is cleared. I expected it not to be cleared.</p>
<p>Why is this the way it works? The way I see it, clearing out the resourceMap field in the Solr index for the obsoleted metadata object reduces the benefit of versioning objects. When a package is cited by its metadata object's PID and the package is updated after the citation was published, a visitor to the dataset landing page will no longer see the package because the resource map isn't in the index. Of course they will be shown a link to the latest version of the package, which does have a resource map, but that's not what they cited.</p>
DataONE API - Bug #7684 (New): Call to MNStorage.update() via REST API returns java.lang.StackOve... | https://redmine.dataone.org/issues/7684 | 2016-03-21T23:07:39Z | Bryce Mecum (mecum@nceas.ucsb.edu)
<p>I was trying to update an object via the REST API using cURL and forgot to enter the correct URL. The cURL command I used and the response were:</p>
<pre>$ curl -X PUT -H "Authorization: Bearer $TOKEN" -F "pid=resourceMap_doi:10.5065/D6G44NFV" -F "object=@object.xml" -F "sysmeta=@sysmeta.xml" -F "newPid=resourceMap_doi:10.5065/D6G44NFV_v3" $URL
<?xml version="1.0" encoding="UTF-8"?>
java.lang.StackOverflowError
</pre>
<p>Where $URL was '<a href="https://arcticdata.io/metacat/d1/mn/v2/object">https://arcticdata.io/metacat/d1/mn/v2/object</a>' instead of '<a href="https://arcticdata.io/metacat/d1/mn/v2/object/resourceMap_doi:10.5065/D6G44NFV">https://arcticdata.io/metacat/d1/mn/v2/object/resourceMap_doi:10.5065/D6G44NFV</a>'</p>
<p>I expected to receive some sort of warning/error that I had forgotten to specify the URL properly for this call but instead saw a StackOverflowError.</p>
DataONE API - Bug #7578 (New): Fix 404 link to d1_instance_generator folder in documentation | https://redmine.dataone.org/issues/7578 | 2016-01-08T22:01:20Z | Bryce Mecum (mecum@nceas.ucsb.edu)
<p>In the MN API documentation for MNStorage.create (<a href="https://jenkins-ucsb-1.dataone.org/job/API%20Documentation%20-%20trunk/ws/api-documentation/build/html//apis/MN_APIs.html#MNStorage.create">https://jenkins-ucsb-1.dataone.org/job/API%20Documentation%20-%20trunk/ws/api-documentation/build/html//apis/MN_APIs.html#MNStorage.create</a>), I found that the following paragraph contains a broken link to d1_instance_generator:</p>
<blockquote>
<p>"The system metadata included with the create call must contain values for the elements required to be set by clients (see System Metadata). The system metadata document can be crafted by hand or preferably with a tool such as generate_sysmeta.py which is available in the d1_instance_generator Python package. See documentation included with that package for more information on its operation."</p>
</blockquote>
<p>The link to d1_instance_generator was to the SVN folder <a href="https://repository.dataone.org/software/cicore/trunk/d1_instance_generator">https://repository.dataone.org/software/cicore/trunk/d1_instance_generator</a> which is currently a 404. I think the folder moved to /d1_test_utilities_python/src/d1_test/instance_generator.</p>