DataONE Tasks: Issueshttps://redmine.dataone.org/https://redmine.dataone.org/favicon.ico2020-07-02T19:06:39ZDataONE Tasks
Redmine Infrastructure - Task #8865 (Closed): Configure dataone.org web server to redirect DataONE datase...https://redmine.dataone.org/issues/88652020-07-02T19:06:39ZBryce Mecummecum@nceas.ucsb.edu
<p>As scoped out in <a href="https://hpad.dataone.org/8h3o_7VPTIibo5xL9bz24w">https://hpad.dataone.org/8h3o_7VPTIibo5xL9bz24w</a>, we'd like to be able to reference DataONE resources in a linked-open-data manner. This is useful because it forms the basis for building more interesting things on top of them. However, those resources (e.g., Data Packages) don't currently have, subjectively, suitable IRIs, though they have a variety of URLs.</p>
<p>The ideas in the above proposal are multi-tiered, but a good first step can be taken immediately: support a PIRI space for "Datasets" (DataONE Data Packages) by redirecting requests from their IRI form:</p>
<p><code>https://dataone.org/datasets/$ID</code></p>
<p>to their URL form:</p>
<p><code>https://search.dataone.org/view/$ID</code>.</p>
<p>Such a redirection can be achieved within our Apache configuration using <code>mod_rewrite</code> and a rule similar to:</p>
<pre>RewriteRule "^/datasets/(.+)$" "https://search.dataone.org/view/$1" [L,R]
</pre> CN REST - Bug #8860 (New): /token endpoint doesn't set a content-type and character encodinghttps://redmine.dataone.org/issues/88602020-02-29T01:00:11ZBryce Mecummecum@nceas.ucsb.edu
<p>On Firefox only, requests to the /portal/token endpoint (i.e., the one MetacatUI and other clients use to fetch their auth tokens, like <a href="https://cn.dataone.org/portal/token">https://cn.dataone.org/portal/token</a>) result in errors in the browser console.</p>
<p>When you access the URL via an XHR request, you see:</p>
<blockquote>
<p>XML Parsing Error: syntax error<br>
Location: <a href="https://cn-stage.test.dataone.org/portal/token">https://cn-stage.test.dataone.org/portal/token</a><br>
Line Number 1, Column 1:</p>
</blockquote>
<p>When you access the URL directly in Firefox:</p>
<blockquote>
<p>The character encoding of the plain text document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the file needs to be declared in the transfer protocol or file needs to use a byte order mark as an encoding signature.</p>
</blockquote>
<p>I had a hunch that this error would go away if the response simply had the <code>Content-Type</code> header set to <code>text/plain; charset=utf-8</code> so I spun up <code>mitmproxy</code>, made that edit to the intercepted response, and saw that the error does go away.</p>
<p>I think we should modify the portal code to set the <code>Content-Type</code> header as above so the error goes away.</p>
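For reference, here's a minimal sketch of the intended behavior using the JDK's built-in <code>com.sun.net.httpserver</code> rather than the actual portal servlet code; the handler, path, and token body are all stand-ins:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;

public class TokenEndpointSketch {
    // Starts a throwaway server whose /portal/token handler sets the proposed
    // header. Port 0 asks the OS for an ephemeral port.
    public static HttpServer start(int port) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/portal/token", exchange -> {
            byte[] body = "example-token".getBytes(StandardCharsets.UTF_8);
            // The proposed fix: declare both the media type *and* the
            // character encoding, so browsers don't have to guess.
            exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

With the header declared in the transfer protocol, Firefox has no reason to sniff or warn about the encoding.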
Infrastructure - Task #8858 (New): Update CN Apache configs in version control with directives to...https://redmine.dataone.org/issues/88582020-02-05T20:02:12ZBryce Mecummecum@nceas.ucsb.edu
<p>Sitemaps are located on disk in ${tomcat_webapps_dir}/${context}/sitemaps as <code>sitemap_index.xml</code> and <code>sitemap%d.xml</code> (for each sub-sitemap).</p>
<p>The rule we've come up with is:</p>
<p><code>RewriteRule ^/(sitemap.+) /metacat/sitemaps/$1 [R=303]</code></p>
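To sanity-check the mapping the rule is meant to produce, here's a sketch that mimics the pattern/substitution with the JDK regex engine (Apache's <code>mod_rewrite</code> matching isn't Java, so this only illustrates the intended rewrites):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SitemapRewriteSketch {
    // Mirrors the RewriteRule pattern: any path starting with /sitemap
    // gets redirected under /metacat/sitemaps/.
    static final Pattern RULE = Pattern.compile("^/(sitemap.+)$");

    public static String rewrite(String path) {
        Matcher m = RULE.matcher(path);
        // Non-matching paths pass through untouched.
        return m.matches() ? "/metacat/sitemaps/" + m.group(1) : path;
    }
}
```

So <code>/sitemap_index.xml</code> and each <code>/sitemap%d.xml</code> map to their on-disk locations under the webapp context, while everything else is left alone.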
Infrastructure - Bug #8836 (Closed): CSS and JS not loading on ProvONE documentationhttps://redmine.dataone.org/issues/88362019-08-12T21:54:25ZBryce Mecummecum@nceas.ucsb.edu
<p>I was looking at <a href="http://jenkins-1.dataone.org/jenkins/view/Documentation%20Projects/job/ProvONE-Documentation-trunk/ws/provenance/ProvONE/v1/provone.html">http://jenkins-1.dataone.org/jenkins/view/Documentation%20Projects/job/ProvONE-Documentation-trunk/ws/provenance/ProvONE/v1/provone.html</a> today and Chrome and Safari are both upset with how its CSS and JS are set up and are refusing to load either. Steps to reproduce:</p>
<ol>
<li>Go to <a href="http://jenkins-1.dataone.org/jenkins/view/Documentation%20Projects/job/ProvONE-Documentation-trunk/ws/provenance/ProvONE/v1/provone.html">http://jenkins-1.dataone.org/jenkins/view/Documentation%20Projects/job/ProvONE-Documentation-trunk/ws/provenance/ProvONE/v1/provone.html</a></li>
<li>Note that no styles load and the hide/show example buttons don't work. The devtools console shows Content Security Policy (CSP) errors.</li>
</ol>
Infrastructure - Task #8820 (Closed): Add new DataONE Object format for HDF4/5 file formatshttps://redmine.dataone.org/issues/88202019-06-13T19:54:01ZBryce Mecummecum@nceas.ucsb.edu
<p>HDF 4 and 5 are efficient binary formats for data commonly used in science: <a href="https://en.wikipedia.org/wiki/Hierarchical_Data_Format">https://en.wikipedia.org/wiki/Hierarchical_Data_Format</a>, <a href="https://www.hdfgroup.org/solutions/hdf5/">https://www.hdfgroup.org/solutions/hdf5/</a>. I don't think we have a lot of content in this format, if any, but it's a pretty common format and a good one at that.</p>
<p>I did some research on MIME types and file extensions:</p>
<p>Re: MIME type:</p>
<blockquote>
<p>The recommended content is application/x-hdf5 for data in HDF5 or application/x-hdf for data in earlier versions.</p>
</blockquote>
<p><a href="https://www.hdfgroup.org/2018/06/citations-for-hdf-data-and-software/">https://www.hdfgroup.org/2018/06/citations-for-hdf-data-and-software/</a></p>
<p>Re: Extension,</p>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Hierarchical_Data_Format">https://en.wikipedia.org/wiki/Hierarchical_Data_Format</a> lists a few of the variants I've seen</li>
<li>The R <code>rhdf5</code> package uses the h5 extension (<a href="https://github.com/grimbough/rhdf5/tree/master/inst/testfiles">https://github.com/grimbough/rhdf5/tree/master/inst/testfiles</a>) so I went with this</li>
<li>.hdf4 and .hdf5 seem common too but h4/h5 seems a tad more common</li>
</ul>
<p><strong>Here are the details for each of the new formats:</strong></p>
<p><code>HDF4</code></p>
<ul>
<li>formatId: <code>application/x-hdf</code></li>
<li>formatName: Hierarchical Data Format version 4 (HDF4)</li>
<li>mediaType: <code>application/octet-stream</code></li>
<li>extension: h4</li>
</ul>
<p><code>HDF5</code></p>
<ul>
<li>formatId: <code>application/x-hdf5</code></li>
<li>formatName: Hierarchical Data Format version 5 (HDF5)</li>
<li>mediaType: <code>application/octet-stream</code></li>
<li>extension: h5</li>
</ul>
Infrastructure - Bug #8802 (Closed): Titles in EML records that use <value> in <title> do not get...https://redmine.dataone.org/issues/88022019-05-18T00:55:24ZBryce Mecummecum@nceas.ucsb.edu
<p>Margaret O'Brien over at EDI noticed this and sent it my way. She found an EML record that serializes the title in a schema-valid but unusual way:</p>
<pre>...snip...
<title>
<value>Forest-wide bird survey at 183 sample sites the Andrews Experimental Forest from 2009-present (Reformatted to ecocomDP Design Pattern)</value>
</title>
...snip...
</pre>
<p>See <a href="https://search.dataone.org/view/https://pasta.lternet.edu/package/metadata/eml/edi/359/1">https://search.dataone.org/view/https://pasta.lternet.edu/package/metadata/eml/edi/359/1</a> and notice the citation is missing its title (which is powered by Solr) and also see the accompanying Solr doc at <a href="https://search.dataone.org/cn/v2/query/solr/?q=id:%22https://pasta.lternet.edu/package/metadata/eml/edi/359/1%22">https://search.dataone.org/cn/v2/query/solr/?q=id:%22https://pasta.lternet.edu/package/metadata/eml/edi/359/1%22</a> and notice the title field is not set.</p>
<p>This is a bit tricky. It's pretty clearly <em>not</em> endorsed in the EML spec, as per:</p>
<p><em>i18nNonEmptyStringType</em>: (emphasis mine)</p>
<blockquote>
<p>This type specifies a content pattern for all elements that require language translations. The xml:lang attribute can be used to define the default language for element content. <em>Additional translations should be included as child 'value' elements</em> that also have an optional xml:lang</p>
</blockquote>
<p>So <code>value</code> in <code>title</code> is intended to be used like this:</p>
<pre><title>
My title
<value xml:lang="fr">Mon titre</value>
</title>
</pre>
<p>and not how it is in <a href="https://search.dataone.org/view/https://pasta.lternet.edu/package/metadata/eml/edi/359/1">https://search.dataone.org/view/https://pasta.lternet.edu/package/metadata/eml/edi/359/1</a>.</p>
<p>Do we want to tweak the indexer to support this, or is it actually a good thing that the indexer didn't pick this up because it's, subjectively, not well-formed EML?</p>
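For discussion's sake, if we did decide to support it, one indexer-side option is to take the title element's normalized string value, which picks up text nested in a <code>value</code> child. The XPath here is hypothetical, evaluated with the JDK's XPath engine against a minimal fragment, and it has an obvious caveat: it would also concatenate any translated <code>value</code> siblings.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class TitleXPathSketch {
    // Hypothetical alternative: an element's XPath string value includes text
    // from a nested <value> child, so normalize-space() recovers this title.
    // Caveat: translated <value> siblings would be concatenated in as well.
    static final String TITLE_XPATH = "normalize-space(//dataset/title)";

    public static String title(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return XPathFactory.newInstance().newXPath().evaluate(TITLE_XPATH, doc);
    }
}
```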
Infrastructure - Task #8775 (In Progress): Make taxonomic rank fields in Solr index non-case-sens...https://redmine.dataone.org/issues/87752019-03-11T22:39:44ZBryce Mecummecum@nceas.ucsb.edu
<p>In the current system, EML documents with taxonomic coverage get indexed into fields such as <code>species</code> if they contain XML such as:</p>
<pre>...snip
<taxonomicCoverage>
<taxonomicClassification>
<taxonRankName>Species</taxonRankName>
<taxonRankValue>Some species</taxonRankValue>
...snip
</pre>
<p>The field values are extracted using the XPath in <a href="https://repository.dataone.org/software/cicore/trunk/cn/d1_cn_index_processor/src/main/resources/application-context-eml-base.xml">https://repository.dataone.org/software/cicore/trunk/cn/d1_cn_index_processor/src/main/resources/application-context-eml-base.xml</a>:</p>
<pre>//taxonomicClassification/taxonRankValue[../taxonRankName="Species"]/text()
</pre>
<p>We ran into a case where the <code>taxonRankName</code> had been entered as 'species' instead of 'Species' and we decided that the XPath is too restrictive and that the strictness is needless and surprising. This change should result in a slight but negligible decrease in performance.</p>
<ul>
<li>Change all EML taxonomy fields to also match the lowercase form of each taxonomic rank</li>
<li>Check over other indexing field definitions related to taxonomy to make sure the above change is consistent</li>
</ul>
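As a concrete sketch of the first bullet, here's one candidate XPath that also matches the lowercase rank name, evaluated with the JDK's XPath engine against a minimal EML-like fragment. The actual bean definitions in <code>application-context-eml-base.xml</code> may end up differing.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class TaxonXPathSketch {
    // Candidate replacement XPath (hypothetical): matches a taxonRankName of
    // either "Species" or "species" rather than only the capitalized form.
    static final String SPECIES_XPATH =
        "//taxonomicClassification/taxonRankValue"
        + "[../taxonRankName=\"Species\" or ../taxonRankName=\"species\"]/text()";

    public static String firstSpecies(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        // evaluate() with a String return type yields the first match's text.
        return XPathFactory.newInstance().newXPath().evaluate(SPECIES_XPATH, doc);
    }
}
```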
Infrastructure - Decision #8765 (Closed): Consider changing how BaseSolrFieldXPathTest workshttps://redmine.dataone.org/issues/87652019-02-13T00:36:55ZBryce Mecummecum@nceas.ucsb.edu
<p>Ran into a weird thing while expanding the <a href="https://repository.dataone.org/software/cicore/trunk/cn/d1_cn_index_processor/src/test/java/org/dataone/cn/index/SolrFieldXPathEmlTest.java">https://repository.dataone.org/software/cicore/trunk/cn/d1_cn_index_processor/src/test/java/org/dataone/cn/index/SolrFieldXPathEmlTest.java</a> to test an EML 2.2.0 doc.</p>
<p><code>SolrFieldXPathEmlTest</code> compares values extracted via various subprocessors to a set of expectations stored in a <code>HashMap&lt;String, String&gt;()</code> (of the form <code>&lt;fieldName, expectedValue&gt;</code>). This data structure limits expectations to one value per field. Forever ago, in <code>r7985</code>,</p>
<pre>r7985 | sroseboo | 2012-03-23 12:12:53 -0800 (Fri, 23 Mar 2012) | 1 line
initial commit of search index support for parsing FGDC science metadata docs.
Index: src/test/java/org/dataone/cn/index/BaseSolrFieldXPathTest.java
</pre>
<p>support was added for testing multiple expectations for a single field by defining a convention of smushing multiple values into a single string, separated by two # characters (##). For example,</p>
<pre>eml210Expected.put("project", "Random Project Title##Another Random Project Title");
</pre>
<p>would test the <code>project</code> field for two values, "Random Project Title" and "Another Random Project Title", not a literal "Random Project Title##Another Random Project Title". I imagine this separator was picked because it's rare to see a ## in a metadata record, which seems reasonable.</p>
<p>I ran afoul of this today because I wanted to test an expectation for a field with a # in it and I couldn't because it was being split when I didn't want it to be. Why did a single # break things when the convention above is a double #? Because <code>BaseSolrFieldXPathTest.java</code> splits the expectation string using <code>StringUtils.split</code> like this:</p>
<pre>StringUtils.split(expectedForField, "##") // Where expectedForField might equal "Random Project Title##Another Random Project Title"
</pre>
<p>According to <a href="https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#split-java.lang.String-java.lang.String-">https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#split-java.lang.String-java.lang.String-</a>, the second arg to <code>StringUtils.split</code>, <code>separatorChars</code>, is "the characters used as the delimiters, null splits on whitespace" which tells me we're using it wrong. I think the method we needed to use was <code>StringUtils.splitByWholeSeparator</code> which correctly splits only on double #.</p>
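The difference between the two semantics is easy to demonstrate with the JDK alone: <code>StringTokenizer</code> treats its delimiter argument as a set of characters, just like <code>StringUtils.split</code>, while <code>String.split</code> on the (metacharacter-free) pattern "##" behaves like <code>StringUtils.splitByWholeSeparator</code> for inputs like ours. This is only an illustration of the two behaviors, not the test harness code itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class SeparatorDemo {
    // Character-set semantics (like StringUtils.split): a lone '#' also
    // delimits, because "##" is treated as the set of characters {'#'}.
    public static List<String> charSetSplit(String s) {
        List<String> out = new ArrayList<>();
        StringTokenizer tok = new StringTokenizer(s, "##");
        while (tok.hasMoreTokens()) {
            out.add(tok.nextToken());
        }
        return out;
    }

    // Whole-separator semantics (like StringUtils.splitByWholeSeparator):
    // only the full two-character "##" sequence delimits.
    public static List<String> wholeSeparatorSplit(String s) {
        return Arrays.asList(s.split("##"));
    }
}
```

An expectation string containing a single # survives the whole-separator split intact but gets broken apart by the character-set split, which is exactly the bug described above.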
<p>I could just change the line of code and move on but that breaks a ton (98) of tests. Before I did that, I wanted to ask what others thought. I see a few routes:</p>
<ol>
<li>Change the code to correctly split only on ## and not # <em>and</em> update all the tests I break</li>
<li>Do (1) <em>and</em> change the separator to something more future-proof. I suggest "&&" because that'd be invalid in XML outside a CDATA section.</li>
<li>Change how the expectations get tested so field expectations can have a one to many relationship. This'd take some time so I'd opt for (1) or (2) instead</li>
</ol>
<p>Any preferences out there?</p>
Infrastructure - Decision #8693 (In Progress): Support Google Dataset Search on search.dataone.or...https://redmine.dataone.org/issues/86932018-09-07T00:16:59ZBryce Mecummecum@nceas.ucsb.edu
<a name="Background"></a>
<h2 >Background<a href="#Background" class="wiki-anchor">¶</a></h2>
<p>Yesterday, <a href="https://toolbox.google.com/datasetsearch" class="external">Google Dataset Search</a> launched. We previously attempted to make MetacatUI (and by extension, DataONE Search) compatible with it by <a href="https://github.com/NCEAS/metacatui/issues/482" class="external">injecting Schema.org JSON-LD into appropriate pages</a>. During development and testing, we checked our compatibility with the upcoming Google Dataset Search using Google's <a href="https://search.google.com/structured-data/testing-tool" class="external">Structured Data Testing Tool</a>. This all worked fine and the feature appeared to be compatible but, after launching the feature on search.dataone.org, behavior changed on Google's end such that Google no longer saw our JSON-LD. The likely reason is that, because MetacatUI follows a single-page-application architecture and we inject the JSON-LD on the client side, Google's JSON-LD crawler only saw what was sent from the server (a nearly empty index.html) and not our full page (with JSON-LD). I was able to test this theory: while Google's crawler does execute JavaScript, it limits execution to about (or exactly) five seconds, and MetacatUI <em>usually</em> doesn't finish injecting JSON-LD and rendering all content until after that timeout.</p>
<p>Potential paths forward to get DataONE Search compatible with Google's Dataset Search include (none of which are mutually exclusive):</p>
<ol>
<li>The assets that make up MetacatUI and the asset loading strategies could be optimized: <a href="https://github.com/NCEAS/metacatui/issues/224">https://github.com/NCEAS/metacatui/issues/224</a></li>
<li>Move the code (and any dependencies) that injects JSON-LD further up in the app boot so that Google sees it</li>
<li>Inject the appropriate JSON-LD on the server side to guarantee that Google sees it (originally Matt Jones' idea!)</li>
</ol>
<p>(1) is being worked on for sure, and (2) may not be needed if (1) is successful. I want to talk about option (3) because:</p>
<ul>
<li>It's a quicker solution (I already have something working) which would help get us involved in the project faster</li>
<li>It paves the way for future features and/or improvements to MetacatUI (we could be rendering more on the server side than just JSON-LD, like other metadata, more page content, etc)</li>
</ul>
<a name="What-I-did"></a>
<h2 >What I did<a href="#What-I-did" class="wiki-anchor">¶</a></h2>
<p>To test this idea, I modified a <a href="https://github.com/amoeba/backbone-pushstate-example" class="external">previous project</a>, a simple Node (Express.js) app that hosts MetacatUI by intercepting every request and serving the appropriate asset. It injects Schema.org JSON-LD, when appropriate, by querying the CN Solr index before sending MetacatUI's index.html to the client. <a href="https://github.com/amoeba/metacatui-ssr" class="external">Code is here</a> and it's deployed <a href="http://neutral-cat.nceas.ucsb.edu/" class="external">here</a>. View source on any /view/... page and you'll see a minimal Schema.org/Dataset description in the head. More properties can be added later. I did it quick and dirty: the app pre-loads MetacatUI's index.html as a <code>String</code> at app boot and injects the JSON-LD into it. No templating language or other magic.</p>
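The injection step itself is tiny. The prototype above is Node/Express, but the pre-load-and-splice idea can be sketched in a few lines of Java (the class and method names here are made up for illustration):

```java
public class JsonLdInjector {
    // index.html is loaded once at app boot and reused for every request.
    private final String indexHtml;

    public JsonLdInjector(String indexHtml) {
        this.indexHtml = indexHtml;
    }

    // Splice a JSON-LD <script> tag in just before </head> so crawlers see
    // the dataset description without having to execute any JavaScript.
    public String render(String jsonLd) {
        String tag = "<script type=\"application/ld+json\">" + jsonLd + "</script>";
        return indexHtml.replace("</head>", tag + "</head>");
    }
}
```

Per request, the server would build the JSON-LD from a Solr query and call <code>render()</code> before writing the response.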
<a name="Things-to-address"></a>
<h2 >Things to address<a href="#Things-to-address" class="wiki-anchor">¶</a></h2>
<ul>
<li>How do we feel about switching from hosting MetacatUI via Apache (simple, bulletproof) to a Node-based deployment just to support this feature (new territory, at least for me)?</li>
<li>If we do switch, we'd want to make really sure the Node app doesn't have weird failure cases where it doesn't return index.html (e.g., when Solr is down or slow). The app needs to return index.html (and every other static asset) on every request and do it very fast, and we should decide what the cutoff is so that a slow or down Solr doesn't hold up serving the page.</li>
<li>Can this type of deployment easily be integrated with CN buildouts? I've deployed Node apps before by fronting them with Apache/nginx (via reverse proxy) and then keeping the node process up with Upstart</li>
<li>Is this performant enough for DataONE? I think my implementation is non-blocking, but I'm not a Node expert, so we'd want to code review and probably benchmark it</li>
<li>We could wait on (1) and stick with our current deployment strategy</li>
</ul>
<a name="Other-notes"></a>
<h2 >Other notes<a href="#Other-notes" class="wiki-anchor">¶</a></h2>
<p>Unrelated to the Google Dataset Search issue but related to Google's crawling for Google Search, we've also identified:</p>
<ul>
<li>That the Metacat View Service is often unreasonably slow: <a href="https://github.com/NCEAS/metacat/issues/1234">https://github.com/NCEAS/metacat/issues/1234</a> and are planning to figure out why</li>
<li>That we can and should make use of sitemaps to help Google crawl our pages: <a href="https://github.com/NCEAS/metacat/issues/1263">https://github.com/NCEAS/metacat/issues/1263</a></li>
</ul>
Infrastructure - Bug #8629 (Closed): unable to find valid certificate path to requested target wh...https://redmine.dataone.org/issues/86292018-06-25T20:34:40ZBryce Mecummecum@nceas.ucsb.edu
<p>This bug came from Mark Schildhauer and Margaret O'Brien.</p>
<p>While using Protege to import <a href="https://purl.dataone.org/obo/ENVO_import.owl">https://purl.dataone.org/obo/ENVO_import.owl</a>, the following error pops up:</p>
<pre>sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
Full Stack Trace
-----------------------------------------------------------------------------------------
org.semanticweb.owlapi.io.OWLOntologyCreationIOException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at uk.ac.manchester.cs.owl.owlapi.OWLOntologyFactoryImpl.loadOWLOntology(OWLOntologyFactoryImpl.java:207)
at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.actualParse(OWLOntologyManagerImpl.java:1099)
at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntology(OWLOntologyManagerImpl.java:1055)
at uk.ac.manchester.cs.owl.owlapi.OWLOntologyManagerImpl.loadOntologyFromOntologyDocument(OWLOntologyManagerImpl.java:1011)
at org.protege.editor.owl.model.io.OntologyLoader.loadOntologyInternal(OntologyLoader.java:101)
at org.protege.editor.owl.model.io.OntologyLoader.lambda$loadOntologyInOtherThread$210(OntologyLoader.java:60)
at org.protege.editor.owl.model.io.OntologyLoader$$Lambda$102/1971532877.call(Unknown Source)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:422)
at sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1889)
at sun.net.www.protocol.http.HttpURLConnection$10.run(HttpURLConnection.java:1884)
at java.security.AccessController.doPrivileged(Native Method)
at sun.net.www.protocol.http.HttpURLConnection.getChainedException(HttpURLConnection.java:1883)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1456)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:254)
at org.semanticweb.owlapi.io.AbstractOWLParser.getInputStreamFromContentEncoding(AbstractOWLParser.java:165)
at org.semanticweb.owlapi.io.AbstractOWLParser.getInputStream(AbstractOWLParser.java:127)
at org.semanticweb.owlapi.io.AbstractOWLParser.getInputSource(AbstractOWLParser.java:232)
at org.semanticweb.owlapi.rdf.rdfxml.parser.RDFXMLParser.parse(RDFXMLParser.java:72)
at uk.ac.manchester.cs.owl.owlapi.OWLOntologyFactoryImpl.loadOWLOntology(OWLOntologyFactoryImpl.java:197)
... 10 more
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.ssl.Alerts.getSSLException(Alerts.java:192)
at sun.security.ssl.SSLSocketImpl.fatal(SSLSocketImpl.java:1937)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:302)
at sun.security.ssl.Handshaker.fatalSE(Handshaker.java:296)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1478)
at sun.security.ssl.ClientHandshaker.processMessage(ClientHandshaker.java:212)
at sun.security.ssl.Handshaker.processLoop(Handshaker.java:969)
at sun.security.ssl.Handshaker.process_record(Handshaker.java:904)
at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1050)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1363)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1391)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1375)
at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:563)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1512)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1440)
at sun.net.www.protocol.http.HttpURLConnection.getHeaderField(HttpURLConnection.java:2942)
at java.net.URLConnection.getContentEncoding(URLConnection.java:523)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getContentEncoding(HttpsURLConnectionImpl.java:410)
at org.semanticweb.owlapi.io.AbstractOWLParser.getInputStream(AbstractOWLParser.java:122)
... 13 more
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:387)
at sun.security.validator.PKIXValidator.engineValidate(PKIXValidator.java:292)
at sun.security.validator.Validator.validate(Validator.java:260)
at sun.security.ssl.X509TrustManagerImpl.validate(X509TrustManagerImpl.java:324)
at sun.security.ssl.X509TrustManagerImpl.checkTrusted(X509TrustManagerImpl.java:229)
at sun.security.ssl.X509TrustManagerImpl.checkServerTrusted(X509TrustManagerImpl.java:124)
at sun.security.ssl.ClientHandshaker.serverCertificate(ClientHandshaker.java:1460)
... 28 more
Caused by: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
at sun.security.provider.certpath.SunCertPathBuilder.build(SunCertPathBuilder.java:145)
at sun.security.provider.certpath.SunCertPathBuilder.engineBuild(SunCertPathBuilder.java:131)
at java.security.cert.CertPathBuilder.build(CertPathBuilder.java:280)
at sun.security.validator.PKIXValidator.doBuild(PKIXValidator.java:382)
... 34 more
</pre>
<p>To reproduce:</p>
<ul>
<li>Open Protege</li>
<li>Open from URL</li>
<li>Paste and open '<a href="https://purl.dataone.org/obo/ENVO_import.owl">https://purl.dataone.org/obo/ENVO_import.owl</a>'</li>
<li>See the stack trace</li>
</ul>
<p>That PURL link redirects to a GitHub raw URL which <em>does not</em> reproduce this error. The version of Protege I'm using makes use of its own version of Java:</p>
<pre>❯ /Applications/Protégé.app/Contents/Plugins/JRE/Contents/Home/jre/bin/java -version
java version "1.8.0_40"
Java(TM) SE Runtime Environment (build 1.8.0_40-b27)
Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)
</pre>
<p>A quick Google reveals it could be because Java isn't getting enough of the certificate chain back from the web server, but a quick run of <a href="https://www.ssllabs.com/ssltest/analyze.html?d=purl.dataone.org">https://www.ssllabs.com/ssltest/analyze.html?d=purl.dataone.org</a> makes everything look in order.</p>
<p>Any ideas?</p>
Infrastructure - Bug #8615 (Closed): isotc211 indexing component has the wrong XPath for the pubD...https://redmine.dataone.org/issues/86152018-06-14T00:14:32ZBryce Mecummecum@nceas.ucsb.edu
<p>The latest trunk isotc211 indexing component has the following XPath for <code>pubDate</code>:</p>
<pre><bean id="isotc.pubDate" class="org.dataone.cn.indexer.parser.SolrField">
<constructor-arg name="name" value="pubDate"/>
<constructor-arg name="xpath" value="(//gmd:dateStamp/gco:Date/text() | //gmd:dateStamp/gco:DateTime/text())[1]"/>
<property name="converter" ref="dateConverter"/>
</bean>
</pre>
<p>This doesn't make any sense and probably wasn't vetted when it went into version control. If you look at an example document, like one from PANGAEA, you can see how they describe the publication date:</p>
<pre><ns0:identificationInfo>
<ns0:MD_DataIdentification>
<ns0:citation>
<ns0:CI_Citation>
<ns0:title>
<ns2:CharacterString>Simulated ocean velocity at 420 m water depth for pre-industrial, glacial, Pliocene, and Miocene climate states</ns2:CharacterString>
</ns0:title>
<ns0:date>
<ns0:CI_Date>
<ns0:date>
<ns2:DateTime>2018-04-25T15:20:13</ns2:DateTime>
</ns0:date>
<ns0:dateType>
<ns0:CI_DateTypeCode codeList="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#CI_DateTypeCode" codeListValue="http://www.isotc211.org/2005/resources/Codelist/gmxCodelists.xml#CI_DateTypeCode_publication">publication</ns0:CI_DateTypeCode>
</ns0:dateType>
</ns0:CI_Date>
</ns0:date>
</pre>
<p>I think it'd be good if the XPath pulled out the first <code>date</code> from the <code>identificationInfo/citation</code>, preferring a <code>date</code> with a <code>dateType</code> of <code>publication</code> and falling back to any <code>date</code> that's in the <code>identificationInfo/citation</code>.</p>
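A sketch of what that could look like as a bean definition follows. The prefixes assume the usual gmd/gco bindings, and the <code>contains()</code> test on <code>codeListValue</code> is untested against real records (PANGAEA's code list URL ends in <code>_publication</code>), so treat this as a starting point only:

```xml
<!-- Sketch only: prefer a CI_Date whose dateType is "publication", then fall
     back to the first date found in the identificationInfo citation. -->
<bean id="isotc.pubDate" class="org.dataone.cn.indexer.parser.SolrField">
    <constructor-arg name="name" value="pubDate"/>
    <constructor-arg name="xpath" value="(//gmd:identificationInfo//gmd:CI_Citation/gmd:date/gmd:CI_Date[contains(gmd:dateType/gmd:CI_DateTypeCode/@codeListValue, 'publication')]/gmd:date/*/text() | //gmd:identificationInfo//gmd:CI_Citation/gmd:date/gmd:CI_Date/gmd:date/*/text())[1]"/>
    <property name="converter" ref="dateConverter"/>
</bean>
```

The <code>*</code> step covers both <code>gco:Date</code> and <code>gco:DateTime</code> children, matching the union in the current rule.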
Infrastructure - Bug #8612 (Closed): Improperly formatted Alternate Data Access URLshttps://redmine.dataone.org/issues/86122018-06-12T22:37:39ZBryce Mecummecum@nceas.ucsb.edu
<p>If I go to</p>
<p><a href="https://search.dataone.org/#view/urn:uuid:8d639e70-55eb-40aa-b1a4-29712fd31b63">https://search.dataone.org/#view/urn:uuid:8d639e70-55eb-40aa-b1a4-29712fd31b63</a></p>
<p>and look at the underlying URL in the Address field, I see:</p>
<p><a href="https://search.dataone.org/%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20http://data.eol.ucar.edu/codiac/dss/id=102.036%0A">https://search.dataone.org/%0A%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20%20http://data.eol.ucar.edu/codiac/dss/id=102.036%0A</a></p>
<p>which should be </p>
<p><a href="http://data.eol.ucar.edu/codiac/dss/id=102.036%0A">http://data.eol.ucar.edu/codiac/dss/id=102.036%0A</a></p>
<p>Looks like the underlying XSLT isn't going deep enough down the path hierarchy.</p>
Member Nodes - Support #8209 (Closed): Five objects failed to sync to the CN from urn:node:ARCTIChttps://redmine.dataone.org/issues/82092017-10-25T18:28:11ZBryce Mecummecum@nceas.ucsb.edu
<p>I noticed this while doing other work and wanted to report the Objects in question so they can get fixed.</p>
<p>doi:10.18739/A2T091<br>
doi:10.18739/A2FV7G<br>
doi:10.18739/A28Z56<br>
doi:10.18739/A2Z55G<br>
doi:10.18739/A2PP08</p>
<p>Calls to CNRead.get() and CNRead.resolve() are giving me errors indicating that the System Metadata for each Object can't be found, e.g.:</p>
<p><code>No system metadata could be found for given PID: doi:10.18739/A2T091</code></p>
Infrastructure - Task #8165 (Closed): Re-factor origin field in isotc211 indexing componenthttps://redmine.dataone.org/issues/81652017-08-29T22:40:30ZBryce Mecummecum@nceas.ucsb.edu
<p>The XPath selectors we use for the origin field in the isotc211 indexing component (bean?) were found to be incorrect for a particular use case, and a larger group of us agreed that the usage was incorrect. We should revisit which XPaths are being used, re-deploy, and re-index the affected content.</p>
<p>I'm pasting in an email chain that initiated the creation of this Issue so we have the full background:</p>
<p>From Chris Turner at Axiom</p>
<blockquote>
<p>Hi Laura and Matt,</p>
<p>Since the launch of the Research Workspace member node, we've noticed that the dataset citation given at the top of the page doesn't match how we or the PIs would like the official citations to be formatted. There are two issues: selection of contact names for the citation, and appearance of the DOI. </p>
<p>It looks like contact names are being pulled from several parts of the metadata record: the section describing the resource itself (gmd:MD_DataIdentification/gmd:citation/...) and the section describing associated or aggregated resources (gmd:MD_DataIdentification/gmd:aggregationInfo/...). Here are two examples:</p>
<p>Mary Anne Bishop, Ben Gray, and Scott Pegau. 2017. Fish Predation on Juvenile Herring in Prince William Sound, Alaska, 2009-2012, EVOS Prince William Sound Herring Program. Research Workspace. 10.24431/rw1k1z.</p>
<p>Mary Anne Bishop, Anne Schaefer, Kathy Kuletz, Molly McCammon, Katrina Hoffman, et al. 2017. Fall and Winter Seabird Abundance Data, Prince William Sound, 2007-2017, Gulf Watch Alaska Pelagic Component. Research Workspace. 10.24431/rw1k1w.</p>
<p>In the first example, Scott Pegau is listed in the dataset citation, though in the metadata he is not connected to the dataset but is instead the PI for the Herring Program, a larger work referenced in the 'aggregationInfo' section. The second example is the same: McCammon, Hoffman, et al. are pulled from the 'aggregationInfo' element. </p>
<p>Please let me know if I understand correctly how the contacts are being selected for the citation. If I do have it right, is there anything that we can do about it, on our end or in the member node? </p>
<p>The DOI display issue is simpler. DataCite and CrossRef best practices are that DOIs should be displayed as complete URLs, with '<a href="https://doi.org/">https://doi.org/</a>' appearing before the DOI code assigned to a resource. That's how they're formatted in the metadata records, but the URL-esque formatting is stripped out for the citation and for display in the member node.</p>
<p>Can we display the DOI, both in the citation and in the metadata page as a full link?</p>
<p>Please advise on the best way to remedy these. I apologize if this is not the correct venue for this conversation. Please let me know if it makes more sense to continue talking about this on Slack.</p>
<p>Thanks in advance for your help.</p>
<ul>
<li>Chris</li>
</ul>
</blockquote>
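<p>Chris's second issue, displaying DOIs as full URLs, comes down to a small normalization step. The sketch below is illustrative only (the function name and inputs are invented, not from any DataONE codebase); it prepends the DataCite/CrossRef-recommended resolver prefix to a bare DOI:</p>

```python
def doi_to_url(doi):
    """Normalize a DOI into the full-URL display form recommended by
    DataCite and CrossRef."""
    doi = doi.strip()
    # Strip any existing resolver or scheme prefix so it isn't doubled up.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return "https://doi.org/" + doi

print(doi_to_url("10.24431/rw1k1z"))      # https://doi.org/10.24431/rw1k1z
print(doi_to_url("doi:10.24431/rw1k1w"))  # https://doi.org/10.24431/rw1k1w
```

The same rule would apply wherever a DOI is rendered: in the citation string and on the metadata landing page.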
<p>From Matt Jones:</p>
<blockquote>
<p>Bryce worked on a new stylesheet for ISO metadata, and so that is scheduled to be released soon. So, any changes would be good to propose even sooner to get them folded in. </p>
<p>If I am interpreting Chris correctly, he's saying that we are indiscriminately pulling ResponsibleParty entries regardless of their context in the document, and that it would be best practice to only cite the ResponsibleParty instances that are part of the citation in gmd:MD_DataIdentification/gmd:citation/. This makes total sense to me, and is what we do with other metadata standards. So we need to look into what is causing the behavior and figure out whether it is a stylesheet change, an indexing change, or both. </p>
<p>Matt</p>
</blockquote>
<p>From Chris Turner:</p>
<blockquote>
<p>Hi all, <br>
Matt's understanding is correct. As it is now, ResponsibleParty elements are being pulled in independent of where they appear in the record. Doing as he says and pulling only from gmd:MD_DataIdentification/gmd:citation would be a good, simple fix. We'd also like to pull the contact name from gmd:MD_DataIdentification/gmd:pointOfContact, too.<br>
I'm available September 5th-8th if we need to chat that week. </p>
<p>From Laura Moyers:</p>
<p>Thanks, Chris! </p>
<p>Matt, do you think what Chris describes would be a stylesheet change that Bryce could incorporate into his current changes? Would we need to talk with other ISO metadata users about this?</p>
<p>Thanks<br>
Laura</p>
</blockquote>
<p>From Matt Jones:</p>
<blockquote>
<p>I suspect it should be straightforward, and Bryce may have already handled it. We were just talking yesterday about getting his stylesheet changes into a Metacat release and deployed on the CN -- it's been in a holding pattern for some reason. So Jing and Bryce are going to work on getting those display improvements pushed out, as they were requested by a number of members. I don't think Chris' proposal would be at all controversial -- our current display is clearly misleading and would be improved, so I think it's a clear win. I'll cc Bryce and hopefully he can comment on whether and what work would be needed to support proper citation displays for ISO records.</p>
<p>Matt</p>
</blockquote>
<p>From Rob Nahf:</p>
<blockquote>
<p>Will these discussed changes also be reflected in the DataONE solr index? If so, we would likely need to reindex all content of that format after making changes to the parser.<br><br>
(It sounds like there's broad community support for the change, so reindexing probably wouldn't negatively impact anyone...)</p>
</blockquote>
<p>From Bryce Mecum:</p>
<blockquote>
<p>Hey all: Yes, as Matt guesses, this is pretty straightforward to fix. The dataset citations in our search and landing pages are powered by our Solr index, and we would just need to change the relevant indexing routine and reindex the content. After a quick look at how things are working now, I agree that some change is needed. The information contained in this thread is super helpful, so thanks, Chris, for the high level of detail in your original email.</p>
<p>All of that work is on our end and I can coordinate with the DataONE CI team on the changes and we'll let everyone here know when the changes have been made.</p>
</blockquote>
<p>The current set of XPaths can be found at <a href="https://repository.dataone.org/software/cicore/trunk/cn/d1_cn_index_processor/src/main/resources/application-context-isotc211-base.xml">https://repository.dataone.org/software/cicore/trunk/cn/d1_cn_index_processor/src/main/resources/application-context-isotc211-base.xml</a> which, at the time of writing this, defines the bean for the origin field.</p>
<p>The sub-tasks here are:</p>
<ul>
<li>Figure out what <em>should</em> go in there instead</li>
<li>Probably consult some folks for confirmation</li>
<li>Update the Bean</li>
<li>Re-index affected documents once the change has been redeployed</li>
</ul>
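<p>To make the proposed fix concrete, here is a rough sketch of the scoping change using Python's standard-library ElementTree rather than the actual indexer bean; the XML fragment is an invented minimal ISO 19139 record. Restricting the selector to parties under gmd:citation excludes names that only appear in gmd:aggregationInfo:</p>

```python
import xml.etree.ElementTree as ET

NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}

# Invented minimal record: one citation-level author and one
# aggregationInfo-level party that the scoped selector should ignore.
root = ET.fromstring("""
<gmd:MD_Metadata xmlns:gmd="http://www.isotc211.org/2005/gmd"
                 xmlns:gco="http://www.isotc211.org/2005/gco">
  <gmd:identificationInfo>
    <gmd:MD_DataIdentification>
      <gmd:citation>
        <gmd:CI_Citation>
          <gmd:citedResponsibleParty>
            <gmd:CI_ResponsibleParty>
              <gmd:individualName>
                <gco:CharacterString>Mary Anne Bishop</gco:CharacterString>
              </gmd:individualName>
            </gmd:CI_ResponsibleParty>
          </gmd:citedResponsibleParty>
        </gmd:CI_Citation>
      </gmd:citation>
      <gmd:aggregationInfo>
        <gmd:MD_AggregateInformation>
          <gmd:aggregateDataSetName>
            <gmd:CI_Citation>
              <gmd:citedResponsibleParty>
                <gmd:CI_ResponsibleParty>
                  <gmd:individualName>
                    <gco:CharacterString>Scott Pegau</gco:CharacterString>
                  </gmd:individualName>
                </gmd:CI_ResponsibleParty>
              </gmd:citedResponsibleParty>
            </gmd:CI_Citation>
          </gmd:aggregateDataSetName>
        </gmd:MD_AggregateInformation>
      </gmd:aggregationInfo>
    </gmd:MD_DataIdentification>
  </gmd:identificationInfo>
</gmd:MD_Metadata>
""")

# Scoped path: only names cited within the dataset's own citation element.
path = (".//gmd:MD_DataIdentification/gmd:citation/gmd:CI_Citation"
        "/gmd:citedResponsibleParty/gmd:CI_ResponsibleParty"
        "/gmd:individualName/gco:CharacterString")
origin = [el.text for el in root.findall(path, NS)]
print(origin)  # ['Mary Anne Bishop'] -- Scott Pegau is excluded
```

The actual bean wires XPath expressions into the Solr origin field, so the equivalent change there would be trimming its selectors to the gmd:citation subtree in the same way.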
Infrastructure - Task #7466 (In Progress): Some objects not accessible on the CN via REST APIhttps://redmine.dataone.org/issues/74662015-11-04T18:41:38ZBryce Mecummecum@nceas.ucsb.edu
<p>While doing other work, I noticed that a good number (not sure how many) of objects listed in the CN's Solr index are not accessible via the REST API get() and resolve() methods. Instead of returning the object, these calls return a NotFound error. </p>
<p>To reproduce,</p>
<ol>
<li>Visit <a href="https://cn.dataone.org/cn/v1/query/solr/?fl=identifier,title,authoritativeMN,datasource&q=formatType:METADATA+AND+-obsoletedBy:*&rows=100&start=0">https://cn.dataone.org/cn/v1/query/solr/?fl=identifier,title,authoritativeMN,datasource&q=formatType:METADATA+AND+-obsoletedBy:*&rows=100&start=0</a></li>
<li>Pick a PID from the query result, e.g.</li>
</ol>
<ul>
<li>knb-lter-cap.148.9</li>
<li>CLOEBDMETADATA.10242013.1</li>
</ul>
<ol>
<li>Attempt to resolve() or get() the object via the REST API like: <a href="https://cn.dataone.org/cn/v1/object/CLOEBDMETADATA.10242013.1">https://cn.dataone.org/cn/v1/object/CLOEBDMETADATA.10242013.1</a></li>
<li>Receive a NotFound error instead of the object.</li>
</ol>
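<p>The steps above can be partly scripted. The sketch below is a hypothetical helper, not DataONE tooling: it builds the v1 get() URL for a PID (percent-encoding the identifier) and classifies the resulting HTTP status against the symptom described; actually fetching the URL (e.g., with curl or urllib.request) is left to the caller:</p>

```python
import urllib.parse

# CN base URL from the reproduction steps above (v1 API).
CN_BASE = "https://cn.dataone.org/cn/v1"

def object_url(pid):
    """Build the CN REST get() URL for a PID, percent-encoding reserved
    characters in the identifier."""
    return CN_BASE + "/object/" + urllib.parse.quote(pid, safe="")

def classify(status):
    """Map the HTTP status of a get()/resolve() call onto the symptom."""
    if status == 200:
        return "ok"
    if status == 404:
        return "NotFound: listed in Solr but not retrievable"
    return "unexpected status {}".format(status)

# The PID from step 3 above:
print(object_url("CLOEBDMETADATA.10242013.1"))
# https://cn.dataone.org/cn/v1/object/CLOEBDMETADATA.10242013.1
print(classify(404))
```

Running this against each PID returned by the Solr query in step 1 and fetching the URLs would give a count of how many indexed objects are actually unretrievable.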
<p>Notes:</p>
<p>In IRC, Skye noticed that the objects can be retrieved via their respective MNs, so this appears to be a Metacat replication issue.</p>