DataONE Tasks
Infrastructure - Bug #8615: isotc211 indexing component has the wrong XPath for the pubDate field https://redmine.dataone.org/issues/8615?journal_id=31296 2019-03-21T20:39:11Z Roger Dahl dahl@unm.edu
<ul></ul><p>Example:</p>
<p><a href="https://search.dataone.org/view/http://get.iedadata.org/metadata/iso/609441">https://search.dataone.org/view/http://get.iedadata.org/metadata/iso/609441</a></p>
<p>The publication date in the metadata is 2010, but the <code>pubDate</code> value in Solr is <code>2018-05-17T00:00:00Z</code>.</p>
Infrastructure - Bug #8615: isotc211 indexing component has the wrong XPath for the pubDate field https://redmine.dataone.org/issues/8615?journal_id=31301 2019-04-01T22:10:27Z Jing Tao tao@nceas.ucsb.edu
<ul></ul><p>The xpath looks like:<br>
<code>//gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:date/gmd:CI_Date/gmd:date[following-sibling::gmd:dateType/gmd:CI_DateTypeCode/text() = 'publication']/gco:Date/text() <br>
| //gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:date/gmd:CI_Date/gmd:date/gco:Date[1]/text() <br>
| //gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:date/gmd:CI_Date/gmd:date[following-sibling::gmd:dateType/gmd:CI_DateTypeCode/text() = 'publication']/gco:DateTime/text() <br>
| //gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:date/gmd:CI_Date/gmd:date/gco:DateTime[1]/text()</code></p>
Infrastructure - Bug #8615: isotc211 indexing component has the wrong XPath for the pubDate field https://redmine.dataone.org/issues/8615?journal_id=31309 2019-04-15T21:17:56Z Jing Tao tao@nceas.ucsb.edu
<ul><li><strong>% Done</strong> changed from <i>0</i> to <i>100</i></li><li><strong>Assignee</strong> set to <i>Jing Tao</i></li><li><strong>Status</strong> changed from <i>New</i> to <i>Closed</i></li></ul>
Infrastructure - Bug #8615: isotc211 indexing component has the wrong XPath for the pubDate field https://redmine.dataone.org/issues/8615?journal_id=31410 2019-06-01T06:22:12Z Jing Tao tao@nceas.ucsb.edu
<ul><li><strong>% Done</strong> changed from <i>100</i> to <i>30</i></li><li><strong>Status</strong> changed from <i>Closed</i> to <i>In Progress</i></li></ul>
Infrastructure - Bug #8615: isotc211 indexing component has the wrong XPath for the pubDate field https://redmine.dataone.org/issues/8615?journal_id=31411 2019-06-01T06:25:07Z Jing Tao tao@nceas.ucsb.edu
<ul></ul><p>ISO fragment:</p>
<pre><code class="xml syntaxhl"><gmd:identificationInfo>
<gmd:MD_DataIdentification>
<gmd:citation>
<gmd:CI_Citation>
<gmd:title>
<gco:CharacterString>Steller sea lion (Eumetopias jubatus) satellite telemetry data used to determine at-sea distribution in the western-central Aleutian Islands, Alaska 2000-2013</gco:CharacterString>
</gmd:title>
<gmd:date>
<gmd:CI_Date>
<gmd:date>
<gco:Date>2013-01-01</gco:Date>
</gmd:date>
<gmd:dateType>
<gmd:CI_DateTypeCode codeList="http://www.ngdc.noaa.gov/metadata/published/xsd/schema/resources/Codelist/gmxCodelists.xml#CI_DateTypeCode" codeListValue="creation">creation</gmd:CI_DateTypeCode>
</gmd:dateType>
</gmd:CI_Date>
</gmd:date>
<gmd:date>
<gmd:CI_Date>
<gmd:date>
<gco:Date>2019-06-01</gco:Date>
</gmd:date>
<gmd:dateType>
<gmd:CI_DateTypeCode codeList="http://www.ngdc.noaa.gov/metadata/published/xsd/schema/resources/Codelist/gmxCodelists.xml#CI_DateTypeCode" codeListValue="publication">publication</gmd:CI_DateTypeCode>
</gmd:dateType>
</gmd:CI_Date>
</gmd:date>
</code></pre>
<p>Our processor picks up the first date, 2013-01-01, which is the creation date rather than the publication date.</p>
Infrastructure - Bug #8615: isotc211 indexing component has the wrong XPath for the pubDate field https://redmine.dataone.org/issues/8615?journal_id=31412 2019-06-01T06:30:05Z Jing Tao tao@nceas.ucsb.edu
<ul></ul><p>Our rules look like:<br>
<code><br>
/gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:date/gmd:CI_Date/gmd:date[following-sibling::gmd:dateType/gmd:CI_DateTypeCode/text() = 'publication']/gco:Date/text() <br>
| //gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:date/gmd:CI_Date/gmd:date/gco:Date[1]/text() <br>
| //gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:date/gmd:CI_Date/gmd:date[following-sibling::gmd:dateType/gmd:CI_DateTypeCode/text() = 'publication']/gco:DateTime/text() <br>
| //gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:date/gmd:CI_Date/gmd:date/gco:DateTime[1]/text()<br>
</code><br>
If I remove the general one:<br>
<code><br>
//gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:date/gmd:CI_Date/gmd:date/gco:Date[1]/text()</code><br>
we get the correct result. So the processor doesn't apply the XPath alternatives in the order they are written?</p>
Infrastructure - Bug #8615: isotc211 indexing component has the wrong XPath for the pubDate field https://redmine.dataone.org/issues/8615?journal_id=31413 2019-06-03T22:25:01Z Bryce Mecum mecum@nceas.ucsb.edu
<ul></ul><p>Hey Jing, is the XPath in your above comment the one in the source? It looks like it has a bug (the first element is missing a second <code>/</code>) and also duplicates the XPaths.</p>
<p>Should it be:</p>
<pre>//gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:date/gmd:CI_Date/gmd:date[following-sibling::gmd:dateType/gmd:CI_DateTypeCode/text() = 'publication']/gco:DateTime/text()
| //gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:date/gmd:CI_Date/gmd:date/gco:DateTime[1]/text()
</pre>
Infrastructure - Bug #8615: isotc211 indexing component has the wrong XPath for the pubDate field https://redmine.dataone.org/issues/8615?journal_id=31414 2019-06-03T22:42:59Z Jing Tao tao@nceas.ucsb.edu
<ul></ul><p>Hi Bryce:<br>
The missing <code>/</code> is a typo in my comment; the code has it. Sorry for the confusion.</p>
<p>The <code>|</code> operator computes the union of two node sets:</p>
<p><a href="https://www.w3schools.com/xml/xpath_operators.asp">https://www.w3schools.com/xml/xpath_operators.asp</a></p>
<p>So we should use:</p>
<pre>//gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:date/gmd:CI_Date/gmd:date[following-sibling::gmd:dateType/gmd:CI_DateTypeCode/text() = 'publication']/gco:Date/text()
|//gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:date/gmd:CI_Date/gmd:date[following-sibling::gmd:dateType/gmd:CI_DateTypeCode/text() = 'publication']/gco:DateTime/text()
</pre>
<p>The two XPaths above are not duplicates: their last elements differ (<code>gco:Date</code> versus <code>gco:DateTime</code>).</p>
Infrastructure - Bug #8615: isotc211 indexing component has the wrong XPath for the pubDate field https://redmine.dataone.org/issues/8615?journal_id=31415 2019-06-03T22:54:30Z Bryce Mecum mecum@nceas.ucsb.edu
<ul></ul><p>Gotcha, my mistake. The fallback XPath still seems desirable though. What if we just ordered them differently?</p>
<pre>//gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:date/gmd:CI_Date/gmd:date[following-sibling::gmd:dateType/gmd:CI_DateTypeCode/text() = 'publication']/gco:Date/text()
| //gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:date/gmd:CI_Date/gmd:date[following-sibling::gmd:dateType/gmd:CI_DateTypeCode/text() = 'publication']/gco:DateTime/text()
| //gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:date/gmd:CI_Date/gmd:date/gco:Date[1]/text()
| //gmd:identificationInfo/*/gmd:citation/gmd:CI_Citation/gmd:date/gmd:CI_Date/gmd:date/gco:DateTime[1]/text()
</pre>
Infrastructure - Bug #8615: isotc211 indexing component has the wrong XPath for the pubDate field https://redmine.dataone.org/issues/8615?journal_id=31416 2019-06-03T23:02:05Z Jing Tao tao@nceas.ucsb.edu
<ul></ul><p>To my understanding, the <code>|</code> operator is not <code>or</code>. The XPath returns the values of every path that matches. In Roger's case, the XML document matches both paths (the fallback one and the publication one), so it returns two values. Since the fallback match comes before the publication match in the XML instance, the fallback value was selected.<br>
I am not sure how we can keep the fallback path.</p>
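<p>One way the fallback could be kept under XPath 1.0 is to stop combining the alternatives with <code>|</code> and instead evaluate them one at a time, most specific first, taking the first non-empty result. The following is a minimal sketch of that idea, not the indexing component's actual code: the class name, the <code>firstMatch</code> helper, and the simplified, namespace-free XML are all assumptions made for illustration.</p>

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class PubDateFallbackSketch {
    // Hypothetical, simplified stand-ins for the real gmd:/gco: expressions.
    static final String SPECIFIC = "//date[type = 'publication']/value";
    static final String GENERAL = "(//date/value)[1]";

    // Evaluates each expression in priority order and returns the first
    // non-empty string result, or null if none of them match.
    static String firstMatch(String xml, String... expressionsByPriority)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        for (String expression : expressionsByPriority) {
            // The XML is re-parsed per expression to keep the sketch short.
            String value = xpath
                .evaluate(expression, new InputSource(new StringReader(xml)))
                .trim();
            if (!value.isEmpty()) {
                return value;
            }
        }
        return null;
    }

    public static void main(String[] args) throws XPathExpressionException {
        String xml =
            "<citation>"
            + "<date><value>2013-01-01</value><type>creation</type></date>"
            + "<date><value>2019-06-01</value><type>publication</type></date>"
            + "</citation>";
        System.out.println(firstMatch(xml, SPECIFIC, GENERAL));
    }
}
```

<p>With this arrangement the general expression is only consulted when the publication-specific one finds nothing, so the position of the dates in the document no longer decides the outcome.</p>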
Infrastructure - Bug #8615: isotc211 indexing component has the wrong XPath for the pubDate field https://redmine.dataone.org/issues/8615?journal_id=31417 2019-06-03T23:26:38Z Bryce Mecum mecum@nceas.ucsb.edu
<ul></ul><p>You are right, <code>|</code> is not <code>or</code>.</p>
<p>I was thinking that re-arranging the order would make any nodeset returned by the XPath come back ordered from most desirable to least desirable. After a quick test with an example doc I made up, it seems that <code>|</code> doesn't return an ordered nodeset, as I had hoped it would. Or if it is ordered, it's not in the same order as the XPath.</p>
<p>From <a href="https://stackoverflow.com/questions/5497197/how-to-get-the-real-node-order-from-xpath-expression-java">https://stackoverflow.com/questions/5497197/how-to-get-the-real-node-order-from-xpath-expression-java</a> it looks like the XPath spec defines nodesets as unordered, and implementations of the spec may choose to allow clients to enforce an order. Do you know if we can do that here?</p>
Infrastructure - Bug #8615: isotc211 indexing component has the wrong XPath for the pubDate field https://redmine.dataone.org/issues/8615?journal_id=31418 2019-06-03T23:29:28Z Bryce Mecum mecum@nceas.ucsb.edu
<ul></ul><p>Looking at that a bit more, maybe what's really going on is that the nodeset is returned in document order rather than XPath order, which matches what I'm seeing in my testing.</p>
Infrastructure - Bug #8615: isotc211 indexing component has the wrong XPath for the pubDate field https://redmine.dataone.org/issues/8615?journal_id=31419 2019-06-03T23:32:20Z Jing Tao tao@nceas.ucsb.edu
<ul></ul><p>Yes, I think they are returned in the document order.</p>
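<p>The document-order behaviour discussed above can be checked with a small test against the JDK's built-in XPath 1.0 engine. This is only an illustrative sketch: the class name, the <code>select</code> helper, and the simplified, namespace-free XML are assumptions, and other XPath implementations are not required to behave the same way.</p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class UnionOrderDemo {
    // Hypothetical stand-in for the two CI_Date entries in the ISO fragment:
    // the creation date appears before the publication date.
    static final String XML =
        "<citation>"
        + "<date><value>2013-01-01</value><type>creation</type></date>"
        + "<date><value>2019-06-01</value><type>publication</type></date>"
        + "</citation>";

    // Evaluates the expression as a node set and returns each node's text,
    // in the order the engine hands the nodes back.
    static List<String> select(String expression) throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws XPathExpressionException {
        String specific = "//date[type = 'publication']/value";
        String general = "//date/value";
        // Swap the operands of the union and compare the two results.
        System.out.println(select(specific + " | " + general));
        System.out.println(select(general + " | " + specific));
    }
}
```

<p>If both lines print the same list with the creation date first, the order of the operands around <code>|</code> has no effect on which value a "take the first node" consumer ends up with.</p>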
Infrastructure - Bug #8615: isotc211 indexing component has the wrong XPath for the pubDate field https://redmine.dataone.org/issues/8615?journal_id=31469 2019-07-02T22:22:25Z Jing Tao tao@nceas.ucsb.edu
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Closed</i></li><li><strong>% Done</strong> changed from <i>30</i> to <i>100</i></li></ul><p>We now use Saxon to support XPath 2.0, which fixed the issue.</p>