DataONE Tasks | https://redmine.dataone.org/ | 2014-09-18T17:30:21Z
Infrastructure - Bug #6391: Science metadata files with different checksums on CN and MN - encoding | https://redmine.dataone.org/issues/6391?journal_id=21397 | 2014-09-18T17:30:21Z | Dave Vieglais (dave.vieglais@gmail.com)
<ul></ul><p>RFC 7303 Sections 8.8 and 8.9 are particularly relevant here. <a href="http://www.rfc-editor.org/rfc/rfc7303.txt">http://www.rfc-editor.org/rfc/rfc7303.txt</a></p>
<p>HTTP headers from the cn: </p>
<p>Content-Type: text/xml;charset=UTF-8</p>
<p>and from the mn: </p>
<p>Content-Type: text/xml</p>
<p>Both the MN and the CN are operating incorrectly, though the CN is worse.</p>
<p>The MN SHOULD be setting the charset parameter when transmitting over HTTP.</p>
<p>The CN output is an example of incorrectly labeled encoding. The charset parameter of the HTTP header specifies UTF-8, but the encoding declaration of the XML document indicates ISO-8859-1. HTTP stream readers will likely interpret the content as UTF-8, whereas an editor opening a saved copy that has no BOM will recognize the file as ISO-8859-1.</p>
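<p>A minimal sketch of why the mismatch matters, assuming the body bytes really are ISO-8859-1 as the XML declaration says. The sample document and the class name <code>CharsetMismatchDemo</code> are hypothetical and are not taken from the CN or MN code.</p>

```java
import java.nio.charset.StandardCharsets;

public class CharsetMismatchDemo {
    // Hypothetical sample: declares ISO-8859-1 and contains one non-ASCII character.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><title>caf\u00e9</title>";

    /** Decodes the body the way a reader trusting "charset=UTF-8" would. */
    static String decodeAsUtf8(byte[] body) {
        return new String(body, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] body = XML.getBytes(StandardCharsets.ISO_8859_1);

        // Trusting the HTTP header: the lone 0xE9 byte is not valid UTF-8,
        // so it is replaced with U+FFFD and the text no longer matches.
        String viaHeader = decodeAsUtf8(body);
        // Trusting the XML declaration: the text is recovered intact.
        String viaDeclaration = new String(body, StandardCharsets.ISO_8859_1);

        System.out.println("header charset recovers text:   " + viaHeader.equals(XML));
        System.out.println("XML declaration recovers text: " + viaDeclaration.equals(XML));
    }
}
```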
Infrastructure - Bug #6391: Science metadata files with different checksums on CN and MN - encoding | https://redmine.dataone.org/issues/6391?journal_id=21401 | 2014-09-18T19:10:53Z | Robert Waltz
<ul></ul><p>In the web.xml of the DataONE_CN_Rest project there is a filter applied to every request named CharacterEncodingFilter that forces all encoding to be UTF-8. We can either take the filter off, or make its application more selective.</p>
<p>So, if this bug is still seen to apply to Metacat, then we should make a second bug report for cn_rest_service as well.</p>
Infrastructure - Bug #6391: Science metadata files with different checksums on CN and MN - encoding | https://redmine.dataone.org/issues/6391?journal_id=21522 | 2014-09-24T18:10:32Z | Skye Roseboom (sroseboo@dataone.unm.edu)
<ul><li><strong>Due date</strong> set to <i>2014-09-24</i></li><li><strong>Start date</strong> set to <i>2014-09-24</i></li><li><strong>Target version</strong> set to <i>CCI-1.5.0</i></li></ul>
Infrastructure - Bug #6391: Science metadata files with different checksums on CN and MN - encoding | https://redmine.dataone.org/issues/6391?journal_id=21692 | 2014-09-25T16:57:16Z | Dave Vieglais (dave.vieglais@gmail.com)
<ul><li><strong>Due date</strong> changed from <i>2014-09-24</i> to <i>2014-09-25</i></li></ul><p>Conclusion after examining literature and the requirements of the get() operation:</p>
<ul>
<li><p>The server should follow the recommendations of RFC7303 and related operational specifications such as RFC6838 and RFC7231</p></li>
<li><p>If the client is making an exact copy (e.g. during CN sync of metadata, and for replicas), the client MUST stream the exact bytes of the HTTP GET body without alteration</p></li>
<li><p>The client SHOULD make accessible any MIME and other descriptive information provided by the origin</p></li>
<li><p>The CNs SHOULD record the MIME type and other descriptive information about the content so that this information can be passed on to clients.</p></li>
</ul>
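<p>The "exact copy" requirement above can be sketched as follows. This is an illustration under stated assumptions, not the CN synchronization code: <code>ExactCopy</code> and <code>copyWithDigest</code> are hypothetical names, and the checksum algorithm is left to the caller. The point is that the body is handled only as bytes, so no charset decoding can change it.</p>

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ExactCopy {
    /**
     * Copies the response body to the sink byte for byte and returns the
     * checksum of the bytes that were copied. The body stream is closed
     * when the copy finishes; the sink is left open for the caller.
     */
    static byte[] copyWithDigest(InputStream body, OutputStream sink, String algorithm) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalArgumentException("Unknown checksum algorithm: " + algorithm, e);
        }
        // DigestInputStream updates the digest as bytes pass through it.
        try (DigestInputStream in = new DigestInputStream(body, digest)) {
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                sink.write(buffer, 0, n); // no Reader, no String, no charset
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return digest.digest();
    }
}
```

<p>The returned digest can then be compared with the checksum recorded for the object by the origin, which is how a mismatch like the one in this ticket would be detected.</p>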
Infrastructure - Bug #6391: Science metadata files with different checksums on CN and MN - encoding | https://redmine.dataone.org/issues/6391?journal_id=22804 | 2014-10-29T00:04:49Z | Jing Tao (tao@nceas.ucsb.edu)
<ul><li><strong>Due date</strong> changed from <i>2014-09-25</i> to <i>2014-10-29</i></li></ul><p>The MN seems OK. Please see Section 8.3 of RFC 7303: <a href="http://www.rfc-editor.org/rfc/rfc7303.txt">http://www.rfc-editor.org/rfc/rfc7303.txt</a>.</p>
Infrastructure - Bug #6391: Science metadata files with different checksums on CN and MN - encoding | https://redmine.dataone.org/issues/6391?journal_id=22813 | 2014-10-29T18:33:20Z | Jing Tao (tao@nceas.ucsb.edu)
<ul></ul><p>First step is to implement this:<br>
If the client is making an exact copy (e.g. during CN sync of metadata, and for replicas), the client MUST stream the exact bytes of the HTTP GET body without alteration</p>
Infrastructure - Bug #6391: Science metadata files with different checksums on CN and MN - encoding | https://redmine.dataone.org/issues/6391?journal_id=22814 | 2014-10-29T20:09:41Z | Jing Tao (tao@nceas.ucsb.edu)
<ul></ul><p>In the read actions (such as get and getReplica), bytes are read directly from files, so there is no alteration.</p>
<p>However, in the save actions (such as create and update), the input stream is converted to a string using the default UTF-8 encoding:</p>
<p>CNode.create(inputStream) ---> D1Node.create(inputStream) ---> D1Node.insertOrUpdateDocument(xmlString) ---> MetacatHandler.handleInsertOrUpdateAction(xmlString). The inputStream is converted to a string using UTF-8 encoding.</p>
<p>MNode.create(inputStream) ---> D1Node.create(inputStream) ---> D1Node.insertOrUpdateDocument(xmlString) ---> MetacatHandler.handleInsertOrUpdateAction(xmlString). The inputStream is converted to a string using UTF-8 encoding.</p>
<p>MNode.update(inputStream) ---> D1Node.insertOrUpdateDocument(xmlString) ---> MetacatHandler.handleInsertOrUpdateAction(xmlString). The inputStream is converted to a string using UTF-8 encoding.</p>
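<p>To illustrate why the stream-to-string step in these call chains can change the checksum, here is a small sketch. It assumes the submitted bytes are ISO-8859-1; <code>RoundTripDemo</code> and <code>roundTripThroughUtf8String</code> are hypothetical names and do not correspond to Metacat methods.</p>

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RoundTripDemo {
    /** Mimics "inputStream -> xmlString -> bytes written to disk" with UTF-8 on both sides. */
    static byte[] roundTripThroughUtf8String(byte[] original) {
        String xmlString = new String(original, StandardCharsets.UTF_8);
        return xmlString.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // ISO-8859-1 bytes for "cafe" with an accented e: 0x63 0x61 0x66 0xE9
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        byte[] stored = roundTripThroughUtf8String(latin1);

        // 0xE9 is not valid UTF-8 here, so it becomes U+FFFD, which is
        // re-encoded as three bytes. The stored copy no longer matches.
        System.out.println("original length: " + latin1.length);
        System.out.println("stored length:   " + stored.length);
        System.out.println("identical:       " + Arrays.equals(latin1, stored));
    }
}
```

<p>Input that is already valid UTF-8 survives this round trip unchanged, which would explain why only some documents show a checksum difference.</p>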
Infrastructure - Bug #6391: Science metadata files with different checksums on CN and MN - encoding | https://redmine.dataone.org/issues/6391?journal_id=22887 | 2014-11-03T18:19:16Z | Jing Tao (tao@nceas.ucsb.edu)
<ul><li><strong>Due date</strong> changed from <i>2014-10-29</i> to <i>2014-11-03</i></li></ul><p>During replication between Metacat instances, Metacat converts the input stream from the other host to a string using the default encoding and then saves the string to a file. We need to save the input stream directly.</p>
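<p>Saving the stream directly could look roughly like the sketch below. It is not the Metacat replication code; <code>SaveStreamDirectly</code>, <code>save</code>, and <code>saveAndReadBack</code> are hypothetical names, and the temporary file stands in for wherever the replica is actually stored.</p>

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SaveStreamDirectly {
    /** Writes the incoming bytes to the target file without decoding them. */
    static long save(InputStream body, Path target) {
        try {
            return Files.copy(body, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Small self-check: saves the bytes to a temp file and reads them back. */
    static byte[] saveAndReadBack(byte[] received) {
        try {
            Path target = Files.createTempFile("replica", ".xml");
            try {
                save(new ByteArrayInputStream(received), target);
                return Files.readAllBytes(target);
            } finally {
                Files.deleteIfExists(target);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```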
Infrastructure - Bug #6391: Science metadata files with different checksums on CN and MN - encoding | https://redmine.dataone.org/issues/6391?journal_id=22894 | 2014-11-04T18:51:43Z | Jing Tao (tao@nceas.ucsb.edu)
<ul><li><strong>Due date</strong> changed from <i>2014-11-03</i> to <i>2014-11-04</i></li><li><strong>Status</strong> changed from <i>New</i> to <i>In Progress</i></li></ul>
Infrastructure - Bug #6391: Science metadata files with different checksums on CN and MN - encoding | https://redmine.dataone.org/issues/6391?journal_id=22978 | 2014-11-13T18:38:36Z | Jing Tao (tao@nceas.ucsb.edu)
<ul><li><strong>Status</strong> changed from <i>In Progress</i> to <i>Testing</i></li><li><strong>Due date</strong> changed from <i>2014-11-04</i> to <i>2014-11-13</i></li></ul>
Infrastructure - Bug #6391: Science metadata files with different checksums on CN and MN - encoding | https://redmine.dataone.org/issues/6391?journal_id=22995 | 2014-11-14T00:03:49Z | Jing Tao (tao@nceas.ucsb.edu)
<ul><li><strong>Due date</strong> changed from <i>2014-11-13</i> to <i>2014-11-14</i></li><li><strong>Target version</strong> changed from <i>CCI-1.5.0</i> to <i>CCI-1.5.1</i></li></ul>
Infrastructure - Bug #6391: Science metadata files with different checksums on CN and MN - encoding | https://redmine.dataone.org/issues/6391?journal_id=22996 | 2014-11-14T00:04:28Z | Jing Tao (tao@nceas.ucsb.edu)
<ul></ul><p>The feature that will be released in CCI-1.5.0 is:</p>
<p><a href="https://redmine.dataone.org/issues/6568">https://redmine.dataone.org/issues/6568</a></p>
Infrastructure - Bug #6391: Science metadata files with different checksums on CN and MN - encoding | https://redmine.dataone.org/issues/6391?journal_id=23335 | 2015-01-06T18:41:56Z | Dave Vieglais (dave.vieglais@gmail.com)
<ul><li><strong>Due date</strong> changed from <i>2014-11-14</i> to <i>2015-01-06</i></li><li><strong>Target version</strong> changed from <i>CCI-1.5.1</i> to <i>CCI-2.0.0</i></li></ul>
Infrastructure - Bug #6391: Science metadata files with different checksums on CN and MN - encoding | https://redmine.dataone.org/issues/6391?journal_id=24914 | 2015-04-14T20:49:00Z | Jing Tao (tao@nceas.ucsb.edu)
<ul><li><strong>Related to</strong> <i><a class="issue tracker-5 status-5 priority-4 priority-default closed child" href="/issues/7042">Task #7042</a>: Create an element for the character set encoding in the system metadata schema</i> added</li></ul>
Infrastructure - Bug #6391: Science metadata files with different checksums on CN and MN - encoding | https://redmine.dataone.org/issues/6391?journal_id=24915 | 2015-04-14T20:50:49Z | Jing Tao (tao@nceas.ucsb.edu)
<ul><li><strong>Status</strong> changed from <i>Testing</i> to <i>Closed</i></li><li><strong>% Done</strong> changed from <i>0</i> to <i>100</i></li></ul><p>I created a new ticket to add an optional element for the character set encoding to the system metadata:<br>
<a href="https://redmine.dataone.org/issues/7042">https://redmine.dataone.org/issues/7042</a><br>
Closing this ticket.</p>