Task #8121

Fix MIME type and formatId for KML files

Added by Matthew Jones almost 7 years ago. Updated almost 7 years ago.

Status: Closed
Priority: High
Assignee:
Category: dataone-cn-metacat
Target version:
Start date: 2017-06-30
Due date:
% Done: 100%
Milestone: None
Product Version: *
Story Points:
Sprint:

Description

The Arctic Data Center data team is trying to upload some KML files and noticed an oddity in the formatId and MIME type for KML files. The proper MIME type is application/vnd.google-earth.kml+xml (see https://developers.google.com/kml/documentation/kml_tut#kml_server), but the DataONE formatId and MIME type are set to application/vnd.google-earth.kml xml, with a space substituted for the + sign.

At a minimum, the MIME type should be corrected in our formats list (https://cn.dataone.org/cn/v2/formats), but I think the formatId itself should also be corrected. It appears there are only five objects using this formatId in production (https://cn.dataone.org/cn/v2/object?formatId=application/vnd.google-earth.kml%20xml), so I think we could change the formatId in our vocabulary and work with the owners of those five objects to update their formatId. Thoughts?


Related issues

Related to Infrastructure - Task #8128: Correct invalid formatIDs in production Closed 2017-07-11
Blocked by Infrastructure - Bug #8122: Metacat is double-decoding incoming urls on the CNs Closed 2017-06-30

History

#1 Updated by Dave Vieglais almost 7 years ago

  • Target version set to CCI-2.3.6
  • Priority changed from Normal to High
  • Assignee set to Jing Tao

#2 Updated by Dave Vieglais almost 7 years ago

Tested the upload / update script using an echo server. The uploaded content is available as expected on the server side.

From Rob via slack:

(space)

original pathInfo: /formats/application/rdf xml
original requestURI: /metacat/d1/cn/v2/formats/application/rdf%20xml
new pathinfo: /formats/application/rdf%20xml
After decoded: application/rdf xml

(plus)

original pathInfo: /formats/application/rdf+xml
original requestURI: /metacat/d1/cn/v2/formats/application/rdf+xml
new pathinfo: /formats/application/rdf+xml
After decoded: application/rdf xml

So it appears the issue lies either in Tomcat passing on improperly escaped content or in Metacat improperly unescaping the content it receives.
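
For reference, the trace above is what one would expect if a form-style decoder (one that treats "+" as a space) is applied to the path. The following is a minimal Python sketch of that distinction. It uses the standard library as a stand-in and is not the actual Tomcat or Metacat code:

```python
from urllib.parse import unquote, unquote_plus

# Path-style decoding: only %XX escapes are decoded, "+" is left alone.
path_space = unquote("application/rdf%20xml")
path_plus = unquote("application/rdf+xml")

# Form-style decoding: "+" is additionally converted to a space.
form_plus = unquote_plus("application/rdf+xml")

print(repr(path_space))  # 'application/rdf xml'
print(repr(path_plus))   # 'application/rdf+xml'
print(repr(form_plus))   # 'application/rdf xml'
```

The third result matches the "After decoded" value in the (plus) case, which suggests the "+" is being handled with form semantics somewhere along the way.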

#3 Updated by Dave Vieglais almost 7 years ago

  • Status changed from New to In Progress
  • % Done changed from 0 to 30

On further consideration, the fundamental issue is that the script:

insertOrUpdateObjectFormatList.sh

is using the "-d" switch with curl to post form data.

This mechanism expects the submitter to have already encoded the data as application/x-www-form-urlencoded. curl itself does not URL-encode data unless the --data-urlencode switch is provided.
As a result, the plain XML document is sent as form data, the server URL-decodes it during processing, and the "+" characters are faithfully converted to spaces.

The fix is to replace the "-d" switches in the script with "--data-urlencode" unless the data being sent has already been specifically urlencoded.

Then upload the formatId document and verify correct content is available on the server.
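
The effect described in this comment can be reproduced without a server. The sketch below is an approximation in Python: the field name "doc" and the one-element XML snippet are made up for illustration, and parse_qs stands in for whatever form parsing the server actually performs.

```python
from urllib.parse import parse_qs, quote_plus

# Hypothetical fragment of a format list document.
xml_doc = "<formatId>application/vnd.google-earth.kml+xml</formatId>"

# Roughly what "curl -d" sends: the value goes out unencoded.
raw_body = "doc=" + xml_doc

# Roughly what "curl --data-urlencode" sends: the value is URL-encoded first.
encoded_body = "doc=" + quote_plus(xml_doc)

# The receiving side decodes the body as form data in both cases.
received_raw = parse_qs(raw_body)["doc"][0]
received_encoded = parse_qs(encoded_body)["doc"][0]

print(received_raw)      # "+" has become a space
print(received_encoded)  # identical to xml_doc
```

Only the pre-encoded body survives the round trip, which is consistent with the space showing up in the stored formatId.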

#4 Updated by Dave Vieglais almost 7 years ago

  • Target version changed from CCI-2.3.6 to CCI-2.3.5

After manually correcting the script on cn-dev-unm-1 and executing, the resulting format list from the CN appears correct:

curl -s "https://cn-dev-unm-1.test.dataone.org/cn/v2/formats" | grep svg
image/svg+xml

svg

#5 Updated by Dave Vieglais almost 7 years ago

IMPORTANT:

Before deploying this fix to all CNs, issue #8122 must be corrected; otherwise it is not possible to retrieve format information for any formatId that has a "+" in it. For example, requesting application/rdf+xml from cn-dev-unm-1:

curl "https://cn-dev-unm-1.test.dataone.org/cn/v2/formats/application%2Frdf%2Bxml"
<?xml version="1.0" encoding="UTF-8"?>
The format specified by application/rdf xml does not exist at this node.

but that format is present:

curl -s "https://cn-dev-unm-1.test.dataone.org/cn/v2/formats" | xml sel -t -m "//formatId[text()='application/rdf+xml']" -c ..

application/rdf+xml
Resource Description Framework
DATA

rdf
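
To illustrate why the lookup fails even though the request is correctly encoded, here is a small Python sketch. It assumes that the second decoding pass treats "+" as a space, as form-style decoders do; the actual decoding path inside Metacat is covered by #8122 and is not reproduced here.

```python
from urllib.parse import quote, unquote, unquote_plus

format_id = "application/rdf+xml"

# Percent-encode every reserved character, including "/" and "+".
encoded = quote(format_id, safe="")
url = "https://cn-dev-unm-1.test.dataone.org/cn/v2/formats/" + encoded

# First decode: recovers the formatId exactly.
once = unquote(encoded)

# Second decode (assumed form-style): the restored "+" becomes a space.
twice = unquote_plus(once)

print(url)
print(repr(once))   # 'application/rdf+xml'
print(repr(twice))  # 'application/rdf xml'
```

A single decode yields the formatId that is present in the list, while the second pass produces the value reported in the error message.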

#6 Updated by Dave Vieglais almost 7 years ago

  • Blocked by Bug #8122: Metacat is double-decoding incoming urls on the CNs added

#7 Updated by Jing Tao almost 7 years ago

After we upgraded Metacat, this link works now:
https://cn-dev.test.dataone.org/cn/v2/formats/application/rdf%2Bxml

#8 Updated by Jing Tao almost 7 years ago

  • Category set to dataone-cn-metacat

#9 Updated by Dave Vieglais almost 7 years ago

  • Related to Task #8128: Correct invalid formatIDs in production added

#10 Updated by Jing Tao almost 7 years ago

  • Status changed from In Progress to Closed
  • % Done changed from 30 to 100
