Task #8820

Add new DataONE Object format for HDF4/5 file formats

Added by Bryce Mecum about 3 years ago. Updated about 3 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: -
Target version: -
Start date: 2019-06-13
Due date:
% Done: 100%
Milestone: None
Product Version: *
Story Points:
Sprint:

Description

HDF4 and HDF5 are efficient binary data formats commonly used in science: https://en.wikipedia.org/wiki/Hierarchical_Data_Format, https://www.hdfgroup.org/solutions/hdf5/. I don't think we have much content in these formats, if any, but HDF is pretty common and a good format at that.

I did some research on MIME types and file extensions:

Re: MIME type:

The recommended content is application/x-hdf5 for data in HDF5 or application/x-hdf for data in earlier versions.

https://www.hdfgroup.org/2018/06/citations-for-hdf-data-and-software/

Re: Extension,

Here are the details for each of the new formats:

HDF4

  • formatId: application/x-hdf
  • formatName: Hierarchical Data Format version 4 (HDF4)
  • mediaType: application/octet-stream
  • extension: h4

HDF5

  • formatId: application/x-hdf5
  • formatName: Hierarchical Data Format version 5 (HDF5)
  • mediaType: application/octet-stream
  • extension: h5
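As a rough illustration of how the two entries above could be used, here is a minimal Python sketch that maps a file extension to the proposed formatId. The function name `guess_format_id` and the dictionary are hypothetical and not part of any DataONE library; only the extension and formatId values come from this ticket.

```python
from pathlib import PurePath
from typing import Optional

# Extension -> formatId, copied from the two entries proposed in this ticket.
HDF_FORMAT_IDS = {
    "h4": "application/x-hdf",
    "h5": "application/x-hdf5",
}


def guess_format_id(filename: str) -> Optional[str]:
    """Return the proposed formatId for a filename, or None if the extension is unknown.

    Hypothetical helper for illustration only.
    """
    # Take the final suffix, drop the leading dot, and compare case-insensitively.
    ext = PurePath(filename).suffix.lstrip(".").lower()
    return HDF_FORMAT_IDS.get(ext)


if __name__ == "__main__":
    for name in ("soil_moisture.h5", "legacy_grid.H4", "notes.txt"):
        print(name, "->", guess_format_id(name))
```

Matching on the extension alone is only a guess at the format; files with other extensions would need a different check.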

History

#1 Updated by Matthew Jones about 3 years ago

This looks great. I would suggest that the proper media types should be application/x-hdf and application/x-hdf5 rather than application/octet-stream. Best to be specific. Thoughts?

#2 Updated by Bryce Mecum about 3 years ago

> This looks great. I would suggest that the proper media types should be application/x-hdf and application/x-hdf5 rather than application/octet-stream. Best to be specific. Thoughts?

I thought this too, but based my decision on how other formats were done: for example, the MATLAB formats (including 7.3, which is HDF5) and RAW are typed as octet-stream. I'm cool with either way. And now that I look, I think your suggestion is probably the better one.

Let's go with mediaType values that match the formatId values.

#3 Updated by Jing Tao about 3 years ago

They will look like:

<objectFormat>
    <formatId>application/x-hdf</formatId>
    <formatName>Hierarchical Data Format version 4 (HDF4)</formatName>
    <formatType>DATA</formatType>
    <mediaType name="application/x-hdf"/>
    <extension>h4</extension>
</objectFormat>

<objectFormat>
    <formatId>application/x-hdf5</formatId>
    <formatName>Hierarchical Data Format version 5 (HDF5)</formatName>
    <formatType>DATA</formatType>
    <mediaType name="application/x-hdf5"/>
    <extension>h5</extension>
</objectFormat>
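For anyone who wants to sanity-check entries like these, here is a minimal Python sketch that parses the two objectFormat elements and confirms that each mediaType name matches its formatId, as agreed in this thread. The `objectFormatList` wrapper element and the `read_formats` helper are assumptions made for illustration; the real format list in the code may use namespaces and additional attributes that are not shown in this ticket.

```python
import xml.etree.ElementTree as ET

# The two entries from this ticket, wrapped in a hypothetical root element
# so that the snippet is a single well-formed XML document.
XML = """<objectFormatList>
<objectFormat>
    <formatId>application/x-hdf</formatId>
    <formatName>Hierarchical Data Format version 4 (HDF4)</formatName>
    <formatType>DATA</formatType>
    <mediaType name="application/x-hdf"/>
    <extension>h4</extension>
</objectFormat>
<objectFormat>
    <formatId>application/x-hdf5</formatId>
    <formatName>Hierarchical Data Format version 5 (HDF5)</formatName>
    <formatType>DATA</formatType>
    <mediaType name="application/x-hdf5"/>
    <extension>h5</extension>
</objectFormat>
</objectFormatList>"""


def read_formats(xml_text):
    """Parse objectFormat elements into plain dictionaries (illustrative helper)."""
    root = ET.fromstring(xml_text)
    formats = []
    for node in root.iter("objectFormat"):
        formats.append({
            "formatId": node.findtext("formatId"),
            "formatName": node.findtext("formatName"),
            "formatType": node.findtext("formatType"),
            # mediaType carries its value in the "name" attribute, not as text.
            "mediaType": node.find("mediaType").get("name"),
            "extension": node.findtext("extension"),
        })
    return formats


if __name__ == "__main__":
    for fmt in read_formats(XML):
        # Convention agreed in this ticket: mediaType matches formatId.
        matches = fmt["mediaType"] == fmt["formatId"]
        print(fmt["formatId"], fmt["extension"], "mediaType matches:", matches)
```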

#4 Updated by Bryce Mecum about 3 years ago

Looks good to me.

#5 Updated by Jing Tao about 3 years ago

  • % Done changed from 0 to 100
  • Status changed from New to Closed

The formats have been added to the code (dataone-cn-metacat and d1_common_java). They were also added to cn-dev, cn-dev-2, cn-sandbox, cn-sandbox-2, cn-stage, cn-stage-2 and the production CNs.
