Member Node Deployment Work Plan Template Description

The questions listed here are intended as guides to the topics that need to be addressed when bringing on a Member Node. Although they emphasize issues related to Slender Node deployments, they are generally applicable to other deployments as well. Archive completed plans in Google Drive.


Work Plan: {Member Node name}

1. Resources

  • Redmine issue:
  • Implementation: {e.g. Metacat, GMN, GMN slender node, ...}
  • MN Description Document: {Link to MN description document}
  • MN ID: {the node identifier}
  • Contacts: {names of people to contact for MN setup and administration}

2. Background

Brief description of the MN, with emphasis on technical aspects.

2.1 Existing Repository APIs

List the existing APIs provided by the repository that provide access to the repository datasets, e.g. OGC WCS, OAI-PMH, ...

2.2 Authentication and/or Identification

Are the repository resources publicly accessible? If not, then how is access controlled? Will DataONE authentication and access controls be supported?

3. Development Plan and Specifications

Broadly speaking, there are two aspects that need to be resolved: 1) How does the repository represent a dataset and its components? and 2) How are datasets and their components accessed?

3.1 Dataset Structure

Within DataONE, datasets are composite structures made up of separate data and metadata components, plus a third component: an OAI-ORE document (the resource map) that describes the relationships between the components of the dataset. See https://purl.dataone.org/architecture/design/DataPackage.html
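
To make the composite structure concrete, the following is a minimal sketch of a dataset as one metadata component, several data components, and the relationships a resource map would record. The class name, the field names, and the "documents" / "isDocumentedBy" labels are illustrative assumptions; this is not an OAI-ORE serialization.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class DataPackage:
    """Hypothetical in-memory model of one dataset and its components."""
    resource_map_pid: str
    metadata_pid: str
    data_pids: List[str] = field(default_factory=list)

    def relationships(self) -> List[Tuple[str, str, str]]:
        """Return (subject, relation, object) triples linking metadata and data."""
        triples = []
        for data_pid in self.data_pids:
            # Assumed relation labels; a real resource map expresses these in RDF.
            triples.append((self.metadata_pid, "documents", data_pid))
            triples.append((data_pid, "isDocumentedBy", self.metadata_pid))
        return triples


package = DataPackage("example-rm-1", "example-meta-1", ["example-data-1", "example-data-2"])
for triple in package.relationships():
    print(triple)
```

If the repository cannot produce this kind of listing for a dataset, that is a sign the resource maps will need to be generated during deployment.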

How are the repository datasets structured?

Are resource maps available or do they need to be generated?

Are data and metadata separate components and individually identifiable?

Is each component of a dataset immutable?

When datasets or their components are updated, are old versions retained?

3.2 Identifiers

All content synchronized with DataONE is immutable (the checksum of the object bytes never changes), and each object is identified by a persistent identifier (PID) that must be unique within the DataONE federation and, ideally, globally unique. See http://purl.dataone.org/architecture/design/PIDs.html
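
One way to check the immutability requirement is to record a checksum per PID and confirm that the bytes served for that PID still match. The sketch below assumes SHA-256 and uses made-up PIDs and content; the repository may use a different algorithm.

```python
import hashlib


def object_checksum(data: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of the object bytes."""
    return hashlib.new(algorithm, data).hexdigest()


# Hypothetical record of checksums taken when each PID was first registered.
original = b"site,temp_c\nA,12.5\n"
revised = b"site,temp_c\nA,12.6\n"
recorded_checksums = {"example-pid-v1": object_checksum(original)}


def is_unchanged(pid: str, data: bytes) -> bool:
    """True if the bytes currently served for `pid` match the recorded checksum."""
    return recorded_checksums[pid] == object_checksum(data)


print(is_unchanged("example-pid-v1", original))  # same bytes
print(is_unchanged("example-pid-v1", revised))   # edited bytes: needs a new PID
```

If the second check can ever fail for content already exposed under a PID, the repository is reusing identifiers for changed bytes, and the plan should describe how new PIDs will be minted for revisions.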

Since version 2.0 of the infrastructure, DataONE also supports series identifiers (SIDs), which always resolve to the latest revision of an object. See http://purl.dataone.org/architecture/design/ContentImmutability.html

Within DataONE, SIDs or PIDs are treated as opaque strings and are resolved using the resolve service of the Coordinating Nodes. In practice, a repository may use different forms of identifier for different purposes. For example, DOIs may be used to identify the dataset and handles or UUIDs used to identify specific data components.
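
Because identifiers are opaque strings, they have to be percent-encoded before being placed in a service URL; DOIs, for example, contain ":" and "/". The sketch below only builds the URL and makes no network call. The `/v2/resolve/` path is an assumption modelled on the `https://cn.dataone.org/cn/v2/formats` URL used elsewhere in this template, and the identifier is invented; check both against the Coordinating Node API documentation.

```python
from urllib.parse import quote

# Base URL inferred from the formats URL in this template (assumption).
CN_BASE_URL = "https://cn.dataone.org/cn"


def resolve_url(identifier: str, base_url: str = CN_BASE_URL) -> str:
    """Build a resolve URL for a PID or SID, encoding every reserved character."""
    # safe="" ensures "/" and ":" inside the identifier are encoded as well.
    return f"{base_url}/v2/resolve/{quote(identifier, safe='')}"


print(resolve_url("doi:10.5063/F1XYZ"))  # hypothetical identifier
```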

What form of identifier is being used for the different components of a dataset?

3.2.1 Persistent Identifiers (PIDs)

Does each component of a dataset have a persistent identifier (PID) that always refers to the exact same item (identical bytes)?

3.2.2 Series Identifiers (SIDs)

Does the repository support the notion of a series identifier (SID), that is, an identifier that refers to the current revision of the dataset (or its components)?

3.2.3 Format Identifiers

Different types of object are assigned unique "format identifiers" in DataONE. The formatId assists the infrastructure and consumers in appropriately managing and using the content. The list of formatIds is available from https://cn.dataone.org/cn/v2/formats.

What formats are used for the science metadata documents?

What types of science data objects are exposed?
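
If the repository does not record formats explicitly, a first pass at assigning formatIds is often a lookup by file extension. The sketch below is one such approach; the mapping and the fallback value are illustrative assumptions and should be checked against the formatId list referenced above before use.

```python
from pathlib import PurePosixPath

# Illustrative mapping only; confirm each value against the DataONE formatId list.
EXTENSION_TO_FORMAT_ID = {
    ".csv": "text/csv",
    ".txt": "text/plain",
    ".png": "image/png",
}
DEFAULT_FORMAT_ID = "application/octet-stream"  # assumed fallback for unknown types


def guess_format_id(filename: str) -> str:
    """Pick a formatId from the file extension, falling back to a generic type."""
    suffix = PurePosixPath(filename).suffix.lower()
    return EXTENSION_TO_FORMAT_ID.get(suffix, DEFAULT_FORMAT_ID)


for name in ("data/Temps.CSV", "notes.txt", "README"):
    print(name, "->", guess_format_id(name))
```

Science metadata documents usually need a more specific rule, such as inspecting the XML namespace of the document, because the file extension alone does not distinguish one metadata standard or version from another.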

3.3 Listing Datasets and Components

Describe how a list of available datasets and their components can be retrieved.

Is the listing per dataset or per dataset component?

If the listing is per dataset, how is a list of components of the dataset obtained?
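
When the repository lists datasets rather than individual components, the listing has to be flattened so that every component can be handled as a separate object. A minimal sketch, assuming the per-dataset listing is already available as a mapping from dataset identifier to component identifiers (all names and values are hypothetical):

```python
from typing import Dict, Iterator, List, Tuple


def iter_components(datasets: Dict[str, List[str]]) -> Iterator[Tuple[str, str]]:
    """Yield (dataset_id, component_id) pairs from a per-dataset listing."""
    for dataset_id in sorted(datasets):
        for component_id in datasets[dataset_id]:
            yield dataset_id, component_id


# Hypothetical listing returned by the repository.
listing = {
    "dataset-2": ["dataset-2/metadata.xml"],
    "dataset-1": ["dataset-1/metadata.xml", "dataset-1/temps.csv"],
}
for dataset_id, component_id in iter_components(listing):
    print(dataset_id, component_id)
```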

3.4 Change Detection

How are changes to the list of datasets and/or dataset components discovered?

How frequently are changes advertised?
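
If the repository offers no change feed, one fallback is to compare successive snapshots of the listing. The sketch below assumes each snapshot maps a component identifier to a checksum or modification stamp; the function name and the sample values are hypothetical.

```python
from typing import Dict, List


def detect_changes(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two snapshots of {identifier: checksum_or_timestamp}."""
    prev_ids, curr_ids = set(previous), set(current)
    return {
        "added": sorted(curr_ids - prev_ids),
        "removed": sorted(prev_ids - curr_ids),
        # Same identifier, different stamp: the content behind it has changed.
        "modified": sorted(i for i in prev_ids & curr_ids if previous[i] != current[i]),
    }


yesterday = {"a": "1", "b": "2", "c": "3"}
today = {"a": "1", "b": "9", "d": "4"}
print(detect_changes(yesterday, today))
```

Entries reported as "modified" deserve particular attention: because content synchronized with DataONE is immutable, changed bytes need to be exposed under a new PID rather than the existing one.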

3.5 Get Dataset Component

Describe how each component of a dataset can be retrieved. For example, given the identifier to a component, is there a service that provides access to the bytes of the component?

3.6 System Metadata Generation

Describe how system metadata will be generated: specifically, how formatIds for the dataset components will be determined, what the replication policy will be, and who owns the content.
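
The decisions asked for here can be sketched as a small function that assembles the values for one object. The dictionary below is a planning aid only: its keys are modelled loosely on DataONE system metadata element names, it is not a serialization of the system metadata schema, and the sample PID, owner string, and defaults are assumptions.

```python
import hashlib
from typing import Any, Dict


def build_system_metadata(
    pid: str,
    data: bytes,
    format_id: str,
    rights_holder: str,
    replication_allowed: bool = False,
    number_replicas: int = 0,
) -> Dict[str, Any]:
    """Collect the values needed to describe one object (illustrative only)."""
    return {
        "identifier": pid,
        "formatId": format_id,
        "size": len(data),  # size in bytes
        "checksum": {
            "algorithm": "SHA-256",  # assumed algorithm
            "value": hashlib.sha256(data).hexdigest(),
        },
        "rightsHolder": rights_holder,  # who owns the content
        "replicationPolicy": {
            "replicationAllowed": replication_allowed,
            "numberReplicas": number_replicas,
        },
    }


record = build_system_metadata(
    pid="example-pid-v1",
    data=b"site,temp_c\nA,12.5\n",
    format_id="text/csv",
    rights_holder="CN=Example Owner",  # hypothetical subject string
)
print(record["size"], record["checksum"]["value"])
```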

4. Deployment Plan

Which environment will the deployment be tested in?

Will the test implementation be repurposed for production use?

Who will be the points of contact for DataONE and the Member Node?

Are there any particular deadlines to be aware of?
