Story #7170: Evaluate the feasibility of extracting provenance information from the journal.txt document - OGC-Slender Node - DataONE Tasks

Added by Dave Vieglais almost 10 years ago. Updated almost 10 years ago.

Status: New

Priority: Normal

Assignee: Dave Vieglais

History

#1 Updated by Dave Vieglais almost 10 years ago

Assignee set to Dave Vieglais

Each entry on the FTP site contains a "journal.txt" document that describes the provenance of the package.

Unfortunately, it is written as free-form text for human readers rather than in a structured form.

The goal of this activity is to determine what useful provenance information, if any, can be extracted from the journal and associated files (e.g. manifest files).

Example journal file:

ftp://ftp.nodc.noaa.gov/nodc/archive/arc0064/0118783/14.14/about/journal.txt

#2 Updated by Dave Vieglais almost 10 years ago

This is likely to be a laborious free-text processing exercise. From John Relph:

No, the journal.txt is not available in any machine-readable format. We
have talked about implementing a system to enable that, but currently the
file is a mostly free-form file for the use of the human data content
manager to record actions performed and other information.

We will, at some point soon, start populating lineage sections in our ISO
metadata with information about the various revisions of the accessions.
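
As a starting point for the evaluation, a first pass could try to split a journal into dated entries before attempting anything deeper. The following is only a minimal sketch under stated assumptions: it assumes that some journal lines begin with an ISO-style date (YYYY-MM-DD), which has not been verified against real journal.txt files, and the sample text, the `JournalEntry` class and the `split_journal` function are hypothetical names made up for illustration.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Assumption: a new journal entry starts on a line that begins with an
# ISO-style date. Real journal.txt files may use other conventions.
DATE_AT_START = re.compile(r"^(\d{4}-\d{2}-\d{2})\b")


@dataclass
class JournalEntry:
    """One dated block of free text (hypothetical structure)."""
    date: str
    lines: List[str] = field(default_factory=list)


def split_journal(text: str) -> List[JournalEntry]:
    """Group non-blank lines under the most recent dated line.

    Lines that appear before the first dated line are ignored.
    """
    entries: List[JournalEntry] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = DATE_AT_START.match(line)
        if match:
            current = JournalEntry(date=match.group(1))
            entries.append(current)
        if current is not None:
            current.lines.append(line)
    return entries


# Invented sample text, NOT taken from an actual journal.txt.
SAMPLE = """\
2014-06-02 accession received
copied files to archive

2014-06-10 revision created
"""

if __name__ == "__main__":
    for entry in split_journal(SAMPLE):
        print(entry.date, "|", " / ".join(entry.lines))
```

If real journals turn out to mark entries differently (for example with operator names or differently formatted timestamps), only the `DATE_AT_START` pattern would need to change. Mapping the resulting entries to provenance statements remains the laborious part.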
