Story #8838

ARM harvest

Added by John Evans about 2 years ago. Updated almost 2 years ago.

Status: New
Priority: Normal
Assignee:
Target version: -
Start date: 2019-08-26
Due date:
% Done: 0%
Story Points:

Description

Since the ARM folks successfully used the scanner to get a clean run of their sitemap and metadata, it was time to try to harvest to a test GMN instance.

Unfortunately, there appears to be a problem with duplicate DOIs. That is not allowed, is it? If it is, I'm doing this wrong...

The following DOIs are repeated within the set of 320 documents:

10.5439/1025028
10.5439/1025029
10.5439/1025030
10.5439/1025039
10.5439/1025145
10.5439/1025146
10.5439/1025151
10.5439/1025152
10.5439/1025157
10.5439/1025199
10.5439/1025211
10.5439/1025220
10.5439/1025274
10.5439/1025306
10.5439/1025309
10.5439/1025310
10.5439/1025322
10.5439/1027266
10.5439/1027295
10.5439/1027366
10.5439/1027370
10.5439/1027760
10.5439/1027765
10.5439/1046207
10.5439/1046211
10.5439/1095601
10.5439/1150253
10.5439/1150306
10.5439/1258791
10.5439/1350629
10.5439/1025165
10.5439/1025192
10.5439/1025224
10.5439/1025258
10.5439/1025280
10.5439/1025281
10.5439/1025323
10.5439/1021460
10.5439/1023898
10.5439/1025191
10.5439/1025263
10.5439/1095594
10.5439/1285691
10.5439/1227214

For example, 10.5439/1025028 is associated with both https://www.archive.arm.gov/metadata/adc/html/nsawsipatchsummaryC2.b1.html and https://www.archive.arm.gov/metadata/adc/html/nsawsipatchsummaryC1.a1.html (and with other documents as well). The associated metadata is very similar, but not quite the same.
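
A check for repeated DOIs could be sketched roughly as follows. This is only an illustration, not the scanner's actual code: it assumes the DOI of each document has already been extracted from its metadata, and the names `find_duplicate_dois` and `records` are hypothetical. The third record is a made-up placeholder added for contrast.

```python
from collections import defaultdict


def find_duplicate_dois(records):
    """Return {doi: sorted list of URLs} for DOIs used by more than one distinct URL."""
    urls_by_doi = defaultdict(set)
    for url, doi in records:
        # DOIs are case-insensitive, so normalize before grouping.
        urls_by_doi[doi.strip().lower()].add(url)
    return {
        doi: sorted(urls)
        for doi, urls in urls_by_doi.items()
        if len(urls) > 1
    }


# (document URL, DOI) pairs; the first two come from the example above,
# the third is a made-up placeholder that should not be reported.
records = [
    ("https://www.archive.arm.gov/metadata/adc/html/nsawsipatchsummaryC2.b1.html",
     "10.5439/1025028"),
    ("https://www.archive.arm.gov/metadata/adc/html/nsawsipatchsummaryC1.a1.html",
     "10.5439/1025028"),
    ("https://example.org/placeholder.html", "10.0000/placeholder"),
]

duplicates = find_duplicate_dois(records)
for doi, urls in duplicates.items():
    print(doi, len(urls))
```

Grouping by DOI first and filtering afterwards keeps the full list of URLs per DOI, which makes it easier to compare the near-identical metadata records side by side.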

History

#1 Updated by John Evans almost 2 years ago

ARM has massively reworked their documents, so this issue is now moot.
