Story #8485: XMLSchemaService progressively builds massively long string by calling doRefresh - Infrastructure - DataONE Tasks

Added by Rob Nahf almost 7 years ago. Updated almost 7 years ago.

Status: Closed
Priority: Urgent
Assignee: Jing Tao
Category: Metacat
Target version: CCI-2.3.10
Start date: 2018-03-06
Due date:
% Done: 100%
Story Points:
Sprint: Infrastructure backlog

Description

In the static method createRegisteredNameSpaceAndLocationString(), lines 434-438, the static formatId_NamespaceLocationHash is appended to on every refresh, so its values grow progressively across multiple refreshes.

The value is then retrieved via getNameSpaceAndLocation(format_id) in MetacatHandler.handleInsertOrUpdateAction (line 1778), passed into DocumentWrapper.write as an argument, and ultimately, after going through a string.trim call, set as a property of the XMLReader via parser.setProperty(EXTERNALSCHEMALOCATIONPROPERTY, schemaLocation).

While processing the initial Pangaea corpus, this string grew to 100 MB in length. That probably contributed to the steadily increasing time it took to process Pangaea (and only Pangaea) metadata.

What is the reason behind appending new values to old, rather than replacing them?

Here is the problem code:

    //the hash table already has it. We will attache the new pair to the exist value
    String value = formatId_NamespaceLocationHash.get(formatId);
    value += " " + xmlSchema.getFileNamespace() + " "
            + xmlSchema.getLocalFileUri();
    formatId_NamespaceLocationHash.put(formatId, value);
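
To make the growth concrete, here is a minimal, self-contained Java sketch of the pattern above. It is not the Metacat code: the class name, the refresh() helper, the format id, and the example namespace and URI are all hypothetical stand-ins. It only assumes what the description states, namely that the static map is never cleared and that each refresh appends the same namespace/location pair again.

```java
import java.util.HashMap;
import java.util.Map;

public class SchemaLocationGrowthDemo {
    // Hypothetical stand-in for the static formatId_NamespaceLocationHash.
    // It lives for the lifetime of the class and is never cleared.
    static final Map<String, String> hash = new HashMap<>();

    // Hypothetical simplification of what one refresh does for one schema.
    static void refresh(String formatId, String namespace, String localUri) {
        if (hash.containsKey(formatId)) {
            // Same append-to-existing logic as the problem code.
            String value = hash.get(formatId);
            value += " " + namespace + " " + localUri;
            hash.put(formatId, value);
        } else {
            hash.put(formatId, namespace + " " + localUri);
        }
    }

    public static void main(String[] args) {
        // Example values only; they are not taken from Metacat.
        String formatId = "example-format";
        String namespace = "http://example.org/ns";
        String localUri = "file:/schemas/example.xsd";

        for (int i = 1; i <= 1000; i++) {
            refresh(formatId, namespace, localUri);
            if (i == 1 || i == 10 || i == 100 || i == 1000) {
                System.out.println("after " + i + " refreshes: "
                        + hash.get(formatId).length() + " chars");
            }
        }
    }
}
```

The string length grows linearly with the number of refreshes even though the set of registered schemas never changes, which matches the behaviour reported for the Pangaea run.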

History

#1 Updated by Jing Tao almost 7 years ago

% Done changed from 0 to 100
Target version set to CCI-2.3.9
Status changed from New to Closed

At the beginning of the call, we now initialize formatId_NamespaceLocationHash, which fixes the bug. A new JUnit test method has been added as well.
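
For readers without the commit at hand, the following is a hedged sketch of what "initialize at the beginning of the call" could look like, together with the kind of check a regression test might make. It is an assumption, not the actual Metacat change: the class name, the rebuild() method, and the String[] {formatId, namespace, localUri} representation of a schema are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SchemaLocationRebuildSketch {
    // Hypothetical stand-in for the static formatId_NamespaceLocationHash.
    static final Map<String, String> hash = new HashMap<>();

    // Each schema is {formatId, namespace, localUri}; a hypothetical representation.
    static void rebuild(List<String[]> schemas) {
        // Start from an empty map so values from earlier refreshes are dropped.
        hash.clear();
        for (String[] s : schemas) {
            String pair = s[1] + " " + s[2];
            // Within a single rebuild, several schemas may share a format id,
            // so appending is still needed here.
            hash.merge(s[0], pair, (oldValue, newPair) -> oldValue + " " + newPair);
        }
    }

    public static void main(String[] args) {
        List<String[]> schemas = new ArrayList<>();
        schemas.add(new String[] {"example-format", "http://example.org/a", "file:/schemas/a.xsd"});
        schemas.add(new String[] {"example-format", "http://example.org/b", "file:/schemas/b.xsd"});

        rebuild(schemas);
        String first = hash.get("example-format");
        for (int i = 0; i < 100; i++) {
            rebuild(schemas);
        }
        // The value should be identical no matter how many refreshes ran.
        System.out.println(first.equals(hash.get("example-format")));
    }
}
```

Appending within one rebuild remains necessary when several schemas share a format id; only the carry-over between refreshes is removed.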