XMLSchemaService progressively builds a massively long string through repeated doRefresh calls
In the static createRegisteredNameSpaceAndLocationString() (lines 434-438), the values in the static formatId_NamespaceLocationHash are appended to on every refresh, so they keep growing.
MetacatHandler.handleInsertOrUpdateAction (line 1778) then retrieves the value via getNameSpaceAndLocation(format_id)
and passes it into DocumentWrapper.write as an argument. After a String trim() call, it is ultimately set as a property of the XMLReader via
parser.setProperty(EXTERNALSCHEMALOCATIONPROPERTY, schemaLocation);
While processing the initial Pangaea corpus, this string grew to 100Mb in length. That probably contributed to the increasing amount of time it took to process Pangaea (and only Pangaea) metadata.
What is the reason behind appending new values to old, rather than replacing them?
Here is the problem code:
//the hash table already has it. We will attache the new pair to the exist value
String value = formatId_NamespaceLocationHash.get(formatId);
value += " "+ xmlSchema.getFileNamespace() + " " + xmlSchema.getLocalFileUri();
formatId_NamespaceLocationHash.put(formatId, value);
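One possible direction, as a minimal sketch rather than a patch against the actual class: store the (namespace, location) pairs per format id keyed by namespace, so registering the same schema again overwrites the earlier entry instead of lengthening the string, and clear the structure at the start of each refresh. The class name SchemaLocationRegistry and the methods register and clear are hypothetical; only getNameSpaceAndLocation mirrors a name from the report. Whether doRefresh can safely reset the table first is an assumption that would need checking against the real code.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the bookkeeping done with formatId_NamespaceLocationHash.
class SchemaLocationRegistry {

    // formatId -> (namespace -> local file URI).
    // Keying by namespace means a repeated registration replaces the old pair.
    private final Map<String, Map<String, String>> byFormatId = new HashMap<>();

    // Assumed to be called at the start of each refresh so stale pairs are dropped.
    public synchronized void clear() {
        byFormatId.clear();
    }

    // Records one schema; calling this again with the same arguments is a no-op
    // as far as the resulting string is concerned.
    public synchronized void register(String formatId, String namespace, String localFileUri) {
        byFormatId
            .computeIfAbsent(formatId, k -> new LinkedHashMap<>())
            .put(namespace, localFileUri);
    }

    // Builds the space-separated "namespace location namespace location ..." value,
    // or returns null when nothing is registered for the format id.
    public synchronized String getNameSpaceAndLocation(String formatId) {
        Map<String, String> pairs = byFormatId.get(formatId);
        if (pairs == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : pairs.entrySet()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(e.getKey()).append(' ').append(e.getValue());
        }
        return sb.toString();
    }
}
```

With this shape, the length of the returned string depends only on the number of distinct namespaces registered for a format id, not on how many times refresh has run.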