Decision #8765: Consider changing how BaseSolrFieldXPathTest works - Infrastructure - DataONE Tasks

Decision #8765

Consider changing how BaseSolrFieldXPathTest works

Added by Bryce Mecum about 6 years ago. Updated about 6 years ago.

Status:

Closed

Priority:

Low

Assignee:

Bryce Mecum

Category:

Target version:

Start date:

2019-02-13

Due date:

% Done:

100%

Milestone:

None

Sprint:

Description

Ran into a weird thing while expanding the https://repository.dataone.org/software/cicore/trunk/cn/d1_cn_index_processor/src/test/java/org/dataone/cn/index/SolrFieldXPathEmlTest.java to test an EML 2.2.0 doc.

SolrFieldXPathEmlTest compares values extracted via various subprocessors to a set of expectations stored in a HashMap<String, String>() (of form <fieldName, expectedValue>). This data structured limits expectations to one value per field. Forever ago, in r7985,

r7985 | sroseboo | 2012-03-23 12:12:53 -0800 (Fri, 23 Mar 2012) | 1 line

initial commit of search index support for parsing FGDC science metadata docs.

Index: src/test/java/org/dataone/cn/index/BaseSolrFieldXPathTest.java

support was added for testing multiple expectations for a single field by defining a convention of smushing multiple values into a single string, separated by two # characters (##). For example,

eml210Expected.put("project", "Random Project Title##Another Random Project Title");

would test the project field for two values, "Random Project Title" and "Another Project Title" not a literal "Random Project Title##Another Random Project Title". I imagine this was picked because it's rare to see a ## in a metadata record which seems reasonable.

I ran afoul of this today because I wanted to test an expectation for a field with a # in it and I couldn't because it was being split when I didn't want it to be. Why did a single # break things when the convention above is a double #? Because BaseSolrFieldXPathTest.java splits the expectation string using StringUtils.split like this:

StringUtils.split(expectedForField, "##") // Where expectedForField might equal "Random Project Title##Another Random Project Title"

According to https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#split-java.lang.String-java.lang.String-, the second arg to StringUtils.split, separatorChars, is "the characters used as the delimiters, null splits on whitespace" which tells me we're using it wrong. I think the method we needed to use was StringUtils.splitByWholeSeparator which correctly splits only on double #.

I could just change the line of code and move on but that breaks a ton (98) of tests. Before I did that, I wanted to ask what others thought. I see a few routes:

Change the code to correctly split only on ## and not # and update all the tests I break
Do 1 and use a different separator to something more future proof. I suggest "&&" because that'd be invalid in XML outside a CDATA section.
Change how the expectations get tested so field expectations can have a one to many relationship. This'd take some time so I'd opt for (1) or (2) instead

Any preferences out there?

History

#1 Updated by Bryce Mecum about 6 years ago

Brought this up on Slack and got feedback from Jing, Rob, and Dave. Majority was in favor of (2). I'll start on that today.

#2 Updated by Bryce Mecum about 6 years ago

% Done changed from 0 to 30
Priority changed from Normal to Low
Assignee set to Bryce Mecum
Status changed from New to In Progress

#3 Updated by Bryce Mecum about 6 years ago

% Done changed from 30 to 100
Status changed from In Progress to Closed

I started swapping out the separators from #/## to && and ran into an edge case where some assertions needed to test multiple URLs with trialing & symbols (weird), like "http://example.com/ASDF&" which would make the assertion "http://example.com/ASDF&&&http://example.com/ASDF" which doesn't split properly. I decided to try <> instead, first confirming no test assertions include that string and things look all good now.

I think the switch is probably pretty agreeable to people but let me know if it isn't.

Also available in: Atom PDF

Project

General

Profile

Infrastructure

Issues