Bug #3514

CN solr search returns the 'text' field

Added by Rob Nahf over 11 years ago. Updated almost 11 years ago.

Status: Closed
Priority: Normal
Assignee: Skye Roseboom
Category: d1_cn_solr_extensions
Target version:
Start date: 2013-01-23
Due date: 2013-04-13
% Done: 100%
Milestone: CCI-1.2
Product Version: *
Story Points:
Sprint:

Description

The 'text' field in the CN Solr index facilitates searching, but it is an unstructured amalgam of science metadata, is usually very long, and is probably not very useful to clients when returned in search results.

For example:
https://cn-dev.test.dataone.org/cn/v1/query/solr/?q=text:*&rows=1

With Solr search being opened up as a public method, it would be better to make the 'text' field non-returnable by default. (Many clients will fail to specify return fields, and so will receive a lot of content.)
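
A client can avoid receiving the 'text' field by naming its return fields explicitly with the standard Solr 'fl' parameter. The following is a minimal sketch only: the helper name build_query_url and the field names 'id' and 'title' are assumptions for illustration, not taken from the CN schema.

```python
from urllib.parse import urlencode


def build_query_url(base_url, query, fields, rows=10):
    """Build a Solr query URL that names its return fields explicitly."""
    params = {
        "q": query,
        "fl": ",".join(fields),  # comma-separated list of fields to return
        "rows": rows,
    }
    # Leave ':', '*' and ',' unescaped so the URL stays readable.
    return base_url + "?" + urlencode(params, safe=":*,")


url = build_query_url(
    "https://cn-dev.test.dataone.org/cn/v1/query/solr/",
    "text:*",
    ["id", "title"],  # assumed field names, for illustration only
    rows=1,
)
print(url)
```

The query still searches on 'text', but only the listed fields are requested in the response.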

History

#1 Updated by Dave Vieglais over 11 years ago

The config for specifying the default field list is in d1-index-solrconfig.xml, where the default is currently:

*,score

The * returns all fields and should be replaced with the list of fields that should be returned.

I think this might be preferable to making the field non-returnable, as there may be a use case for taking advantage of the full text returned when fine-tuning the indexing process.
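
As a rough illustration of replacing the * with an explicit list, the value could be derived from the schema's field names with 'text' left out. This is a sketch under assumptions: default_field_list is a hypothetical helper, and the field names shown are placeholders rather than the actual CN schema.

```python
def default_field_list(all_fields, excluded=("text",)):
    """Return an 'fl' value naming every field except the excluded ones, plus score."""
    kept = [name for name in all_fields if name not in excluded]
    return ",".join(kept + ["score"])


# Placeholder field names; the real list would come from the index schema.
schema_fields = ["id", "title", "abstract", "text"]
fl_value = default_field_list(schema_fields)
print(fl_value)
```

The resulting string is what would take the place of *,score as the default, while a request that names 'text' in its own 'fl' could still ask for it.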

#2 Updated by Skye Roseboom about 11 years ago

Currently it is necessary to return the text field, due to our strategy for modeling relationships in the index.

When the ORE document is indexed, the index record of each referenced document must be updated with the ORE information (the resourceMap, documents, and documentedBy fields). If the full text field is not returned as part of the index record, the records cannot simply be written back with updated ORE data; the original science metadata documents must be re-indexed from disk instead, because the full text field cannot be regenerated from the index data and must be created from the original document text.

Given the above, Dave's solution looks preferable and should still allow the indexing strategy to keep working, with updated query strings.
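
The write-back constraint described in this comment can be sketched roughly as follows. This is an illustration only, not the indexer's actual code: merge_ore_fields is a hypothetical helper, records are modeled as plain dictionaries, and the sample values are invented.

```python
ORE_FIELDS = ("resourceMap", "documents", "documentedBy")


def merge_ore_fields(record, ore_fields):
    """Return a copy of an index record with its ORE relationship fields updated.

    The record must carry its 'text' field, because the full text cannot be
    regenerated from the other index fields when the record is written back.
    """
    if "text" not in record:
        raise KeyError(
            "record lacks 'text'; it would have to be re-indexed "
            "from the original document"
        )
    updated = dict(record)  # copy, so the fetched record is left unchanged
    for name in ORE_FIELDS:
        if name in ore_fields:
            updated[name] = ore_fields[name]
    return updated


# Invented sample data: a record fetched with 'text' explicitly requested.
fetched = {"id": "doc-1", "text": "full science metadata text"}
rewritten = merge_ore_fields(fetched, {"resourceMap": ["ore-1"]})
print(rewritten)
```

With an explicit default field list, the indexer would need to request 'text' in its own query strings so that fetched records remain complete enough to write back.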

#3 Updated by Skye Roseboom about 11 years ago

  • Due date changed from 2013-01-23 to 2013-03-02
  • Target version changed from 2013.2-Block.1.1 to 2013.8-Block.1.4

#4 Updated by Skye Roseboom about 11 years ago

  • Due date changed from 2013-03-02 to 2013-03-16
  • Target version changed from 2013.8-Block.1.4 to 2013.10-Block.2.1

#5 Updated by Skye Roseboom about 11 years ago

  • Milestone changed from None to CCI-1.2

#6 Updated by Skye Roseboom about 11 years ago

  • Target version changed from 2013.10-Block.2.1 to 2013.14-Block.2.3
  • Due date changed from 2013-03-16 to 2013-04-13

#7 Updated by Skye Roseboom about 11 years ago

  • Status changed from New to In Progress

#8 Updated by Skye Roseboom about 11 years ago

  • Status changed from In Progress to Testing

Updated deployment and test resources for the CN search index so that the search handler now contains a default 'fl' (field list) parameter. It lists all fields that are returnable except the (full) 'text' field.

Installed in the cn-dev environment.

#9 Updated by Skye Roseboom almost 11 years ago

  • Status changed from Testing to Closed

Deployed to all three production CNs. The 'text' field is now returned only if it is listed in the 'fl' parameter.
