Task #3022

Story #3023: Review and where necessary alter, add, remove index fields

Update search index solr schema regarding indexed fields

Added by Skye Roseboom almost 12 years ago. Updated over 11 years ago.

Status:
Closed
Priority:
Normal
Assignee:
Skye Roseboom
Category:
d1_indexer
Start date:
2012-06-26
Due date:
% Done:

100%

Milestone:
CCI-1.1
Product Version:
*
Story Points:
Sprint:

Description

Need to review which fields in the search index's Solr schema are 'indexed' and which are not.

AuthoritativeMN was found to be not indexed, which is likely wrong. The 'size' field also needs review: it should probably be indexed and use a sortable long data type. These issues point to the need for a wider review of the schema's data types, covering both sortability and whether each field is indexed.

'Indexed' means Solr maintains a searchable index for the data field. Each indexed field requires more disk space and more index update time to maintain.
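Why a sortable numeric type matters for 'size' can be illustrated with a small sketch. When numbers are compared as plain strings, "9" sorts after "2048", so sorting and range queries give wrong results. The encoding below (an offset plus zero padding) is only a stand-in chosen for illustration; it is not Solr's internal slong encoding, and the names are hypothetical.

```python
# Illustrative only: this is NOT Solr's internal slong encoding.
# It shows the general idea of making string order match numeric order.

OFFSET = 2**63  # shifts signed 64-bit values into the non-negative range
WIDTH = 20      # 2**64 - 1 has 20 decimal digits


def sortable_key(value: int) -> str:
    """Encode a signed 64-bit integer so string order equals numeric order."""
    if not -OFFSET <= value < OFFSET:
        raise ValueError("value out of signed 64-bit range")
    return str(value + OFFSET).zfill(WIDTH)


sizes = [9, 10, 2048, 100, -5]

# Comparing the plain decimal strings gives a non-numeric order.
plain_order = sorted(sizes, key=str)

# Comparing the encoded strings gives the numeric order.
encoded_order = sorted(sizes, key=sortable_key)

print("plain string order:  ", plain_order)
print("sortable-key order:  ", encoded_order)
```

The same property that fixes sorting also makes range comparisons on the encoded strings behave numerically.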

History

#2 Updated by Skye Roseboom almost 12 years ago

non-indexed fields:

size - updating type to slong and making indexed.
checksum
checksumAlgorithm
authoritativeMN
replicationAllowed
numberReplicas
preferredReplicationMN
blockedReplicationMN
replicaMN
replicaVerifiedDate

ogcUrl
dataUrl
webUrl
(All of the URL fields are variations of the resolve endpoint for retrieving the data or metadata. Used by Mercury search.)

#3 Updated by Dave Vieglais almost 12 years ago

  • Parent task set to #3023

#4 Updated by Skye Roseboom almost 12 years ago

  • Status changed from New to In Progress

#5 Updated by Dave Vieglais almost 12 years ago

size - updating type to slong and making indexed. Also need to verify that slong renders as expected when returning search results.

checksum: It may be helpful to search on checksum as a proxy for identifier. This is a low priority however and can remain un-indexed for now.

checksumAlgorithm: It is useful for analysis to obtain a quick estimate of the algorithms in use. However, this is an edge case that can easily be implemented, albeit less efficiently, through a script that scans the listObjects response. Remains unindexed.
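A minimal sketch of such a script, assuming the listObjects response is an XML object list in which each entry carries a checksum element with an algorithm attribute. The sample document, its identifiers and checksum values, and the function names are made up for illustration. A real script would fetch and page through the response rather than use an inline string.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical, abbreviated listObjects response used only for illustration.
SAMPLE_OBJECT_LIST = """<d1:objectList xmlns:d1="http://ns.dataone.org/service/types/v1" count="3" start="0" total="3">
  <objectInfo>
    <identifier>example.1</identifier>
    <checksum algorithm="MD5">0f0e0d0c</checksum>
    <size>1024</size>
  </objectInfo>
  <objectInfo>
    <identifier>example.2</identifier>
    <checksum algorithm="SHA-1">a1b2c3d4</checksum>
    <size>2048</size>
  </objectInfo>
  <objectInfo>
    <identifier>example.3</identifier>
    <checksum algorithm="MD5">99887766</checksum>
    <size>512</size>
  </objectInfo>
</d1:objectList>"""


def local_name(tag: str) -> str:
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def count_checksum_algorithms(xml_text: str) -> Counter:
    """Count how often each checksum algorithm appears in one response page."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for element in root.iter():
        if local_name(element.tag) == "checksum":
            counts[element.get("algorithm", "unknown")] += 1
    return counts


algorithm_counts = count_checksum_algorithms(SAMPLE_OBJECT_LIST)
for algorithm, count in algorithm_counts.most_common():
    print(algorithm, count)
```

Counters from successive pages can be added together to get totals for the whole object list.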

authoritativeMN: Probably not useful to an end user, but could be useful internally for auditing processes. Remains unindexed.

replicationAllowed: These replication-related entries may be helpful for auditing processes in the future, but can remain unindexed for now.

numberReplicas: Remains unindexed.

preferredReplicationMN: Remains unindexed.

blockedReplicationMN: Remains unindexed.

replicaMN: Remains unindexed.

replicaVerifiedDate: Remains unindexed.

These fields are not relevant in DataONE and are really a holdover from the Mercury-specific implementation. Unless there's good data to populate them, they can remain unindexed:
ogcUrl
dataUrl
webUrl

#6 Updated by Skye Roseboom almost 12 years ago

Great, thanks for feedback. Will just update 'size' field for now. Will test in cn-dev environment to ensure proper display of the 'slong' data type.

#7 Updated by Skye Roseboom almost 12 years ago

The updated Solr schema has been placed in the 1.0.0 buildout for release in the 1.0.2 patch of the index-processor. Index-tool contains the index-processor, so it may also need to be updated (pom dependency on the new index jar). Will update the pom.xml of index-processor and index-tool before creating the new 1.0.2 patch release tags.

#8 Updated by Skye Roseboom almost 12 years ago

  • Milestone changed from CCI-1.0.2 to CCI-1.2

Delaying this update. Had trouble updating the schema in place without causing errors in the search interfaces.

Going to accumulate schema changes into a single update, and also investigate using a second Solr core where a new index can be built and then swapped in for the live index, to avoid disruptions in search.
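The build-then-swap idea could be sketched as a request to Solr's CoreAdmin SWAP action, which exchanges the names of two cores. This sketch only constructs the URL and does not send it. The host and the core names "live" and "rebuild" are hypothetical, and this is not necessarily the approach that was eventually used.

```python
from urllib.parse import urlencode

# Hypothetical Solr location; adjust for the actual deployment.
SOLR_ADMIN_CORES_URL = "http://localhost:8983/solr/admin/cores"


def swap_cores_url(live_core: str, rebuilt_core: str) -> str:
    """Build a CoreAdmin SWAP request URL that exchanges two cores."""
    params = {
        "action": "SWAP",
        "core": live_core,      # core currently serving searches
        "other": rebuilt_core,  # core holding the freshly built index
    }
    return f"{SOLR_ADMIN_CORES_URL}?{urlencode(params)}"


swap_url = swap_cores_url("live", "rebuild")
print(swap_url)
```

Because the new index is built in the second core before the swap, search clients keep querying the old index until the new one is complete.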

#9 Updated by Skye Roseboom over 11 years ago

  • Milestone changed from CCI-1.2 to CCI-1.1

#10 Updated by Skye Roseboom over 11 years ago

  • Status changed from In Progress to Closed
  • Remaining hours set to 0.0

Changed the 'size' field to be indexed and updated its data type to support range queries on this field.
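With 'size' indexed, a range query on it might look like the following sketch. It uses standard Solr range syntax (field:[lower TO upper], with * for an open bound) and the sort parameter. The select URL is hypothetical and the helper names are made up for illustration. No request is sent.

```python
from typing import Optional
from urllib.parse import urlencode

# Hypothetical search endpoint; the real host and path may differ.
SOLR_SELECT_URL = "https://cn.example.org/solr/select"


def size_range_clause(min_bytes: Optional[int] = None,
                      max_bytes: Optional[int] = None) -> str:
    """Return a Solr range clause on 'size'; None means an open bound."""
    lower = "*" if min_bytes is None else str(min_bytes)
    upper = "*" if max_bytes is None else str(max_bytes)
    return f"size:[{lower} TO {upper}]"


def size_query_url(min_bytes: Optional[int] = None,
                   max_bytes: Optional[int] = None,
                   rows: int = 10) -> str:
    """Build a select URL for objects in a size range, largest first."""
    params = {
        "q": size_range_clause(min_bytes, max_bytes),
        "sort": "size desc",
        "rows": rows,
    }
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


# Objects of at least one megabyte, sorted by size descending.
print(size_range_clause(min_bytes=1_000_000))
print(size_query_url(min_bytes=1_000_000))
```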

Tested on cn-dev-unm-1.
