Story #8702

Indexing Refactor Strategy

Added by Rob Nahf about 3 years ago. Updated about 3 years ago.

Status: New
Priority: Normal
Assignee:
Category: d1_indexer
Target version:
Start date: 2018-09-24
Due date:
% Done: 0%
Story Points:
Description

Indexing performs poorly and has some consistency problems.

A solution was developed that addresses the main issues. It involves creating a separate Solr core for relationships (the resource maps). Initially, the separate core will serve as a behind-the-scenes reference for the main search index. Relationships (resource_map, documents, isDocumentedBy) will still be copied into the main search record.
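
As a rough illustration of this two-core arrangement, the sketch below uses an in-memory map as a stand-in for the relationships core and derives the relationship fields that would be copied into a main search record. The class and method names (RelationshipSketch, indexResourceMap, buildSearchRecord) are hypothetical and are not taken from d1_indexer; only the field names resource_map, documents, and isDocumentedBy come from the description above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: an in-memory map stands in for the relationships core.
class RelationshipSketch {

    // One relationship row: subject --predicate--> object, asserted by a resource map.
    static final class Relationship {
        final String resourceMap;
        final String subject;
        final String predicate;
        final String object;

        Relationship(String resourceMap, String subject, String predicate, String object) {
            this.resourceMap = resourceMap;
            this.subject = subject;
            this.predicate = predicate;
            this.object = object;
        }
    }

    // Stand-in for the relationships core, keyed by resource map identifier.
    private final Map<String, List<Relationship>> relationshipsCore = new LinkedHashMap<>();

    // Replace all rows for one resource map; no member record is touched here.
    void indexResourceMap(String resourceMapId, List<Relationship> rows) {
        relationshipsCore.put(resourceMapId, new ArrayList<>(rows));
    }

    // Build the relationship fields that would be copied into the main search record for pid.
    Map<String, List<String>> buildSearchRecord(String pid) {
        Map<String, List<String>> fields = new TreeMap<>();
        for (List<Relationship> rows : relationshipsCore.values()) {
            for (Relationship r : rows) {
                if (r.subject.equals(pid)) {
                    addValue(fields, "resource_map", r.resourceMap);
                    addValue(fields, r.predicate, r.object);
                }
            }
        }
        return fields;
    }

    // Append a value to a multi-valued field, skipping duplicates.
    private static void addValue(Map<String, List<String>> fields, String name, String value) {
        List<String> values = fields.computeIfAbsent(name, k -> new ArrayList<>());
        if (!values.contains(value)) {
            values.add(value);
        }
    }
}
```

In this sketch, resubmitting a resource map only rewrites its own rows; the copied fields for a member are recomputed from the relationships store when that member's search record is built.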

Additionally, archived objects will not be removed from the index; instead, an archived field will be added to the schema.
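
A minimal sketch of that archiving behavior, with an in-memory map in place of the Solr index. The names ArchiveSketch, archive, and search are hypothetical, and hiding archived records from search by default is an assumption made for illustration; the description does not state how searches will treat the archived field.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a map of pid -> archived flag stands in for the search index.
class ArchiveSketch {

    private final Map<String, Boolean> index = new LinkedHashMap<>();

    // Index a new, non-archived record.
    void add(String pid) {
        index.put(pid, Boolean.FALSE);
    }

    // Archiving flips one field in place instead of deleting the record.
    void archive(String pid) {
        if (index.containsKey(pid)) {
            index.put(pid, Boolean.TRUE);
        }
    }

    // Assumed default: archived records are hidden unless explicitly requested.
    List<String> search(boolean includeArchived) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Boolean> entry : index.entrySet()) {
            if (includeArchived || !entry.getValue()) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }
}
```

Because the record stays in the index, anything stored on it (including copied relationship fields) remains retrievable after archiving.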

The new logic for processing resource maps and archiving objects should remove many of the inefficient checks that cause records to be reindexed.

The main phases for development will be:

  1. Replace the custom Solr client with the standard SolrJ client (org.apache.solr.client.solrj).
  2. Migrate the schema to include the archived field and introduce the relationships core. Refactor the resource map subprocessor to use it, and trigger relationship tasks.
  3. Refactor the delete subprocessor (for archived records) and add the search handler.

Subtasks

Task #8703: test the cleaned up indexer in DEV New Rob Nahf


Related issues

Related to Infrastructure - Bug #8696: double indexing of a resource map and another not processed because of resource contention (lock) on member New 2018-09-12
Related to Infrastructure - Bug #3675: package relationships not available for archived objects New 2013-03-20
Related to Infrastructure - Bug #8542: resource Map not indexed when the resourceMap SID is used in triples. Closed 2018-04-14
Related to Infrastructure - Bug #8536: resource Map update when metadata SID is used is not indexed In Progress 2018-04-09
Related to Infrastructure - Story #8537: indexer doesn't populate SID-defined relationships unless new resourceMap is submitted New 2018-04-09
Related to Infrastructure - Story #8363: indexer shutdown generates index tasks New 2018-02-12
Related to Infrastructure - Story #8172: investigate atomic updates for some solr updates In Progress 2017-09-01
Related to Infrastructure - Story #8082: implement SolrCloudClient to replace HttpService to allow concurrent updates of the solr index from different machines New 2017-04-25
Related to Infrastructure - Bug #8182: relationships defined using SIDs in resource maps don't stay current New 2017-09-11
Related to Infrastructure - Bug #8093: resource map index processing is inefficient (2sec / referenced object) New 2017-05-10

History

#1 Updated by Rob Nahf about 3 years ago

  • Related to Bug #8696: double indexing of a resource map and another not processed because of resource contention (lock) on member added

#2 Updated by Rob Nahf about 3 years ago

  • Related to Bug #3675: package relationships not available for archived objects added

#3 Updated by Rob Nahf about 3 years ago

  • Related to Bug #8542: resource Map not indexed when the resourceMap SID is used in triples. added

#4 Updated by Rob Nahf about 3 years ago

  • Related to Bug #8536: resource Map update when metadata SID is used is not indexed added

#5 Updated by Rob Nahf about 3 years ago

  • Related to Story #8537: indexer doesn't populate SID-defined relationships unless new resourceMap is submitted added

#6 Updated by Rob Nahf about 3 years ago

  • Related to Story #8363: indexer shutdown generates index tasks added

#7 Updated by Rob Nahf about 3 years ago

  • Related to Story #8172: investigate atomic updates for some solr updates added

#8 Updated by Rob Nahf about 3 years ago

  • Related to Story #8082: implement SolrCloudClient to replace HttpService to allow concurrent updates of the solr index from different machines added

#9 Updated by Rob Nahf about 3 years ago

  • Related to Bug #8182: relationships defined using SIDs in resource maps don't stay current added

#10 Updated by Rob Nahf about 3 years ago

  • Related to Bug #8093: resource map index processing is inefficient (2sec / referenced object) added
