Story #8702
Indexing Refactor Strategy
Description
Indexing performs poorly and has some consistency problems.
A solution has been developed that addresses the main issues. It involves creating a separate Solr core for relationships (the resource maps). Initially, the separate core will serve as a behind-the-scenes reference for the main search index. Relationships (resource_map, documents, isDocumentedBy) will still be copied into the main search record.
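A minimal in-memory sketch of the two-core idea follows, assuming plain Java maps stand in for the Solr cores. The class and method names (`RelationshipIndexSketch`, `indexResourceMap`, `field`) are hypothetical, and the model assumes a resource map in which one metadata object documents a list of data objects. The relationships are stored once, keyed by the resource map, and then copied into the main records under the field names listed above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical in-memory stand-in for the relationships core and the main search index.
class RelationshipIndexSketch {
    // "Relationships core": one record per resource map, keyed by the map's identifier.
    final Map<String, Map<String, List<String>>> relationshipsCore = new LinkedHashMap<>();
    // "Main search index": object id -> field name -> set of values.
    final Map<String, Map<String, TreeSet<String>>> mainIndex = new LinkedHashMap<>();

    void indexResourceMap(String mapId, String metadataId, List<String> dataIds) {
        // 1. Store the package relationships once, in the separate core.
        Map<String, List<String>> rel = new LinkedHashMap<>();
        rel.put("metadata", List.of(metadataId));
        rel.put("data", new ArrayList<>(dataIds));
        relationshipsCore.put(mapId, rel);

        // 2. Copy them into the main search records so existing queries keep working.
        addValue(metadataId, "resource_map", mapId);
        for (String dataId : dataIds) {
            addValue(metadataId, "documents", dataId);
            addValue(dataId, "resource_map", mapId);
            addValue(dataId, "isDocumentedBy", metadataId);
        }
    }

    private void addValue(String id, String field, String value) {
        mainIndex.computeIfAbsent(id, k -> new LinkedHashMap<>())
                 .computeIfAbsent(field, k -> new TreeSet<>())
                 .add(value);
    }

    // Returns the values of one field of a main-index record (empty if absent).
    TreeSet<String> field(String id, String field) {
        return mainIndex.getOrDefault(id, Map.of()).getOrDefault(field, new TreeSet<>());
    }
}
```

Because this sketch keeps each resource map's relationships in a single record, the copied fields can be recomputed from that record without re-reading every referenced object.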
Additionally, archived objects will no longer be removed from the index; instead, an archived field will be added to the schema.
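The archiving change can be sketched the same way, assuming an in-memory index and a hypothetical `Doc` record that holds only an `archived` flag and the object's resource map identifiers. Archiving flips the flag instead of deleting the record, so package relationships remain available, and default searches filter archived records out.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch: archived objects stay in the index, flagged rather than removed.
class ArchivedFlagSketch {
    // Minimal record with only the fields needed to illustrate the idea.
    record Doc(String id, boolean archived, List<String> resourceMaps) {}

    final Map<String, Doc> index = new LinkedHashMap<>();

    void add(String id, List<String> resourceMaps) {
        index.put(id, new Doc(id, false, resourceMaps));
    }

    // Archiving sets the flag instead of deleting the record,
    // so its relationship fields remain queryable.
    void archive(String id) {
        Doc d = index.get(id);
        if (d != null) {
            index.put(id, new Doc(d.id(), true, d.resourceMaps()));
        }
    }

    // Default searches hide archived objects, roughly what a Solr
    // filter query such as fq=-archived:true would do.
    List<String> search(boolean includeArchived) {
        return index.values().stream()
                .filter(d -> includeArchived || !d.archived())
                .map(Doc::id)
                .collect(Collectors.toList());
    }
}
```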
The new logic for processing resource maps and archiving objects should remove many of the inefficient checks that cause records to be reindexed.
The main phases for development will be:
- refactor out the custom Solr client in favor of the standard org.apache.solrj-client.
- migrate the schema to include the archived field and introduce the relationships core; refactor the resourcemap subprocessor to use the new core and to trigger relationship tasks.
- refactor the delete subprocessor (for archived records) and add the search handler.
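For the third phase, one possible shape for the refactored delete subprocessor is to send a Solr atomic update that sets the archived field for archived objects, and to issue a real delete only for objects that are actually removed. This is a hypothetical sketch: `RemovalKind`, `DeleteSubprocessorSketch`, and `updateCommand` are illustrative names, the bodies target Solr's JSON update handler, and atomic updates assume the schema's fields are stored or have docValues.

```java
// Hypothetical event type; the real indexer's event model may differ.
enum RemovalKind { ARCHIVED, DELETED }

class DeleteSubprocessorSketch {
    // Builds a request body for Solr's JSON update handler (/update).
    static String updateCommand(String id, RemovalKind kind) {
        // Minimal JSON string escaping for backslashes and double quotes only.
        String quoted = "\"" + id.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
        switch (kind) {
            case ARCHIVED:
                // Atomic update: only the archived field changes; the
                // relationship fields already in the record are left in place.
                return "[{\"id\":" + quoted + ",\"archived\":{\"set\":true}}]";
            case DELETED:
                // Objects that are truly gone are still removed from the index.
                return "{\"delete\":{\"id\":" + quoted + "}}";
            default:
                throw new IllegalArgumentException("unknown kind: " + kind);
        }
    }
}
```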
History
#1 Updated by Rob Nahf over 6 years ago
- Related to Bug #8696: double indexing of a resource map and another not processed because of resource contention (lock) on member added
#2 Updated by Rob Nahf over 6 years ago
- Related to Bug #3675: package relationships not available for archived objects added
#3 Updated by Rob Nahf over 6 years ago
- Related to Bug #8542: resource Map not indexed when the resourceMap SID is used in triples. added
#4 Updated by Rob Nahf over 6 years ago
- Related to Bug #8536: resource Map update when metadata SID is used is not indexed added
#5 Updated by Rob Nahf over 6 years ago
- Related to Story #8537: indexer doesn't populate SID-defined relationships unless new resourceMap is submitted added
#6 Updated by Rob Nahf over 6 years ago
- Related to Story #8363: indexer shutdown generates index tasks added
#7 Updated by Rob Nahf over 6 years ago
- Related to Story #8172: investigate atomic updates for some solr updates added
#8 Updated by Rob Nahf over 6 years ago
- Related to Story #8082: implement SolrCloudClient to replace HttpService to allow concurrent updates of the solr index from different machines added
#9 Updated by Rob Nahf over 6 years ago
- Related to Bug #8182: relationships defined using SIDs in resource maps don't stay current added
#10 Updated by Rob Nahf over 6 years ago
- Related to Bug #8093: resource map index processing is inefficient (2sec / referenced object) added