Task #7771

Story #7769: Improve the performance on solr index

Use multiple threads to index objects

Added by Jing Tao about 4 years ago. Updated over 3 years ago.

Status: Closed
Priority: Normal
Assignee:
Category: d1_cn_index_processor
Start date: 2016-05-04
Due date:
% Done: 100%
Story Points:
Sprint:

Description

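Worked on generating the index on multiple threads. Previously, a single thread processed the tasks one at a time:

while ((nextTask = getNextTask()) != null) {
    process(nextTask);
}

Now each task is submitted to an executor and runs on a worker thread:

while ((nextTask = getNextTask()) != null) {
    executor.submit(nextTask);
}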
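A minimal sketch of the executor loop above, assuming a fixed-size thread pool and tasks that implement Runnable. The class name ParallelIndexSketch, the method indexAll, the pool size, the one-minute timeout, and the placeholder task bodies are all hypothetical and not taken from d1_cn_index_processor.

```java
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class ParallelIndexSketch {

    // Drains the queue, submits every task to a fixed-size pool, and waits
    // for the pool to finish. Returns the number of tasks submitted.
    static int indexAll(Queue<Runnable> tasks, int threads) {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        int submitted = 0;
        Runnable nextTask;
        while ((nextTask = tasks.poll()) != null) {
            executor.submit(nextTask);
            submitted++;
        }
        executor.shutdown(); // no new tasks; already-submitted ones still run
        try {
            // Assumed timeout, chosen only for this sketch.
            if (!executor.awaitTermination(1, TimeUnit.MINUTES)) {
                executor.shutdownNow();
            }
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt flag
        }
        return submitted;
    }

    public static void main(String[] args) {
        Queue<Runnable> tasks = new ConcurrentLinkedQueue<>();
        Set<String> indexed = ConcurrentHashMap.newKeySet();
        for (String pid : new String[] {"d1", "d2", "s1"}) {
            tasks.add(() -> indexed.add(pid)); // placeholder for real indexing work
        }
        int submitted = indexAll(tasks, 2);
        System.out.println(submitted + " submitted, " + indexed.size() + " indexed");
    }
}
```

Calling shutdown() followed by awaitTermination() is one way to know that every submitted task has finished before the caller moves on. A long-running processor would more likely keep the pool alive across batches.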

Fortunately, the tasks share no class variables (none of them are static), so we don’t need to lock them.

Handling resource maps has a race issue:

R1: s1 documents d1

R2: s1 documents d2

Initially, the Solr index entry for s1 has no documents or resourceMap values.

Sequential:

After processing R1, the Solr index entry for s1 contains:

documents d1

resourceMap R1

After processing R2, the Solr index entry for s1 contains:

documents d1

documents d2

resourceMap R1

resourceMap R2

Concurrent:

1. The threads handling R1 and R2 each read a copy of the s1 index entry, which has no documents or resourceMap information yet.

2. Thread 1, handling R1, finishes first and sends its result to the Solr server:

documents d1

resourceMap R1

3. Thread 2, handling R2, finishes later and sends its result to the Solr server, overwriting what thread 1 wrote. The eventual result is:

documents d2

resourceMap R2

This is wrong: the d1 and R1 values written by thread 1 are lost.
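The two orderings can be replayed deterministically on a single thread. The sketch below assumes an in-memory map standing in for the Solr index, where each write replaces the whole stored entry. The Doc class and the read, write, and apply helpers are hypothetical and model only the documents and resourceMap fields.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LostUpdateSketch {

    // Hypothetical, minimal view of the index entry for one object.
    static final class Doc {
        final List<String> documents = new ArrayList<>();
        final List<String> resourceMap = new ArrayList<>();

        Doc copy() {
            Doc c = new Doc();
            c.documents.addAll(documents);
            c.resourceMap.addAll(resourceMap);
            return c;
        }

        @Override
        public String toString() {
            return "documents=" + documents + ", resourceMap=" + resourceMap;
        }
    }

    // Stand-in for the index: a write replaces the entire stored entry.
    static final Map<String, Doc> index = new HashMap<>();

    static Doc read(String id) {
        Doc d = index.get(id);
        return d == null ? new Doc() : d.copy();
    }

    static void write(String id, Doc d) {
        index.put(id, d.copy());
    }

    // Adds "documents <target>" and "resourceMap <rm>" to a local copy.
    static Doc apply(Doc local, String rm, String target) {
        local.documents.add(target);
        local.resourceMap.add(rm);
        return local;
    }

    // R1 is fully processed before R2 reads the entry.
    static String sequential() {
        index.clear();
        write("s1", apply(read("s1"), "R1", "d1"));
        write("s1", apply(read("s1"), "R2", "d2"));
        return read("s1").toString();
    }

    // Both workers read the empty entry before either one writes.
    static String interleaved() {
        index.clear();
        Doc t1 = read("s1");
        Doc t2 = read("s1");
        write("s1", apply(t1, "R1", "d1"));
        write("s1", apply(t2, "R2", "d2")); // overwrites thread 1's result
        return read("s1").toString();
    }

    public static void main(String[] args) {
        System.out.println(sequential());  // documents=[d1, d2], resourceMap=[R1, R2]
        System.out.println(interleaved()); // documents=[d2], resourceMap=[R2]
    }
}
```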

Should resource map objects be handled sequentially instead? No.

Proposed Solution:

1. Maintain a set containing the ids of the objects a resource map touches (for R1: s1 and d1) while that resource map is being processed.

2. Before processing a resource map, check whether any of its relevant ids are in the set. If any are, wait and try again later (up to a maximum number of attempts); otherwise, add those ids to the set and start processing.

3. When processing is done, remove those ids from the set.
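The three steps can be sketched as follows, assuming a plain HashSet guarded by synchronized methods and a fixed sleep between attempts. The class IdClaimSketch, its method names, and the maxAttempts and waitMillis parameters are hypothetical; the actual implementation is not shown in this ticket.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class IdClaimSketch {

    // Ids currently being modified by some resource map task (step 1).
    private final Set<String> inProgress = new HashSet<>();

    // Step 2: claim all ids or none. The check and the add happen under one
    // lock, so two tasks cannot both claim an overlapping id.
    synchronized boolean tryClaim(Collection<String> ids) {
        if (!Collections.disjoint(inProgress, ids)) {
            return false;
        }
        inProgress.addAll(ids);
        return true;
    }

    // Step 3: release the ids once processing is done.
    synchronized void release(Collection<String> ids) {
        inProgress.removeAll(ids);
    }

    // Runs work once the ids can be claimed. Returns false if the ids were
    // still busy after maxAttempts tries, or if the thread was interrupted.
    boolean processWithClaim(Collection<String> ids, Runnable work,
                             int maxAttempts, long waitMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (tryClaim(ids)) {
                try {
                    work.run();
                    return true;
                } finally {
                    release(ids); // release even if work throws
                }
            }
            try {
                Thread.sleep(waitMillis); // wait, then try again
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        IdClaimSketch claims = new IdClaimSketch();
        List<String> r1Ids = Arrays.asList("s1", "d1");
        boolean done = claims.processWithClaim(
                r1Ids, () -> System.out.println("processing R1"), 5, 100L);
        System.out.println("R1 processed: " + done);
    }
}
```

Releasing in a finally block keeps a failed task from leaving its ids in the set permanently, which would otherwise block every later resource map that touches them.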

Options for the set implementation: ConcurrentSkipListSet vs. HashSet + lock vs. HashSet + synchronized
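One difference between these options is worth spelling out. A ConcurrentSkipListSet makes each individual add atomic, but a resource map needs several ids claimed together, so a partial claim has to be rolled back. A sketch under that assumption follows; LockFreeClaimSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

class LockFreeClaimSketch {

    private final Set<String> inProgress = new ConcurrentSkipListSet<>();

    // Claims every id or none. add() returns false when another task already
    // holds the id; in that case the ids claimed so far are given back.
    boolean tryClaim(Collection<String> ids) {
        List<String> claimed = new ArrayList<>();
        // De-duplicate so a repeated id does not collide with itself.
        for (String id : new LinkedHashSet<>(ids)) {
            if (inProgress.add(id)) {
                claimed.add(id);
            } else {
                inProgress.removeAll(claimed); // roll back the partial claim
                return false;
            }
        }
        return true;
    }

    void release(Collection<String> ids) {
        inProgress.removeAll(ids);
    }

    public static void main(String[] args) {
        LockFreeClaimSketch claims = new LockFreeClaimSketch();
        System.out.println(claims.tryClaim(Arrays.asList("s1", "d1"))); // true
        System.out.println(claims.tryClaim(Arrays.asList("d2", "s1"))); // false
        System.out.println(claims.tryClaim(Arrays.asList("d2")));       // true
    }
}
```

With the concurrent set, two tasks whose ids overlap can both fail and roll back in the same instant, so each simply retries later. A HashSet guarded by a lock or by synchronized avoids that at the cost of serializing every claim and release.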


Related issues

Related to Infrastructure - Story #8172: investigate atomic updates for some solr updates (In Progress, 2017-09-01)
Related to Infrastructure - Story #8173: add checks for retrograde systemMetadata changes (New, 2017-09-01)

History

#1 Updated by Jing Tao about 4 years ago

  • Status changed from New to In Progress
  • Category set to d1_cn_index_processor
  • % Done changed from 0 to 30

#2 Updated by Dave Vieglais over 3 years ago

  • % Done changed from 30 to 100
  • Status changed from In Progress to Closed

#3 Updated by Rob Nahf almost 3 years ago

  • Related to Story #8172: investigate atomic updates for some solr updates added

#4 Updated by Rob Nahf almost 3 years ago

  • Related to Story #8173: add checks for retrograde systemMetadata changes added
