Story #8061

develop queue-based processing system for the CN

Added by Rob Nahf over 4 years ago. Updated almost 4 years ago.

Status: New
Priority: Normal
Assignee:
Category: Architecture Design
Target version:
Start date: 2017-04-05
Due date:
% Done: 0%
Story Points:

Description

The event-based mechanism for generating indexing tasks is not robust to network segregation. It is also inefficient, because it triggers indexing tasks whenever system metadata are loaded into the Hazelcast map, and those loads are not "real" events, just data hydration from persistent storage.

Investigate using reliable queues instead. The design should be abstracted so that different implementations can be swapped in at a later date, so use standard messaging patterns.

RabbitMQ and ActiveMQ are potential implementations to use.
ZeroMQ is a lower-level implementation, probably a bit more complicated, but very performant.
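
As a rough illustration of the "swappable implementation" idea, the sketch below defines a small broker-neutral interface and an in-memory stand-in for it. All names here (TaskQueue, InMemoryTaskQueue, publish, consume_one) are hypothetical and are not taken from any DataONE code or from the RabbitMQ, ActiveMQ, or ZeroMQ APIs; a real backend would implement the same interface on top of its own client library.

```python
from abc import ABC, abstractmethod
from collections import deque
from typing import Callable, Deque, Dict


class TaskQueue(ABC):
    """Hypothetical broker-neutral interface (illustrative names only)."""

    @abstractmethod
    def publish(self, queue_name: str, body: bytes) -> None:
        """Add a task to the named queue."""

    @abstractmethod
    def consume_one(self, queue_name: str, handler: Callable[[bytes], None]) -> bool:
        """Hand one task to the handler; return False if the queue is empty."""


class InMemoryTaskQueue(TaskQueue):
    """In-memory stand-in; a broker-backed class would replace this one."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[bytes]] = {}

    def publish(self, queue_name: str, body: bytes) -> None:
        self._queues.setdefault(queue_name, deque()).append(body)

    def consume_one(self, queue_name: str, handler: Callable[[bytes], None]) -> bool:
        queue = self._queues.get(queue_name)
        if not queue:
            return False
        body = queue.popleft()
        try:
            handler(body)
        except Exception:
            # Put the task back at the front so a failed handler does not lose it.
            queue.appendleft(body)
            raise
        return True
```

Processing code written against TaskQueue would not need to change if the in-memory class were later replaced by a broker-backed one, which is the property the description asks for.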


Subtasks

Story #8062: Install RabbitMQ on dev CNs (Status: Closed; Assignee: Rob Nahf)

Task #8078: standardize task serialization for language independence (Status: New; Assignee: Rob Nahf)

Task #8079: prototype durable task processing for d1_index_processor (Status: In Progress; Assignee: Rob Nahf)

Task #8080: isolate queue creation logic from processing logic and from the queue definition logic (Status: In Progress; Assignee: Rob Nahf)

Task #8086: upgrade Spring dependencies (Status: In Progress; Assignee: Rob Nahf)

Story #8081: develop federated broker configuration for indexing (Status: In Progress; Assignee: Rob Nahf)

Story #8082: implement SolrCloudClient to replace HttpService to allow concurrent updates of the solr index from different machines (Status: New; Assignee: Rob Nahf)

Story #8084: determine the backup strategy for RabbitMQ (Status: New; Assignee: Rob Nahf)

History

#1 Updated by Rob Nahf over 4 years ago

RabbitMQ uses the terms queues, exchanges, channels, brokers, consumers, publishers.

Our processing consumers will connect to named queues via channels, and we will likely use their high-level framework, which sets up handlers in the consumers and exception handlers in the channel (I believe). Does it make sense to abstract the channels?
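
To make the question concrete, here is a minimal sketch of what a thin channel abstraction might look like, assuming the split described above: a per-message handler supplied by the consumer and an exception handler attached to the channel. The class and method names (QueueChannel, deliver, drain) are hypothetical, and the in-memory deque only stands in for a broker-side queue; none of this is the RabbitMQ client API.

```python
from collections import deque
from typing import Callable, Deque


class QueueChannel:
    """Hypothetical thin wrapper for a channel bound to one named queue."""

    def __init__(self, queue_name: str,
                 on_error: Callable[[bytes, Exception], None]) -> None:
        self.queue_name = queue_name
        self._on_error = on_error              # channel-level exception handler
        self._pending: Deque[bytes] = deque()  # stand-in for the broker-side queue

    def deliver(self, body: bytes) -> None:
        """Simulate the broker delivering a message to this channel."""
        self._pending.append(body)

    def drain(self, handler: Callable[[bytes], None]) -> int:
        """Run the consumer's handler on each pending message.

        Failures are routed to the channel's exception handler instead of
        stopping the consumer. Returns the number handled successfully.
        """
        handled = 0
        while self._pending:
            body = self._pending.popleft()
            try:
                handler(body)
                handled += 1
            except Exception as exc:
                self._on_error(body, exc)
        return handled
```

With this shape, the consumer code only sees a handler callback and a queue name, so the question of abstracting channels comes down to whether that small amount of indirection is worth having on top of the framework's own handlers.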

#2 Updated by Dave Vieglais almost 4 years ago

  • Sprint set to Infrastructure backlog

#3 Updated by Rob Nahf almost 4 years ago

I recently came across Apache Flink, which is a stream-based messaging system with exactly-once delivery guarantees. It could be a simpler system than RabbitMQ, depending on its robustness across the WAN. It looks like it is coupled with Kafka.

Keep as a possible alternative, although development work with RabbitMQ is mostly complete.
