Story #8061: develop queue-based processing system for the CN - Infrastructure - DataONE Tasks

Added by Rob Nahf over 7 years ago. Updated almost 7 years ago.

Status:

New

Priority:

Normal

Assignee:

Rob Nahf

Category:

Architecture Design

Target version:

CCI-2.4.0

Start date:

2017-04-05

Due date:

% Done:

Story Points:

Sprint:

Infrastructure backlog

Description

The event-based mechanism for generating indexing tasks is not robust to network segregation. It is also inefficient, because it triggers indexing tasks whenever system metadata are loaded into the Hazelcast map. Those loads are not "real" events, just data hydration from persistent storage.

Investigate using reliable queues instead. The design should be abstracted behind standard messaging patterns so that different implementations can be swapped in at a later date.

RabbitMQ and ActiveMQ are potential implementations to use.
ZeroMQ is a lower-level option, probably a bit more complicated to work with, but very performant.
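
To make the "swappable implementation" idea concrete, here is a minimal sketch of what such an abstraction might look like. All names (`TaskQueue`, `InMemoryTaskQueue`, the `"index"` queue) are hypothetical and not taken from the DataONE codebase; the in-memory class only stands in for a broker-backed implementation such as one built on RabbitMQ or ActiveMQ, and it offers none of the durability a reliable queue would.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical abstraction: processing code depends only on this interface,
// so the broker behind it can be replaced later.
interface TaskQueue {
    /** Publish a message (for example, an identifier that needs indexing) to a named queue. */
    void publish(String queueName, String message);

    /** Take the next message from the named queue, or empty if none is waiting. */
    Optional<String> poll(String queueName);
}

// In-memory stand-in (assumption: for tests only, no persistence or redelivery).
// A broker-backed class would implement the same interface.
class InMemoryTaskQueue implements TaskQueue {
    private final Map<String, Queue<String>> queues = new ConcurrentHashMap<>();

    private Queue<String> queue(String name) {
        // Create the named queue lazily on first use.
        return queues.computeIfAbsent(name, n -> new ConcurrentLinkedQueue<>());
    }

    @Override
    public void publish(String queueName, String message) {
        queue(queueName).add(message);
    }

    @Override
    public Optional<String> poll(String queueName) {
        // Queue.poll() returns null when the queue is empty.
        return Optional.ofNullable(queue(queueName).poll());
    }
}
```

With this shape, the code that generates indexing tasks would call `publish` only when a real change occurs, and the index processor would call `poll`, neither knowing which broker sits behind the interface.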

Subtasks

History

#1 Updated by Rob Nahf over 7 years ago

RabbitMQ uses the terms queues, exchanges, channels, brokers, consumers, and publishers.

Our processing consumers will connect to named queues via channels. We will likely use RabbitMQ's high-level framework, which sets up message handlers in the consumers and exception handlers in the channel (I believe). Does it make sense to abstract the channels?
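
One way to think about the question is to sketch what an abstracted channel would need to expose. The following is only an illustration under assumed names (`ConsumerChannel`, `InMemoryChannel`, `deliver`); it is not the RabbitMQ client API, and the synchronous in-memory delivery is a stand-in for what a broker would do.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical channel abstraction: a consumer registers a message handler
// and an exception handler for a named queue.
interface ConsumerChannel {
    void subscribe(String queueName,
                   Consumer<String> handler,
                   BiConsumer<String, Exception> onError);
}

// Synchronous in-memory stand-in so the contract can be exercised without a broker.
class InMemoryChannel implements ConsumerChannel {
    private final Map<String, Consumer<String>> handlers = new HashMap<>();
    private final Map<String, BiConsumer<String, Exception>> errorHandlers = new HashMap<>();

    @Override
    public void subscribe(String queueName,
                          Consumer<String> handler,
                          BiConsumer<String, Exception> onError) {
        handlers.put(queueName, handler);
        errorHandlers.put(queueName, onError);
    }

    /** Simulates the broker delivering one message to the subscribed consumer. */
    void deliver(String queueName, String message) {
        Consumer<String> handler = handlers.get(queueName);
        if (handler == null) {
            throw new IllegalStateException("no consumer subscribed to " + queueName);
        }
        try {
            handler.accept(message);
        } catch (RuntimeException e) {
            // Handler failures go to the channel-level exception handler
            // instead of escaping to whoever delivered the message.
            errorHandlers.get(queueName).accept(message, e);
        }
    }
}
```

If the consumers only ever need this much (subscribe with a handler and an error callback), a thin interface like this may be enough of an abstraction; anything broker-specific, such as exchanges or acknowledgement modes, would stay inside the implementing class.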

#2 Updated by Dave Vieglais almost 7 years ago

Sprint set to Infrastructure backlog

#3 Updated by Rob Nahf almost 7 years ago

I recently came across Apache Flink, a stream-processing framework with exactly-once guarantees. It could be a simpler system than RabbitMQ, depending on how robust it is across the WAN. It looks like it is commonly paired with Kafka.

Keep as a possible alternative, although development work with RabbitMQ is mostly complete.
