Eventscripts parallel processing

4/17/2023

Consuming messages in parallel is what Apache Kafka® is all about, so you may well wonder why we would want anything else. It turns out that, in practice, there are a number of situations where Kafka’s partition-level parallelism gets in the way of optimal design. For example, partition counts may be fixed for a reason beyond your control, you may need to call other databases or microservices (which can take a while to respond), or you may want queue-like semantics, where slow-to-process messages don’t hold up faster ones further back in the queue. These are just a few of the reasons why we wrote the Confluent Parallel Consumer, which provides an alternate approach to parallelism that subdivides the unit of work from a partition down to a key or even a message.

In essence, the Parallel Consumer is a JVM-based, Apache 2.0 client library that includes everything you’d expect in regular Kafka consumers (consumer groups, transactions/exactly-once semantics, etc.) plus three new features.

First, the Parallel Consumer makes it easy to process messages with a higher level of parallelism than the number of partitions for the input data. It also lets you define parallelism in terms of key-level ordering guarantees, rather than the coarser-grained, partition-level parallelism that comes with Kafka consumer groups. It does this using a thread pool, with the library handling all the tricky bookkeeping required in Kafka. By switching from partition-level parallelism to key-level parallelism, you don’t have to over-provision topic partitions or change the ones you have just so you can scale your consumer group out.

Second, the Parallel Consumer makes it easy for you to call out to other services efficiently without stalling your application. For instance, if you need to look up customer details from a database while you are processing messages, you can make these requests in parallel via non-blocking I/O.

Finally, the Parallel Consumer provides features for client-side work queues, including message-level acknowledgment and key-based processing. These are great for implementing low-latency task queues, a problem that isn’t well addressed by Apache Kafka today.

In looking at each of these features, we’ll discuss use cases that the library applies to and dive into the major technical themes of its implementation.

Usage example: Massively parallel web service requests

Consider a real-world application that validates the position of trains on a rail network. Around 10,000 trains are active at the same time, all sending their GPS location into a central Kafka cluster. The application has to validate the coordinates of each train by calling a legacy HTTP service. Originally, the system had five consumers in a consumer group working in parallel, but this setup processed only around 100 messages per second, because each request to the HTTP service takes on average 50 ms to return (about 20 requests per second for each of the five consumers). What is more, new consumers cannot easily be added to the consumer group, as the topic was created with only five partitions. With many other applications consuming this data and expecting guaranteed ordering, changing the partition count isn’t the easiest option.

The Parallel Consumer addresses two issues with this implementation. We can increase the parallelism without increasing the partition count. We can also pipeline the calls to the legacy HTTP service. Both of these ensure that processing in key order is maintained. The resulting application is up to 2,000 times faster than the initial implementation (assuming the HTTP service can keep up!).

There are many other use cases that can benefit from the Parallel Consumer.
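
To make the idea of key-level parallelism in the train-position scenario more concrete, here is a minimal Python sketch. It does not use the Parallel Consumer (a JVM library) and is not how that library is implemented. It only illustrates the ordering rule: messages that share a key are handled one after another, while different keys proceed concurrently. The names `validate_position` and `process_key_ordered`, the in-memory message list, and the `max_concurrency` parameter are all assumptions made for illustration.

```python
import asyncio
from collections import defaultdict


async def validate_position(train_id, position):
    """Hypothetical stand-in for the legacy HTTP call (about 50 ms per request)."""
    await asyncio.sleep(0.05)
    return (train_id, position)


async def process_key_ordered(messages, handler, max_concurrency=100):
    """Run handler(key, value) for every (key, value) pair in messages.

    Values that share a key are processed sequentially, in arrival order.
    Different keys run concurrently, with at most max_concurrency handler
    calls in flight at any moment.
    """
    # Group values by key, preserving arrival order within each key.
    per_key = defaultdict(list)
    for key, value in messages:
        per_key[key].append(value)

    limit = asyncio.Semaphore(max_concurrency)

    async def run_key(key, values):
        results = []
        for value in values:
            # The semaphore caps the total number of concurrent calls.
            async with limit:
                results.append(await handler(key, value))
        return key, results

    finished = await asyncio.gather(
        *(run_key(key, values) for key, values in per_key.items())
    )
    return dict(finished)


if __name__ == "__main__":
    # 100 hypothetical trains, each reporting three positions in sequence.
    batch = [(f"train-{i}", (i, step)) for step in range(3) for i in range(100)]
    validated = asyncio.run(process_key_ordered(batch, validate_position))
    print(len(validated), "trains validated")
```

The number of distinct keys, not the number of partitions, bounds the available concurrency here, which is the same trade the article describes. A real consumer would also have to track and commit offsets, which this sketch leaves out entirely.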
