Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] Implement retry strategy based on more message middleware #4556

Closed
1 of 7 tasks
pandaapo opened this issue Nov 15, 2023 · 23 comments
Closed
1 of 7 tasks
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@pandaapo
Copy link
Member

pandaapo commented Nov 15, 2023

Search before asking

  • I had searched in the issues and found no similar issues.

Enhancement Request

Currently, EventMesh supports two retry strategies for the case of message consumption failure: a retry strategy using HashedWheelTimer in memory (default) and a retry strategy based on external storage RocketMQ. There are still retry strategies based on other message brokers waiting to be implemented.

对于消息消费失败的情况,目前EventMesh支持在内存中使用HashedWheelTimer的重试策略(默认)和基于外部存储RocketMQ的重试策略。还有基于其他消息中间件的重试策略等待实现。

Describe the solution you'd like

task list

  • Implement a retry strategy based on Kafka
  • Implement a retry strategy based on RabbitMQ
  • Implement a retry strategy based on Pulsar
  • Implement a retry strategy based on Redis

Welcome to claim the task you are interested in and create corresponding sub issue.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@pandaapo pandaapo added the enhancement New feature or request label Nov 15, 2023
@VishalMCF
Copy link
Contributor

@pandaapo @yanrongzhen @mxsm For the task above I want to ask the approach. Will only creating a new module (and implementing the RetryStrategy class) under the events-retry module be sufficient for all use cases (Kafka, pulsar, redis, rabbitMQ)? I am asking this because I saw the code of the rocketMQRetryStrategyImpl and it does not contain any logic specifically related to RocketMQ.

@pandaapo
Copy link
Member Author

pandaapo commented Dec 14, 2023

@VishalMCF
RocketMQRetryStrategyImpl utilizes RocketMQ's retry queue feature. It sends messages to a queue with the prefix "% RETRY%", and RocketMQ will automatically retry. So it's not enough to create a new module under the eventmesh-retry module. You need to use the corresponding message middleware to implement the retry logic in the new module.

RocketMQRetryStrategyImpl 利用了RocketMQ的重试队列特性,将消息发送到前缀为“%RETRY%”的队列,RocketMQ会自动进行重试。所以并不是在eventmesh-retry模块下创建一个新模块就够了,您需要在新模块中利用相应的消息中间件实现重试逻辑。

@HarshSawarkar
Copy link
Contributor

Hey @pandaapo , I would like to work on this issue if no one is currently working on it.

@VishalMCF RocketMQRetryStrategyImpl utilizes RocketMQ's retry queue feature. It sends messages to a queue with the prefix "% RETRY%", and RocketMQ will automatically retry. So it's not enough to create a new module under the eventmesh-retry module. You need to use the corresponding message middleware to implement the retry logic in the new module.

RocketMQRetryStrategyImpl 利用了RocketMQ的重试队列特性,将消息发送到前缀为“%RETRY%”的队列,RocketMQ会自动进行重试。所以并不是在eventmesh-retry模块下创建一个新模块就够了,您需要在新模块中利用相应的消息中间件实现重试逻辑。

In this context, does the corresponding message middleware refer to CloudEvent?

@pandaapo
Copy link
Member Author

pandaapo commented Jan 26, 2024

In this context, does the corresponding message middleware refer to CloudEvent?

@HarshSawarkar
No, "message middleware" refers to messaging service, like Kafka, RocketMQ (implemented), RabbitMQ, Pulsar or Redis.

@HarshSawarkar
Copy link
Contributor

Hi @pandaapo for the pulsar message middleware there is no pluging for logs. I tried to add SLF4j logger dependencies in the build.gradle but it doesn't seem to resolve the dependency of log for eventmesh-retry-pulsar module. Ho wcan thie be achieved? I am attaching ss for your reference
image

@Pil0tXia
Copy link
Member

Pil0tXia commented Feb 13, 2024

@HarshSawarkar

The Slf4j artifact is introduced in eventmesh-common/build.gradle. You may need to depend eventmesh-common module and inject logger with @Slf4j annotation.

@HarshSawarkar
Copy link
Contributor

@HarshSawarkar

The Slf4j artifact is introduced in eventmesh-common/build.gradle. You may need to depend eventmesh-common module and inject logger with @Slf4j annotation.

I have made the changes. Can you please review the PR? Here's the link to it #4769

@Pil0tXia Pil0tXia added the help wanted Extra attention is needed label Apr 15, 2024
@jevinjiang
Copy link
Contributor

jevinjiang commented Apr 23, 2024

@pandaapo Is it feasible to create a retry topic while creating the topic, so that consumers can subscribe to it? The principle of RocketMQ should also be similar

@pandaapo
Copy link
Member Author

@pandaapo Is it feasible to create a retry topic while creating the topic, so that consumers can subscribe to it? The principle of RocketMQ should also be similar

@jevinjiang
Firstly, let me explain the purpose of this issue. It is to enable handing over the retry of messages to external messaging service, both to reduce the pressure on the EventMesh runtime service and to reuse the mature retry ability or potential ability of these services. So the scenario you propose is not in line with this issue.

Coming back to the discussion of your solution itself, it amounts to referencing RocketMQ's retry idea to implement similar functionality in EventMesh. Regardless of whether it is completely memory-based or based on external storage, there must be some complexity, such as if the retry still fails, is it necessary to add another dead letter topic? If it is based on memory, it is still adding EventMesh runtime service running pressure. If it is based on external storage, it is equivalent to wasting the ability or potential ability that external storage can provide.

首先我说明一下这个issue的目的,是要实现将对消息的重试工作交给外部消息中间件,一是可以减轻EventMesh runtime服务的运行压力,二是重用这些中间件成熟的重试能力或潜在能力。所以您提的方案是不符合该issue的。

再来讨论下您的方案本身,相当于参考RocketMQ的重试思路,在EventMesh中实现类似的功能。不管是完全基于内存的,还是基于外部Storage,肯定有一定的复杂性,比如如果重试还是失败,是不是还要再加个死信topic?如果是基于内存,相当于还是增加了EventMesh runtime服务的运行压力;如果是基于外部Storage,相当于浪费了外部Storage本身能提供的能力或潜在能力。

@jevinjiang
Copy link
Contributor

jevinjiang commented Apr 23, 2024

@pandaapo
For message queues that do not provide a retry strategy, re-add the original topic after the event execution fails, causing the event to be reprocessed. For message queues with retry strategies, use the corresponding retry strategies.
那对于没有提供重试策略的消息队列,event执行失败后重新add原来topic,使得event重新处理. 对于有重试策略的消息队列,则使用相应的重试策略.

@jevinjiang
Copy link
Contributor

@pandaapo For message queues that do not provide a retry strategy, re-add the original topic after the event execution fails, causing the event to be reprocessed. For message queues with retry strategies, use the corresponding retry strategies. 那对于没有提供重试策略的消息队列,event执行失败后重新add原来topic,使得event重新处理. 对于有重试策略的消息队列,则使用相应的重试策略.

Of course, I am just providing a general idea. If the idea is approved, I will add some details (such as the maximum number of retries, callback after failure)

@jevinjiang
Copy link
Contributor

@pandaapo Is it feasible to create a retry topic while creating the topic, so that consumers can subscribe to it? The principle of RocketMQ should also be similar

In fact, my original intention was for a message queue without a retry mechanism.

@pandaapo
Copy link
Member Author

@pandaapo For message queues that do not provide a retry strategy, re-add the original topic after the event execution fails, causing the event to be reprocessed. For message queues with retry strategies, use the corresponding retry strategies. 那对于没有提供重试策略的消息队列,event执行失败后重新add原来topic,使得event重新处理. 对于有重试策略的消息队列,则使用相应的重试策略.

What do you mean by "message queues with or without retry strategies"? Does it mean that some external message middleware does not have ready-made retry features, or is it something else?

您说的“有或没有提供重试策略的消息队列”是指什么?是指某种外部消息中间件有没有现成的重试特性,还是其他什么意思?

@jevinjiang
Copy link
Contributor

@pandaapo For message queues that do not provide a retry strategy, re-add the original topic after the event execution fails, causing the event to be reprocessed. For message queues with retry strategies, use the corresponding retry strategies. 那对于没有提供重试策略的消息队列,event执行失败后重新add原来topic,使得event重新处理. 对于有重试策略的消息队列,则使用相应的重试策略.

What do you mean by "message queues with or without retry strategies"? Does it mean that some external message middleware does not have ready-made retry features, or is it something else?

您说的“有或没有提供重试策略的消息队列”是指什么?是指某种外部消息中间件有没有现成的重试特性,还是其他什么意思?

yeah, such as kafka、redis( redisson topic), no middleware provides consumption retries.

@pandaapo
Copy link
Member Author

pandaapo commented Apr 23, 2024

Let's clarify some things: the retries that this issue talks about are internal to EventMesh, and are not a retry feature available to the user; EventMesh automatically retries messages that fail internally. Users can only set it to retry in memory (default, implemented, increase the pressure on EventMesh), retry in RocketMQ (implemented), and retry in other external messaging middleware (the purpose of this issue).

Do you have any misunderstandings about these? If there is no misunderstanding, in the case of "some external messaging middleware doesn't have an existing retry feature", that's what the issue wants developers to do, to use this external messaging middleware, and to develop a feature for retrying in this middleware (reduce the burden on EventMesh).

我们先明确一些事情:该issue谈论的重试是EventMesh内部的机制,不是向用户提供的重试特性。EventMesh对于失败的消息,内部会自动进行的重试。用户只能设置:在内存中重试(默认的、已实现、会增加EventMesh运行压力)、在RocketMQ中重试(已实现)、在其他外部消息中间件中重试(该issue的目的)。
请问对这些您有没有误解?如果没有误解,对于“某种外部消息中间件没有现成的重试特性”的情况,也是该issue希望开发者能完成的事啊,利用这种外部消息中间件,开发在该中间件中进行重试的功能(减轻EventMesh负担)。

@jevinjiang
Copy link
Contributor

@pandaapo For message queues that do not provide a retry strategy, re-add the original topic after the event execution fails, causing the event to be reprocessed. For message queues with retry strategies, use the corresponding retry strategies. 那对于没有提供重试策略的消息队列,event执行失败后重新add原来topic,使得event重新处理. 对于有重试策略的消息队列,则使用相应的重试策略.

emmmm.... This idea of ​​mine is to solve this problem. ? ? ?

@jevinjiang
Copy link
Contributor

Sorry, I don't understand. I've been telling you my ideas for solving the problem.

@pandaapo
Copy link
Member Author

pandaapo commented Apr 23, 2024

You don't need to say sorry, Perhaps I didn't understand your meaning. In my opinion, your plan still requires the heavy participation of EventMesh runtime to support the retry process.

您没有必要道歉,也许是我没有理解您的意思。在我看来,您的方案依然需要EventMesh runtime重度参与和支撑重试过程。

@jevinjiang
Copy link
Contributor

jevinjiang commented Apr 23, 2024

My original intention is that for the two retry strategies of kafka and redis, because these two middlewares do not provide consumption retry strategies, my plan is as follows:

The first kind: Simulate rocketmq's retry mechanism, create a retry topic when creating a topic, use the same consumer subscription, and keep retrying until the maximum number of repetitions is reached.

The second kind: In these two retry strategies, The failed event will re-call producer.publish again and put it into the original topic.

@pandaapo
Copy link
Member Author

My original intention is that for the two retry strategies of kafka and redis, because these two middlewares do not provide consumption retry strategies, my plan is as follows:

The first kind: Simulate rocketmq's retry mechanism, create a retry topic when creating a topic, use the same consumer subscription, and keep retrying until the maximum number of repetitions is reached.

The second kind: In these two retry strategies, the failed event is directly added to the end of the original topic.

The fact is that I didn't understand your meaning before. I think the first plan is more feasible, while the second plan, in my opinion, will cause message confusion in the message queue and duplicate consumption.
For "create a retry topic when creating a topic", it is better to delay the creation until the retry is needed.

@jevinjiang
Copy link
Contributor

My original intention is that for the two retry strategies of kafka and redis, because these two middlewares do not provide consumption retry strategies, my plan is as follows:

The first kind: Simulate rocketmq's retry mechanism, create a retry topic when creating a topic, use the same consumer subscription, and keep retrying until the maximum number of repetitions is reached.

The second kind: In these two retry strategies, the failed event is directly added to the end of the original topic.

The fact is that I didn't understand your meaning before. I think the first plan is more feasible, while the second plan, in my opinion, will cause message confusion in the message queue and duplicate consumption.

For "create a retry topic when creating a topic", it is better to delay the creation until the retry is needed.

Thanks for your suggestion, I will implement it soon.

@pandaapo
Copy link
Member Author

Thanks for your suggestion, I will implement it soon.

Thanks for your future contribution. Please create corresponding new sub issue(s) then.

HarshSawarkar added a commit to HarshSawarkar/eventmesh that referenced this issue May 18, 2024
HarshSawarkar added a commit to HarshSawarkar/eventmesh that referenced this issue May 19, 2024
…tegy-based-on-pulsar-message-middleware' into apache#4556-Implement-retry-strategy-based-on-pulsar-message-middleware
@pandaapo
Copy link
Member Author

I found that the original design of the retry module seems to have some issues, but I'm not sure yet. I need to close this issue first.

@pandaapo pandaapo closed this as not planned Won't fix, can't repro, duplicate, stale May 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants