Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CELEBORN-1700][FOLLOWUP] Support ShuffleFallbackCount metric for fallback to vanilla Flink built-in shuffle implementation #3012

Closed
wants to merge 2 commits into from

Conversation

SteNicholas
Copy link
Member

@SteNicholas SteNicholas commented Dec 19, 2024

What changes were proposed in this pull request?

Support ShuffleFallbackCount metric for fallback to vanilla Flink built-in shuffle implementation.

Why are the changes needed?

#2932 has already supported fallback to vanilla Flink built-in shuffle implementation, which is lack of ShuffleFallbackCount metric to feedback the situation of fallback.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

RemoteShuffleMasterSuiteJ#testRegisterPartitionWithProducerForForceFallbackPolicy

@SteNicholas SteNicholas marked this pull request as draft December 19, 2024 08:59
@SteNicholas SteNicholas force-pushed the CELEBORN-1700 branch 2 times, most recently from 64a123d to b1baa54 Compare December 19, 2024 09:49
…lback to vanilla Flink built-in shuffle implementation
@SteNicholas SteNicholas marked this pull request as ready for review December 19, 2024 10:10
@SteNicholas SteNicholas requested a review from reswqa December 19, 2024 10:11
@SteNicholas
Copy link
Member Author

Ping @reswqa, @codenohup.

@codenohup
Copy link
Contributor

Hi, @SteNicholas

This PR LGTM, but I have a minor question. It seems that the decision to fall back to the built-in shuffle is based on job granularity rather than Shuffle granularity. I think it would be beneficial to have a fallback count metric that reflects job granularity.

@SteNicholas
Copy link
Member Author

SteNicholas commented Dec 19, 2024

@codenohup, The difference of implementation that Flink only supports job granular fallback but Spark supports shuffle granular fallback causes above fallback metric situation. I would like to support job granular fallback metrics in another pull request.

Copy link
Contributor

@FMX FMX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall except a nit.

Copy link
Member

@reswqa reswqa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

…anilla Flink built-in shuffle implementation
@SteNicholas SteNicholas requested a review from FMX December 20, 2024 03:40
Copy link
Contributor

@FMX FMX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks. Merged into main(v0.6.0).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants