Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CELEBORN-1573] Change to debug logging on client side for reserve slots #2700

Closed
wants to merge 1 commit into from

Conversation

akpatnam25
Copy link
Contributor

What changes were proposed in this pull request?

Change client side reserve slots log to debug.

Why are the changes needed?

in a large fleet of workers such as ours, it is noisy for the user in the driver logs.

Does this PR introduce any user-facing change?

No

How was this patch tested?

N/A

@akpatnam25 akpatnam25 marked this pull request as ready for review August 21, 2024 20:35
@akpatnam25
Copy link
Contributor Author

Copy link

@zhouyejoe zhouyejoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@s0nskar
Copy link
Contributor

s0nskar commented Aug 22, 2024

I think this change can make retrospective debugging hard as we will have no insight into which workers the shuffle data was written to and what was the behaviour of those workers during that time.

Instead of making it completely a debug log IMO we should at least output some basic info about the worker host selected for shuffle this will reduce the noise but also leave some space for debugging issues.

Ex –

Host: 10.152.63.41
WorkerStatus: WorkerStatus{state=Normal, stateStartTime=1724244049248}
NumOfSlotsAllocated: 100

@akpatnam25
Copy link
Contributor Author

I think this change can make retrospective debugging hard as we will have no insight into which workers the shuffle data was written to and what was the behaviour of those workers during that time.

Instead of making it completely a debug log IMO we should at least output some basic info about the worker host selected for shuffle this will reduce the noise but also leave some space for debugging issues.

Ex –

Host: 10.152.63.41
WorkerStatus: WorkerStatus{state=Normal, stateStartTime=1724244049248}
NumOfSlotsAllocated: 100

I think this will still be noisy in a large fleet, especially in the driver logs where users might be looking for other logs/errors. I feel that if truly needed for debugging, the user can then enable debug logging for it.

@FMX
Copy link
Contributor

FMX commented Aug 22, 2024

I think this change can make retrospective debugging hard as we will have no insight into which workers the shuffle data was written to and what was the behaviour of those workers during that time.
Instead of making it completely a debug log IMO we should at least output some basic info about the worker host selected for shuffle this will reduce the noise but also leave some space for debugging issues.
Ex –

Host: 10.152.63.41
WorkerStatus: WorkerStatus{state=Normal, stateStartTime=1724244049248}
NumOfSlotsAllocated: 100

I think this will still be noisy in a large fleet, especially in the driver logs where users might be looking for other logs/errors. I feel that if truly needed for debugging, the user can then enable debug logging for it.

Maybe print this log to one line can be helpful.

@FMX
Copy link
Contributor

FMX commented Aug 22, 2024

I think this change can make retrospective debugging hard as we will have no insight into which workers the shuffle data was written to and what was the behaviour of those workers during that time.
Instead of making it completely a debug log IMO we should at least output some basic info about the worker host selected for shuffle this will reduce the noise but also leave some space for debugging issues.
Ex –

Host: 10.152.63.41
WorkerStatus: WorkerStatus{state=Normal, stateStartTime=1724244049248}
NumOfSlotsAllocated: 100

I think this will still be noisy in a large fleet, especially in the driver logs where users might be looking for other logs/errors. I feel that if truly needed for debugging, the user can then enable debug logging for it.

Maybe print this log to one line can be helpful.

Setting the logger levels for components could be the ultimate solution.

Copy link
Contributor

@FMX FMX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Merged into main(v0.6.0), branch-0.5(v0.5.2), branch-0.4(v0.4.3).

@FMX FMX closed this in 1da2ede Aug 22, 2024
FMX pushed a commit that referenced this pull request Aug 22, 2024
### What changes were proposed in this pull request?
Change client side reserve slots log to debug.

### Why are the changes needed?
in a large fleet of workers such as ours, it is noisy for the user in the driver logs.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
N/A

Closes #2700 from akpatnam25/CELEBORN-1573.

Authored-by: Aravind Patnam <apatnam@linkedin.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
(cherry picked from commit 1da2ede)
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
FMX pushed a commit that referenced this pull request Aug 22, 2024
### What changes were proposed in this pull request?
Change client side reserve slots log to debug.

### Why are the changes needed?
in a large fleet of workers such as ours, it is noisy for the user in the driver logs.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
N/A

Closes #2700 from akpatnam25/CELEBORN-1573.

Authored-by: Aravind Patnam <apatnam@linkedin.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
(cherry picked from commit 1da2ede)
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
@s0nskar
Copy link
Contributor

s0nskar commented Aug 22, 2024

Maybe print this log to one line can be helpful.

@FMX i created a PR sometime back to support single line logging – #2672. Let me wdyt about this, i'll extend it to other places.

wankunde pushed a commit to wankunde/celeborn that referenced this pull request Oct 11, 2024
### What changes were proposed in this pull request?
Change client side reserve slots log to debug.

### Why are the changes needed?
in a large fleet of workers such as ours, it is noisy for the user in the driver logs.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
N/A

Closes apache#2700 from akpatnam25/CELEBORN-1573.

Authored-by: Aravind Patnam <apatnam@linkedin.com>
Signed-off-by: mingji <fengmingxiao.fmx@alibaba-inc.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants