-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unify DEBUG_BPF logging to be handled in libbpf #1433
Conversation
4df8a31
to
ebe63ea
Compare
buffer_size = LOG_BUF_SIZE; | ||
// caller did not specify log_buf but failure should be printed, | ||
// so repeat the syscall and print the result to stderr | ||
// User did not provide log buffer. We will try to increase size of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to check whether the error is caused by insufficient buffer or not here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At this point, we already know the load syscall failed, and user did not provide a buffer. Regardless of the reason, we will try allocate temporary buffer size from LOG_BUF_SIZE
, and increase it until ENOSPC
no longer happens.
At the worse case is user have specified log_level = 1
and we already tried temporary buffer size being LOG_BUF_SIZE
and it turned out not to be enough. In such case, we just do an extra iteration of attempted load. It only happens on error and if we want to get around that this part of logic would be more complicated. I don't feel it's worth it......
src/cc/libbpf.c
Outdated
return bpf_prog_load_impl(prog_type, name, insns, prog_len, license, | ||
kern_version, NULL, 0, log_level); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now we have two APIs, bpf_prog_load and bpf_prog_load_flag. I think this is not a good idea. Can we consolidate both into one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can. But as I stated in the description, for folks using C API directly, bpf_prog_load
almost certainly going to be used and if we change its function signature, all those call sites would break. That's why I added the new interface. I would mark the old one with DEPRECATED
attribute. What do you think?
break | ||
|
||
if self.debug & DEBUG_BPF and log_buf.value: | ||
print(log_buf.value.decode(), file=sys.stderr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure whether we have issue or not. The log buffer passed back to python and python can print out. If you printout in C/C++, not sure whether it will actually print out. I have numerous cases where when running python tools, if I added some fprintf in C++ and I did not see the output in the screen, could you double check this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I did. I tried both a few BCC tools, and FB internal build using Python logging
library and everything works well. If you have example that this happens I could help debug.
if (tmp_log_buf) | ||
free(tmp_log_buf); | ||
tmp_log_buf_size = LOG_BUF_SIZE; | ||
attr.log_level = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One more thing. Do we want integer log_level or boolean do_logging? Providing log_level is more flexible for users as log_level=2 can print out more info than log_level=1 in kernel. Maybe the above line needs to inherit user log_level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. log_level
is already integer, I will do another patch to add a new flag to do log_level=2
for more detailed register state inspection.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If user_buf != NULL, we will have
if (input_log_level == 0)
log_level = 1
else
log_level = input_log_level
Can we do the same thing here?
attr.log_level = (log_level == 0) ? 1 : log_level.
This will cover log_level=2 case for C++ API, where people are more expert...
For Python side, we have log_level=1 based on debug flag settting, I feel this is good enough.
log_level=2 prints reg state for every insn, this is mostly useful to track verifier bug and often happens with bigger programs (C++ side) than python side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thing enabling log_level
based on that user provided log_buf
makes the logic is a bit messy. In brief words, the implemented logic in this patch is:
- If user specified positive
log_level
, use the value forlog_level
. - Otherwise if load failed, set
log_level
to be 1. - When
log_level
enabled due to any above case:- If user provided
log_buf
, use user'slog_buf
. - Otherwise allocate use our temporary log buffer.
- If user provided
Do you feel that's a reasonable logic?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds okay, even with "Otherwise if load failed, set log_level to be 1". This is probably what we did earlier. Could you add this brief algorithm as the comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yonghong-song Yeah I was trying to keep current behavior mostly unchanged. Documentation is added on libbpf.h
, do you have any suggestion on how to make it better?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Lets keep the existing load_prog error reporting mechanism. It is very useful to people even if they do not specify debug flag.
In BCC we have a few debug flags. Except for
DEBUG_BPF
, all other debug flags are handled atClangLoader
/ Clang / LLVM layer.DEBUG_BPF
is for log message come fromBPF_PROG_LOAD
syscall, and it's current logic is a bit messy:libbpf
'sbpf_prog_load
enables this logging (log_level = 1
) whenever non-emptylog_buf
is provided. On the other hand, the Python API always provide such buffer, causing unnecessary work when debug is not enabled.DEBUG_BPF
is enabled), and to incrementally double the size of the log buffer if the buffer size is insufficient.libbpf
'sbpf_prog_load
also has the logic to incrementally double the size of the log buffer if the buffer size is insufficient, but it only does it on error, and will print the log message along with some hints. However, it will not copy this buffer to user providedlog_buf
even if provided.DEBUG_BPF
flag, and if we want, we'll need to replicate the printing logic as Python does.This PR tries to address these issues:
bpf_prog_load_flag
interface. Instead oflog_buf
andlog_buf_size
, it simply takes alog_level
argument. When setting to 1, it would automatically create temporary log buffer, increase the size if not sufficient, and print the log message with hints. This way, Python / C++ / Lua / etc. APIs won't need to handle these logic separately, and only need to pass thelog_level
flag correctly. This behavior is more consistent with other debug flags such asDEBUG_SOURCE
, etc.bpf_prog_load
is a very commonly used interface and should not break for this non-critical feature change. So its been kept and behavior remain unchanged, except it will not do the auto-increment of log buffer anymore.The new interface is applied in Python, C++ and Lua API, and it worked well when tested with different debug flag options.