Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unify DEBUG_BPF logging to be handled in libbpf #1433

Merged
merged 2 commits into from
Nov 8, 2017

Conversation

palmtenor
Copy link
Member

@palmtenor palmtenor commented Nov 4, 2017

In BCC we have a few debug flags. Except for DEBUG_BPF, all other debug flags are handled at ClangLoader / Clang / LLVM layer.

DEBUG_BPF is for log message come from BPF_PROG_LOAD syscall, and it's current logic is a bit messy:

  • libbpf's bpf_prog_load enables this logging (log_level = 1) whenever non-empty log_buf is provided. On the other hand, the Python API always provide such buffer, causing unnecessary work when debug is not enabled.
  • Python API has its own logic to print the message (only when DEBUG_BPF is enabled), and to incrementally double the size of the log buffer if the buffer size is insufficient.
  • On the other hand, libbpf's bpf_prog_load also has the logic to incrementally double the size of the log buffer if the buffer size is insufficient, but it only does it on error, and will print the log message along with some hints. However, it will not copy this buffer to user provided log_buf even if provided.
  • C++ and Lua API doesn't have the logic to handle DEBUG_BPF flag, and if we want, we'll need to replicate the printing logic as Python does.

This PR tries to address these issues:

  • Added a new bpf_prog_load_flag interface. Instead of log_buf and log_buf_size, it simply takes a log_level argument. When setting to 1, it would automatically create temporary log buffer, increase the size if not sufficient, and print the log message with hints. This way, Python / C++ / Lua / etc. APIs won't need to handle these logic separately, and only need to pass the log_level flag correctly. This behavior is more consistent with other debug flags such as DEBUG_SOURCE, etc.
  • The original bpf_prog_load is a very commonly used interface and should not break for this non-critical feature change. So its been kept and behavior remain unchanged, except it will not do the auto-increment of log buffer anymore.

The new interface is applied in Python, C++ and Lua API, and it worked well when tested with different debug flag options.

buffer_size = LOG_BUF_SIZE;
// caller did not specify log_buf but failure should be printed,
// so repeat the syscall and print the result to stderr
// User did not provide log buffer. We will try to increase size of
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you want to check whether the error is caused by insufficient buffer or not here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point, we already know the load syscall failed, and user did not provide a buffer. Regardless of the reason, we will try allocate temporary buffer size from LOG_BUF_SIZE, and increase it until ENOSPC no longer happens.

At the worse case is user have specified log_level = 1 and we already tried temporary buffer size being LOG_BUF_SIZE and it turned out not to be enough. In such case, we just do an extra iteration of attempted load. It only happens on error and if we want to get around that this part of logic would be more complicated. I don't feel it's worth it......

src/cc/libbpf.c Outdated
return bpf_prog_load_impl(prog_type, name, insns, prog_len, license,
kern_version, NULL, 0, log_level);
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now we have two APIs, bpf_prog_load and bpf_prog_load_flag. I think this is not a good idea. Can we consolidate both into one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can. But as I stated in the description, for folks using C API directly, bpf_prog_load almost certainly going to be used and if we change its function signature, all those call sites would break. That's why I added the new interface. I would mark the old one with DEPRECATED attribute. What do you think?

break

if self.debug & DEBUG_BPF and log_buf.value:
print(log_buf.value.decode(), file=sys.stderr)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure whether we have issue or not. The log buffer passed back to python and python can print out. If you printout in C/C++, not sure whether it will actually print out. I have numerous cases where when running python tools, if I added some fprintf in C++ and I did not see the output in the screen, could you double check this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I did. I tried both a few BCC tools, and FB internal build using Python logging library and everything works well. If you have example that this happens I could help debug.

if (tmp_log_buf)
free(tmp_log_buf);
tmp_log_buf_size = LOG_BUF_SIZE;
attr.log_level = 1;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more thing. Do we want integer log_level or boolean do_logging? Providing log_level is more flexible for users as log_level=2 can print out more info than log_level=1 in kernel. Maybe the above line needs to inherit user log_level?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. log_level is already integer, I will do another patch to add a new flag to do log_level=2 for more detailed register state inspection.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If user_buf != NULL, we will have
if (input_log_level == 0)
log_level = 1
else
log_level = input_log_level

Can we do the same thing here?
attr.log_level = (log_level == 0) ? 1 : log_level.

This will cover log_level=2 case for C++ API, where people are more expert...
For Python side, we have log_level=1 based on debug flag settting, I feel this is good enough.
log_level=2 prints reg state for every insn, this is mostly useful to track verifier bug and often happens with bigger programs (C++ side) than python side.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thing enabling log_level based on that user provided log_buf makes the logic is a bit messy. In brief words, the implemented logic in this patch is:

  • If user specified positive log_level, use the value for log_level.
  • Otherwise if load failed, set log_level to be 1.
  • When log_level enabled due to any above case:
    • If user provided log_buf, use user's log_buf.
    • Otherwise allocate use our temporary log buffer.

Do you feel that's a reasonable logic?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds okay, even with "Otherwise if load failed, set log_level to be 1". This is probably what we did earlier. Could you add this brief algorithm as the comments?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yonghong-song Yeah I was trying to keep current behavior mostly unchanged. Documentation is added on libbpf.h, do you have any suggestion on how to make it better?

Copy link
Collaborator

@yonghong-song yonghong-song left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Lets keep the existing load_prog error reporting mechanism. It is very useful to people even if they do not specify debug flag.

@4ast 4ast merged commit 2d33f35 into iovisor:master Nov 8, 2017
@palmtenor palmtenor deleted the logmsg branch November 8, 2017 06:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants