-
Notifications
You must be signed in to change notification settings - Fork 348
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add more @FormatMethod annotation in Logger #954
base: master
Are you sure you want to change the base?
Add more @FormatMethod annotation in Logger #954
Conversation
@@ -74,6 +75,7 @@ public void debug(Throwable exception, String message) | |||
* | |||
* @param message a literal message to log | |||
*/ | |||
@FormatMethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does log.debug("%%")
log single percent or two?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is the caller allowed to call log.debug("%...a malformed format string")
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I haven't thought of that :/ No, the %
s won't be interpreted by these methods, but Error Prone will treat them as format strings regardless. So, back to the drawing board I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed an update, with the following caveat:
(...) from now on the message on these methods will be actually treated as format string (...) This is a breaking change, but it makes treatment of the messages consistent across all variants of these log methods.
Are we willing to accept that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i admit this is a grey area. log.xxx(String)
doesn't need to format, since there are no args, but could do format for consistency. however, it's not IMO OK to make a breaking change like that.
4d88940
to
f52ec26
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are not format methods. Making them so is a breaking change to an API that has existed for over 10 years. I like the idea of catching misuse, but not at the expense of breaking logging which is often not tested or only happens in exceptional circumstances.
OK, thanks. Just wanted to test the waters, so to speak :) |
Do you have any examples of misuse seen in practice? I’m wondering if we can do something clever here. There are two kinds of callers:
Maybe we change so that that the message is now a format string, but if it’s malformed, we simply log the original (old behavior). This would only be for the no-arg version. It makes usage slightly uglier, but makes the API consistent and allows checkers. cc @martint |
I think there are about 30 in Trino. They are in four general categories:
Overall, nothing serious, "just" a code quality issue. I proposed to mark these methods as format methods, but I am aware that this is an indirect way to make Error Prone do what we want. There are other tools we could use, for example SpotBugs (with some external checks) can detect mis-use of logger methods, but I don't think any of them knows that these are actually logger methods. So we still have to annotate them somehow, and this seems like it's as good as any other way to do that. So given the above the question now is: is it OK that we mark these methods as format methods even though they're not really? :) |
these are valid and shouldn't be prohibited still, workaround is easy: maybe
|
The no-argument versions are really useful and common, so it would suck to have to use a dummy
I think it is fine to say that these are no-arg format string messages, such that the dynamic In other words, if we have |
It's not only a workaround - if the message is to be interpreted as a format string, then it prevents that from happening in case the OTOH,
These won't get flagged, because these are literal strings and literal strings are explicitly OK (unless they contain malformed format string). And lastly,
I know this sucks to have to repeat onself. For example String message = format("User [%s] not a member of an authorized group", user);
log.debug(message);
throw new AccessDeniedException(message); will have to change into log.debug("User [%s] not a member of an authorized group", user);
throw new AccessDeniedException(format("User [%s] not a member of an authorized group", user)); But I don't think it's such a big deal, personally. |
And one more thing - I haven't actually found any cases where the message contains something that might be misinterpreted as a format string. Not in Trino, at least. And not in static strings. It may still happen when we log a result of some invocation, though. |
f52ec26
to
da3e952
Compare
In this new version:
|
bump |
One reason to treat no-param logs as format strings is when we want to merge subsequent logs into one, so that they are both logged atomically: - log.info("trying to give consent");
- log.info("waiting for openId checkbox");
+ log.info("trying to give consent%nwaiting for openId checkbox"); The |
cd8e789
to
b73347b
Compare
This seems reasonable to me. I do like the consistency, and the fallback of catching Can you add a test for including valid placeholders like |
To expand on the motivation for this change, it allows Error Prone to flag code like this: log.debug("Task scheduled " + taskId);
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also update the CHANGES
file in the commit that makes the behavior change?
1333083
to
e06f66b
Compare
@electrum AC, thanks! |
These all use variants of the `Logger` method which take a `String` and no format string arguments. Formerly this was treated as a simple message to be passed to logs verbatim, but this creates plenty of opportunities to mis-use it. A change in Airlift may change the semantics of these methods to treat the `String` as a format string with no arguments, and mark them as `@FormatMethod`s, so that Error Prone can catch improper uses (see airlift/airlift#954). There are roughly four categories of instances flagged given the above: * Logging using concatenation instead of the format string: - `logger.info("Foo " + bar + " bazzed");` * Logging a pre-formatted String: - `logger.info(String.format("%s %d", foo, bar));` * Preparing a message beforehand so that it can be used to log and as an exception message: - ``` String message = String.format(...); logger.info(message); throw new Exception(message); ``` * Non-static-string: - `log.error(message.toString());` Most of the above have pretty obvious fixes, except for the third kind, where we avoid duplication of the message string by using the universal "%s" format string to just output the preformatted message.
I'm not yet 100% convinced this is something we want to change. The choice of making the no-args version a regular string (vs a format string) was intentional. It is true that "they may not look like format methods, but in fact they are, just without parameters". That was the original intention (except for the implications around how naked It makes it more cumbersome to log naked strings that contain '%'. With this change, callers are forced to do Also, I don't like the implications and constraints this forces. In particular, while things like |
@martint see also #954 (comment)
I agree with "not catastrophic". Anyway, the problem we're solving is: "how to find |
These all use variants of the `Logger` method which take a `String` and no format string arguments. Formerly this was treated as a simple message to be passed to logs verbatim, but this creates plenty of opportunities to mis-use it. A change in Airlift may change the semantics of these methods to treat the `String` as a format string with no arguments, and mark them as `@FormatMethod`s, so that Error Prone can catch improper uses (see airlift/airlift#954). There are roughly four categories of instances flagged given the above: * Logging using concatenation instead of the format string: - `logger.info("Foo " + bar + " bazzed");` * Logging a pre-formatted String: - `logger.info(String.format("%s %d", foo, bar));` * Preparing a message beforehand so that it can be used to log and as an exception message: - ``` String message = String.format(...); logger.info(message); throw new Exception(message); ``` * Non-static-string: - `log.error(message.toString());` Most of the above have pretty obvious fixes, except for the third kind, where we avoid duplication of the message string by using the universal "%s" format string to just output the preformatted message.
Related, we should add supplier versions of all the no-arg logger methods, for times when you need to log the result of a method call or other expression. I think that would largely eliminate the objection, since it's very rare you need to log an arbitrary pre-constructed string. While I understand @martint's argument, I think it's rare enough that the API consistency and Error Prone check makes the extra verbosity in those cases worthwhile. |
@ksobolew There are a few places in Airlift to update before this change, which are examples of arbitrary strings passed to the no-arg methods:
|
e06f66b
to
5faf415
Compare
@electrum AC, thanks!
You're absolutely right, we should first fix the potentially problematic calls in Airlift itself. I included a very simple Maven profile which enables the relevant Error Prone check to make finding these cases easier. I fixed all the instances which it flagged, which include all the examples you provided. |
5faf415
to
9757c71
Compare
Currently the message is intepreted literally.
The message returned from the `Supplier` is not considered a format string - it's assumed that it'll be preformatted by the caller. This allows us also to re-use these method to implement some of the other variants. Note that we can also conveniently use the `Supplier`-taking methods from the underlying library.
The `Logger` methods which take `String` only and no format arguments, will be treated in the future by the logger as format strings, which is unexpected if the message comes from outside and is not intended to be interpreted in this way. We either have to provide the format string explicitly, or use the `Supplier` variant. The latter also has performance benefits in case the particular log level is not enabled. This is guided by Error Prone, for which, proactively, a Maven profile is added. It enables only the `FormatStringAnnotation` check, so it's very tightly focused for now.
They may not look like format methods, but in fact they are, just without parameters. The reason to treat them this way is that then Error Prone can detect mis-use of these methods, like passing a String concatenation instead of a format string plus arguments. The caveat is that from now on the message on these methods will be actually treated as format string, so any `%` characters will be interpreted. This will break messages like ``` Frobnicity increased by 10%! ``` which would have to be changed to ``` Frobnicity increased by 10%%! ``` This is a breaking change, but it makes treatment of the messages consistent across all variants of these log methods. For compatibility with existing code, if there is `IllegalFormatException` thrown by `String#format`, we fall back to treating the string as a literal message.
9757c71
to
efb9fef
Compare
What about:
That a seems pretty legit usage to me, and forcing In fact, such a usage now would become quite dangerous. If, for instance,
Can you elaborate? Not sure how it solves the problem. Also, using capturing lambdas can have performance implications, as they are constructed eagerly before the method gets called. It'd be trading one problem (string concatenation) for another one (creating capturing lambdas before the call).
I personally use that often enough during development or debugging that I wouldn't want it to get in the way. It's not just about the code that will end up merged, but throwaway code you write/build/run while prototyping and developing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per comment above, code like this has worse problems that those caused string concatenation that this change is attempting to catch.
String result = foo(...); // function which returns a string containing "%1000000000%"
log.debug(result);
I agree that this is potentially dangerous. On the other hand, this will definitely be flagged by Error Prone. On the third hand, Error Prone is not a silver bullet as it takes effort to enable it, and in projects like Trino it's not even enabled by default. So, yeah, this is a hard balance. Personally I'm biased towards more static analysis, even at the cost of some inconvenience, but I understand the hesitation. |
The result of the lambda call is not interpreted in any way, including not as a format pattern, so this is one way to fix the problem of potentially dangerous strings. As an added benefit, it reduces overhead in some cases, as the message construction can be encapsulated and executed only when needed. Regarding the performance concerns, the use case for lambda-taking methods is in places where creating the string is way more costly than setting up a lambda (which I don't believe have any measurable cost anyway, but that's just an aside). It's not for cases where you have a string that you'll want to use for other purposes as well, only for those where the message is constructed solely for the logger call. As this PR demonstrates, there are two such places in Airlift alone. |
How will Error Prone know to flag that? How would it ever be able to tell that the output from |
Well, it's not smart enough to track "tainted" (or "unsanitized") values, but it sees that the |
Well, that sucks :) Also, one big issue is that this assumes you are using ErrorProne. Many projects that depend on Airlift don't, and it's unreasonable to force it on them. Without checks that prevent using the method with anything other than a literal string, this new reimagining of the method is more dangerous, as I described above. |
You're right, you have a responsibility for not breaking projects using Airlift. This means it's not possible to merge it in the current form, and it's fine. Backward compatibility is difficult :) I still think that detecting more issues is a worthy goal, so I'm thinking of a way to enable it. My first thought was to add methods like |
They may not look like format methods, but in fact they are, just without parameters. The reason to treat them this way is that then Error Prone can detect mis-use of these methods, like passing a String concatenation instead of a format string plus arguments.
The caveat is that from now on the message on these methods will be actually treated as format string, so any
%
characters will be interpreted. This will break messages likewhich would have to be changed to
This is a breaking change, but it makes treatment of the messages consistent across all variants of these log methods. For compatibility with existing code, if there is
IllegalFormatException
thrown byString#format
, we fall back to treating the string as a literal message.This PR includes dogfooding of this change on Airlift itself. There is a simple Maven profile provided which enables the relevant check (and only that).