-
Notifications
You must be signed in to change notification settings - Fork 274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
prunes unreachable code in the optimization pass #1095
Conversation
This PR fixes BinaryAnalysisPlatform#1092 : The optimization pass may make some code unreachable, propagating a constant value to a branch guard. For example, some code can become unreachable due to jumps that are never taken because of condition: ``` 0000046a: when 0 goto %0000045b ``` Or due to the previous jump that is taken always: ``` 00000471: 00000473: goto %00000451 00000474: goto %00000455 ``` This PR fixes this problem and removes blocks that are unreachable due to infeasible jumps and these jumps as well. Also, this PR fixes the computation of digest in the optimization pass. Previously, digest for each subroutine contained tids, that is not quite right, because a program can have different tids on each bap execution, and actually it happens quite often. And it leads to the cache explosion since instead of loading from cache we store some duplicated data that are different only by the tids values. This PR solves this problem and uses addresses instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the comments inline. Concerning the digesting mechanism, I suggest the following simple algorithm, the digest of a function is the digest of all its addresses. Collect the addresses into a set (so that the digest will not depend on the traversal order) and then compute the digest.
let dead_jmps_of_blk b = | ||
Term.to_sequence jmp_t b |> | ||
Seq.fold ~init:(Set.empty (module Tid), false) | ||
~f:(fun (deads, is_taken) jmp -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_taken
here is misleading, is_unreachable
is better, I think
then G.Edge.remove edge g | ||
else g) | ||
|
||
let dead_blks sub g = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This algorithm is quadratic. You can use depth_first_search
and select all blocks that do not belong to the tree that starts with G.start
. This will be linear.
Term.map def_t b ~f:(update_def updates) |> | ||
Term.map jmp_t ~f:(update_jmp updates)) | ||
|
||
let map_alive deads cls ?(f=ident) x = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a very misleading name, let's do filter_map_alive
Seq.iter (Term.to_sequence blk_t sub) ~f:(fun blk -> | ||
match Term.get_attr blk address with | ||
| None -> () | ||
| Some a -> Hashtbl.add_exn blks (Term.tid blk) a); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can have two blocks with the same address in general, e.g., the same instruction may yield more than one block. Yes, we assign the attribute to the entry block, but the rest of the blocks still belong to the same address/instruction.
end | ||
end)#visit_sub sub (Digest.create ~namespace:"optimization") in | ||
let digest = Digest.add digest "%s" (Sub.name sub) in | ||
Digest.add digest "%s" (string_of_int level) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
???
This PR fixes #1092 :
The optimization pass may make some code unreachable, propagating a constant value to a branch guard.
For example, some code can become unreachable due to jumps that are
never taken because of condition:
Or due to the previous jump that is taken always:
This PR fixes this problem and removes blocks that are unreachable due to infeasible jumps and these jumps as well.
Also, this PR fixes the computation of digest in the optimization pass. Previously, digest for each subroutine contained tids, that is not quite right, because a program can have different tids on each bap execution, and actually it happens quite often. And it leads to the cache explosion since instead of loading from cache we store some duplicated data that are different only by the tids values. This PR solves this problem and uses addresses instead.