Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

prunes unreachable code in the optimization pass #1095

Merged
merged 6 commits into from
May 20, 2020

Conversation

gitoleg
Copy link
Contributor

@gitoleg gitoleg commented Apr 30, 2020

This PR fixes #1092 :
The optimization pass may make some code unreachable, propagating a constant value to a branch guard.

For example, some code can become unreachable due to jumps that are
never taken because of condition:

0000046a: when 0 goto %0000045b

Or due to the previous jump that is taken always:

00000471:
00000473: goto %00000451
00000474: goto %00000455

This PR fixes this problem and removes blocks that are unreachable due to infeasible jumps and these jumps as well.

Also, this PR fixes the computation of digest in the optimization pass. Previously, digest for each subroutine contained tids, that is not quite right, because a program can have different tids on each bap execution, and actually it happens quite often. And it leads to the cache explosion since instead of loading from cache we store some duplicated data that are different only by the tids values. This PR solves this problem and uses addresses instead.

gitoleg added 3 commits April 30, 2020 10:50
This PR fixes BinaryAnalysisPlatform#1092 :
The optimization pass may make some code unreachable, propagating a constant value to a branch guard.

For example, some code can become unreachable due to jumps that are
never taken because of condition:
```
0000046a: when 0 goto %0000045b
```

Or due to the previous jump that is taken always:
```
00000471:
00000473: goto %00000451
00000474: goto %00000455
```

This PR fixes this problem and removes blocks that are unreachable due to infeasible jumps and these jumps as well.

Also, this PR fixes the computation of digest in the optimization pass. Previously, digest for each subroutine contained tids, that is not quite right, because a program can have different tids on each bap execution, and actually it happens quite often. And it leads to the cache explosion since instead of loading from cache we store some duplicated data that are different only by the tids values. This PR solves this problem and uses addresses instead.
@gitoleg gitoleg requested a review from ivg April 30, 2020 15:27
Copy link
Member

@ivg ivg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comments inline. Concerning the digesting mechanism, I suggest the following simple algorithm, the digest of a function is the digest of all its addresses. Collect the addresses into a set (so that the digest will not depend on the traversal order) and then compute the digest.

let dead_jmps_of_blk b =
Term.to_sequence jmp_t b |>
Seq.fold ~init:(Set.empty (module Tid), false)
~f:(fun (deads, is_taken) jmp ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_taken here is misleading, is_unreachable is better, I think

then G.Edge.remove edge g
else g)

let dead_blks sub g =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This algorithm is quadratic. You can use depth_first_search and select all blocks that do not belong to the tree that starts with G.start. This will be linear.

Term.map def_t b ~f:(update_def updates) |>
Term.map jmp_t ~f:(update_jmp updates))

let map_alive deads cls ?(f=ident) x =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a very misleading name, let's do filter_map_alive

Seq.iter (Term.to_sequence blk_t sub) ~f:(fun blk ->
match Term.get_attr blk address with
| None -> ()
| Some a -> Hashtbl.add_exn blks (Term.tid blk) a);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can have two blocks with the same address in general, e.g., the same instruction may yield more than one block. Yes, we assign the attribute to the entry block, but the rest of the blocks still belong to the same address/instruction.

end
end)#visit_sub sub (Digest.create ~namespace:"optimization") in
let digest = Digest.add digest "%s" (Sub.name sub) in
Digest.add digest "%s" (string_of_int level)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

???

@gitoleg gitoleg force-pushed the prune-dead-code branch from 98f8bce to 0ccea5d Compare May 13, 2020 18:00
@gitoleg gitoleg requested a review from ivg May 13, 2020 18:00
@ivg ivg merged commit 8530c06 into BinaryAnalysisPlatform:master May 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

prune unreachable code in the optimization pass
2 participants