-
Notifications
You must be signed in to change notification settings - Fork 275
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improves relocations handling #768
Conversation
Technicaly, we may add this module to plugin, but will also have to update api pligin dependencies in register_pass function, since api should be applied after all relocations resolving
Last commit for him
We still have a problem, that bil doesn't reflect jump instruction destinations in case of relocations presence. This leads to wrong IR program. One of possible solutions is to add a whole program cfg into sema lift functions for destinations search. Once we did it, we don't need to think about local relocations at all - only about external symbols
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR requires a lot of work and has some dubious decisions.
Please walk through my comments and address them, and prepare a thorough description of what this PR is doing, why we need it, what and where it changes, and what are justifications for those changes.
lib/bap/bap_inject_externals.mli
Outdated
open Bap_disasm_std | ||
|
||
(** [run prg insn spec] performs search for relocations in [spec] and | ||
add inject synthetic subroutines for each external function. Also |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"add inject" -> inserts or injects
I'm not sure, why you're using word injection for this procedure at all, for me injection is an injective mapping, i.e., when you map one set to another without losing information.
lib/bap/bap_inject_externals.mli
Outdated
open Bap_sema.Std | ||
open Bap_disasm_std | ||
|
||
(** [run prg insn spec] performs search for relocations in [spec] and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
performs search = searches. Keep away from the bureaucratic style
Also, try to document functions, not in terms of how they are implemented, but in terms of what they do, i.e., in a more declarative style. Also, try to minimize the context that is required to understand what the function is doing.
The documentation shall answer the following questions:
- what is sought in the spec exactly (specify the attributes)
- what is inserted into the project (explain how functions are named, note that they are tagged with the [synthetic] attribute)
- what are the assumptions, if any.
lib/bap/bap_project.ml
Outdated
@@ -330,7 +319,12 @@ let create_exn | |||
let disasm = Disasm.create g in | |||
let program = MVar.read program in | |||
let program = | |||
symbolize_synthetic program (Disasm.insns disasm) spec in | |||
if MVar.is_updated spec then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why are you injecting here, instead of putting the code in a plugin and using an external injection point? Are there any invariants on which the core code now depends and which are provided by this injector?
@@ -48,6 +49,9 @@ let reconstruct name roots cfg = | |||
let find_block addr = | |||
Cfg.nodes cfg |> Seq.find ~f:(fun blk -> | |||
Addr.equal addr (Block.addr blk)) in | |||
let find_callers blk = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this function doesn't find anything but maps, the name is misleading.
@@ -48,6 +49,9 @@ let reconstruct name roots cfg = | |||
let find_block addr = | |||
Cfg.nodes cfg |> Seq.find ~f:(fun blk -> | |||
Addr.equal addr (Block.addr blk)) in | |||
let find_callers blk = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, input edges of a block are not "callers", they are called predecessors. Moreover, there is the [Cfg.Node.preds]
function for that.
Please, learn the full interface of a module that you're using.
lib/bap_disasm/bap_disasm_symtab.ml
Outdated
@@ -28,6 +28,7 @@ module Fn = Opaque.Make(struct | |||
type t = { | |||
addrs : fn Addr.Map.t; | |||
names : fn String.Map.t; | |||
calls : fn Addr.Map.t; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the justification to put this into the Symtab? This adds a completely new responsibility for this data structure, and we need to review each function in its implementation that it preserves its invariants. For example, you at least broke the remove
function. The mere fact that you didn't touch it means, that it is broken, as what will happen if you will remove a function that is in the calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Justification is:
we lose information about calls in reconstructor when we take apart a whole program cfg, that is very harmful in case of relocations. And the Symtab
looks like a good place to store this information.
And we definitely need to review each function, what I could do incorrect, but remove
function, as far as I see, removes calls, therefore I did touch it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, you actually did touch it.
What do you mean by "harmful in case of relocations"?
And yes, I agree, that we lose information when we proceed from the whole program CFG to a set of function CFGs, and since we pass only symtab downstream we need to preserve this information.
I have a couple more questions:
Does the calls
data structure relates callsites to the called function?
Is it guaranteed that domain of call is equal to the domain of addrs ? (I.e., that forall x, exists y, calls[x] = addrs[y])
lib/bap_sema/bap_sema_lift.ml
Outdated
let (~@) t = match addr with | ||
| None -> t | ||
| Some addr -> annotate_addr (annotate_insn t insn) addr in | ||
let goto ?cond id = | ||
`Jmp ~@(Ir_jmp.create_goto ?cond (Label.direct id)) in | ||
let jump ?cond exp = | ||
let target = Label.indirect exp in | ||
let target = match dst with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that doesn't look good or right. Even if it is correct, it should be rewritten in a good way.
The jump function takes exp and returns a corresponding term. Here, you're ignoring the exp if the [dst]
parameter is passed. In this case, all jumps will lead to the same destination.
This might be correct if [dst]
is passed only if the statement has at most one jump. But this is a kind of precondition that you shall never enforce. It is broken.
lib/bap_sema/bap_sema_lift.ml
Outdated
let blk cfg block : blk term list = | ||
let find_dst block symtab = | ||
if Insn.(is call) (Block.terminator block) then | ||
Symtab.find_call symtab (Block.addr block) |> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A call usually refers to a call site, i.e., a place where the call occurs, e.g., the call
assembly instruction, the call
term in BIR, or a IRL call. In your case, the call is the Block.terminator which is a call.
What you're finding is called callee, I believe, though it is hard to understand, because there is no documentation for the new Symtab functions. And they are very ambiguous and easy to use incorrectly.
plugins/relocatable/rel_brancher.ml
Outdated
Fact.return (of_aseq width rels, of_aseq width ext) | ||
external_symbols >>= fun exts -> | ||
let to_addr = Addr.of_int64 ~width in | ||
Fact.return (of_seq rels ~fkey:to_addr ~fdata:to_addr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would be better to first map both sequences, and the use the normal of_aseq
plugins/relocatable/rel_brancher.ml
Outdated
let get mem data = | ||
let rec find addr = | ||
let rec loop addr = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
find
was a much better name
Added to Symtab functions `add_callee` and `find_callee`. It became possible to add a name of callee side in Reconstructor. It's useful for detection of external calls while lifting to IR.
added to sema_lift a lookup of callees of indirect calls
so, what do we do:
fix #858 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a couple of small changes, otherwise good to go
| Ok rels, Ok exts -> {rels; exts} | ||
| Ok rels, Error _ -> {rels; exts = Addr.Set.empty} | ||
| Error _, Ok exts -> {rels = Addr.Map.empty; exts} | ||
| _ -> empty |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're silencing errors here, I don't like it, it might cost us a lot later.
Unfortunately the interface doesn't allow us to exhibit them directly, but we can just raise an exception, that would be caught in the brancher construction side.
Once we do this, we need to check also that all our construction places are safe.
| Error _, Ok exts -> {rels = Addr.Map.empty; exts} | ||
| _ -> empty | ||
|
||
let find mem get = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use Seq.range
to create a range of addresses of a chunk of memory, and then use Seq.exits
, e.g.,
let is_external start len=
Seq.exists ~f:(Set.mem t.exts) @@ Seq.init len ~f:(Addr.nsucc start)
Option.is_some @@ | ||
find mem (fun x -> if Set.mem t.exts x then Some x else None) | ||
|
||
let find_internal t mem = find mem (Map.find t.rels) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as above, use seq.
lib/bap_disasm/bap_disasm_symtab.mli
Outdated
@@ -20,3 +20,6 @@ val dominators : t -> mem -> fn list | |||
val intersecting : t -> mem -> fn list | |||
val to_sequence : t -> fn seq | |||
val span : fn -> unit memmap | |||
|
|||
val add_callee : t -> addr -> string -> t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(* remembers a call to an external function from the given block *)
val add_external : t -> block -> string -> t
(* finds if there are any calls to an external symbol from the given block *)
val find_external : t -> addr -> string option
fst |> | ||
Memory.min_addr | ||
|
||
let maybe_external blk cfg = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_unresolved
might be a better name.
let add_callees syms name cfg blk = | ||
if is_call blk then | ||
let call_addr = terminator_addr blk in | ||
if maybe_external blk cfg then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're here assuming that any unresolved call is an external call. I think that it is too overly, we need somehow to limit the set of presumed externals. So, we will consider the call to be external if it has a resolved name, i.e., something that is not sub_XXX
. Of course, we won't look at the name for that, instead we will extend the symbolizer interface with a new function is_resolved
that will return true if a symbolizer returned Some name
.
let's harden a resolving of external functions and check indirect calls against addresses in symtab too: if address is in symtab, then such call can't be external
if has_jumps bil then | ||
Bil.fold_consts bil |> List.concat_map ~f:dests_of_stmt | ||
else [] | ||
Bil.fold_consts bil |> List.concat_map ~f:dests_of_stmt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we really need to do this after the BIL transformation pipeline PR is merged?
let () = | ||
init (); | ||
Config.manpage [ | ||
`S "SYNOPSIS"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is SYNOPSYS
section empty?
Config.manpage [ | ||
`S "SYNOPSIS"; | ||
`S "DESCRIPTION"; | ||
`P "Provides a symbolizer for external symbols in relocatable files."; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`P "Extracts symbol information from the program relocations. "
`P "The relocation symbolizer leverages the relocation information stored in files
to extract symbol names. Since a relocation references an external symbol which
doesn't have an address we use an address of a callsite. "
`S "SEE ALSO"; | ||
`P "$(b,bap-plugin-llvm)(1) code"; | ||
]; | ||
Config.when_ready (fun _ -> ()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you shall call init
on when_ready
, so it should be Config.when_read init
This PR reworks relocations handling in bap.
Previously, we had to support some invariants at different stages of project creation. And it was a kind of breaking of abstractions.
And now, after #761 fix, we finally can refactor it. So we rolled back all changes in a disassembler and lifting to IR.
Also, this PR changes a default brancher, so relocations will work out-of-box.