Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improves relocations handling #768

Merged
merged 46 commits into from
Sep 7, 2018
Merged

Conversation

gitoleg
Copy link
Contributor

@gitoleg gitoleg commented Feb 11, 2018

This PR reworks relocations handling in bap.
Previously, we had to support some invariants at different stages of project creation. And it was a kind of breaking of abstractions.
And now, after #761 fix, we finally can refactor it. So we rolled back all changes in a disassembler and lifting to IR.

Also, this PR changes a default brancher, so relocations will work out-of-box.

gitoleg added 18 commits April 2, 2018 12:02
Technicaly, we may add this module to plugin,
but will also have to update api pligin dependencies
in register_pass function, since api should be applied
after all relocations resolving
We still have a problem, that bil doesn't reflect
jump instruction destinations in case of relocations
presence. This leads to wrong IR program.

One of possible solutions is to add a whole program cfg
into sema lift functions for destinations search.

Once we did it, we don't need to think about local relocations
at all - only about external symbols
@gitoleg gitoleg requested a review from ivg April 5, 2018 16:47
Copy link
Member

@ivg ivg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR requires a lot of work and has some dubious decisions.

Please walk through my comments and address them, and prepare a thorough description of what this PR is doing, why we need it, what and where it changes, and what are justifications for those changes.

open Bap_disasm_std

(** [run prg insn spec] performs search for relocations in [spec] and
add inject synthetic subroutines for each external function. Also
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"add inject" -> inserts or injects

I'm not sure, why you're using word injection for this procedure at all, for me injection is an injective mapping, i.e., when you map one set to another without losing information.

open Bap_sema.Std
open Bap_disasm_std

(** [run prg insn spec] performs search for relocations in [spec] and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

performs search = searches. Keep away from the bureaucratic style

Also, try to document functions, not in terms of how they are implemented, but in terms of what they do, i.e., in a more declarative style. Also, try to minimize the context that is required to understand what the function is doing.

The documentation shall answer the following questions:

  1. what is sought in the spec exactly (specify the attributes)
  2. what is inserted into the project (explain how functions are named, note that they are tagged with the [synthetic] attribute)
  3. what are the assumptions, if any.

@@ -330,7 +319,12 @@ let create_exn
let disasm = Disasm.create g in
let program = MVar.read program in
let program =
symbolize_synthetic program (Disasm.insns disasm) spec in
if MVar.is_updated spec then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are you injecting here, instead of putting the code in a plugin and using an external injection point? Are there any invariants on which the core code now depends and which are provided by this injector?

@@ -48,6 +49,9 @@ let reconstruct name roots cfg =
let find_block addr =
Cfg.nodes cfg |> Seq.find ~f:(fun blk ->
Addr.equal addr (Block.addr blk)) in
let find_callers blk =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function doesn't find anything but maps, the name is misleading.

@@ -48,6 +49,9 @@ let reconstruct name roots cfg =
let find_block addr =
Cfg.nodes cfg |> Seq.find ~f:(fun blk ->
Addr.equal addr (Block.addr blk)) in
let find_callers blk =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, input edges of a block are not "callers", they are called predecessors. Moreover, there is the [Cfg.Node.preds] function for that.

Please, learn the full interface of a module that you're using.

@@ -28,6 +28,7 @@ module Fn = Opaque.Make(struct
type t = {
addrs : fn Addr.Map.t;
names : fn String.Map.t;
calls : fn Addr.Map.t;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the justification to put this into the Symtab? This adds a completely new responsibility for this data structure, and we need to review each function in its implementation that it preserves its invariants. For example, you at least broke the remove function. The mere fact that you didn't touch it means, that it is broken, as what will happen if you will remove a function that is in the calls?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Justification is:
we lose information about calls in reconstructor when we take apart a whole program cfg, that is very harmful in case of relocations. And the Symtab looks like a good place to store this information.
And we definitely need to review each function, what I could do incorrect, but remove function, as far as I see, removes calls, therefore I did touch it

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, you actually did touch it.
What do you mean by "harmful in case of relocations"?
And yes, I agree, that we lose information when we proceed from the whole program CFG to a set of function CFGs, and since we pass only symtab downstream we need to preserve this information.

I have a couple more questions:

Does the calls data structure relates callsites to the called function?
Is it guaranteed that domain of call is equal to the domain of addrs ? (I.e., that forall x, exists y, calls[x] = addrs[y])

let (~@) t = match addr with
| None -> t
| Some addr -> annotate_addr (annotate_insn t insn) addr in
let goto ?cond id =
`Jmp ~@(Ir_jmp.create_goto ?cond (Label.direct id)) in
let jump ?cond exp =
let target = Label.indirect exp in
let target = match dst with
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that doesn't look good or right. Even if it is correct, it should be rewritten in a good way.

The jump function takes exp and returns a corresponding term. Here, you're ignoring the exp if the [dst] parameter is passed. In this case, all jumps will lead to the same destination.

This might be correct if [dst] is passed only if the statement has at most one jump. But this is a kind of precondition that you shall never enforce. It is broken.

let blk cfg block : blk term list =
let find_dst block symtab =
if Insn.(is call) (Block.terminator block) then
Symtab.find_call symtab (Block.addr block) |>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A call usually refers to a call site, i.e., a place where the call occurs, e.g., the call assembly instruction, the call term in BIR, or a IRL call. In your case, the call is the Block.terminator which is a call.
What you're finding is called callee, I believe, though it is hard to understand, because there is no documentation for the new Symtab functions. And they are very ambiguous and easy to use incorrectly.

Fact.return (of_aseq width rels, of_aseq width ext)
external_symbols >>= fun exts ->
let to_addr = Addr.of_int64 ~width in
Fact.return (of_seq rels ~fkey:to_addr ~fdata:to_addr,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be better to first map both sequences, and the use the normal of_aseq

let get mem data =
let rec find addr =
let rec loop addr =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find was a much better name

@ivg ivg assigned gitoleg and unassigned ivg Aug 17, 2018
Added to Symtab functions `add_callee` and `find_callee`.
It became possible to add a name of callee side in Reconstructor.
It's useful for detection of external calls while lifting to IR.
added to sema_lift a lookup of callees of
indirect calls
@gitoleg
Copy link
Contributor Author

gitoleg commented Aug 31, 2018

so, what do we do:

  1. remove the relocatable brancher, because we finally undestand that this guy actually fixes a behaviour of the main brancher, and we need just to fix main. So relocations handling is a part of his job now.

  2. add a relocatable symbolizer, that uses an information about relocations and instructions to provide names for external functions (by address of a caller side)

  3. in symtab we add add_callee and find_callee functions to store information about caller addresses and callees names.

  4. in reconstructor we add callees names to symtab: for each call we name the destination side. And if the destination is unknown (no output edges or only fall edge), we just name the caller's address. The latter is most likely occurs in case of external functions.

  5. in sema lifter we take a look at every indirect call. We check if there is a known callee name at the address of the jump (Symtab.find_callee). And if there is, then there are two options:

    • If there is a function with the given name in Symtab
      (Symtab.find_by_name), then we can say that this indirect jump can be replaced as a call to this function
    • If there is no such function, then we can say that there is a call to
      external and we create a synthetic subroutine then.

fix #858

Copy link
Member

@ivg ivg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a couple of small changes, otherwise good to go

| Ok rels, Ok exts -> {rels; exts}
| Ok rels, Error _ -> {rels; exts = Addr.Set.empty}
| Error _, Ok exts -> {rels = Addr.Map.empty; exts}
| _ -> empty
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're silencing errors here, I don't like it, it might cost us a lot later.
Unfortunately the interface doesn't allow us to exhibit them directly, but we can just raise an exception, that would be caught in the brancher construction side.

Once we do this, we need to check also that all our construction places are safe.

| Error _, Ok exts -> {rels = Addr.Map.empty; exts}
| _ -> empty

let find mem get =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use Seq.range to create a range of addresses of a chunk of memory, and then use Seq.exits, e.g.,

let is_external start len= 
  Seq.exists ~f:(Set.mem t.exts) @@ Seq.init len ~f:(Addr.nsucc start)

Option.is_some @@
find mem (fun x -> if Set.mem t.exts x then Some x else None)

let find_internal t mem = find mem (Map.find t.rels)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, use seq.

@@ -20,3 +20,6 @@ val dominators : t -> mem -> fn list
val intersecting : t -> mem -> fn list
val to_sequence : t -> fn seq
val span : fn -> unit memmap

val add_callee : t -> addr -> string -> t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(* remembers a call to an external function from the given block *)
val add_external : t -> block -> string -> t

(* finds if there are any calls to an external symbol from the given block *)
val find_external : t -> addr -> string option

lib/bap_disasm/bap_disasm_reconstructor.ml Show resolved Hide resolved
fst |>
Memory.min_addr

let maybe_external blk cfg =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_unresolved might be a better name.

let add_callees syms name cfg blk =
if is_call blk then
let call_addr = terminator_addr blk in
if maybe_external blk cfg then
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're here assuming that any unresolved call is an external call. I think that it is too overly, we need somehow to limit the set of presumed externals. So, we will consider the call to be external if it has a resolved name, i.e., something that is not sub_XXX. Of course, we won't look at the name for that, instead we will extend the symbolizer interface with a new function is_resolved that will return true if a symbolizer returned Some name.

let's harden a resolving of external functions and
check indirect calls against addresses in symtab too:
if address is in symtab, then such call can't be external
@gitoleg gitoleg assigned ivg and unassigned gitoleg Sep 2, 2018
if has_jumps bil then
Bil.fold_consts bil |> List.concat_map ~f:dests_of_stmt
else []
Bil.fold_consts bil |> List.concat_map ~f:dests_of_stmt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need to do this after the BIL transformation pipeline PR is merged?

let () =
init ();
Config.manpage [
`S "SYNOPSIS";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is SYNOPSYS section empty?

Config.manpage [
`S "SYNOPSIS";
`S "DESCRIPTION";
`P "Provides a symbolizer for external symbols in relocatable files.";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

`P "Extracts symbol information from the program relocations. "

`P "The relocation symbolizer leverages the relocation information stored in files
to extract symbol names. Since a relocation references an external symbol which
doesn't have an address we use an address of a callsite. "

`S "SEE ALSO";
`P "$(b,bap-plugin-llvm)(1) code";
];
Config.when_ready (fun _ -> ())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you shall call init on when_ready, so it should be Config.when_read init

@ivg ivg assigned gitoleg and unassigned ivg Sep 7, 2018
@ivg ivg merged commit 9c7444e into BinaryAnalysisPlatform:master Sep 7, 2018
@gitoleg gitoleg deleted the fix-rel branch February 1, 2019 19:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants