Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

adds the stub resolver abi pass #1036

Merged
merged 30 commits into from
Apr 17, 2020

Conversation

gitoleg
Copy link
Contributor

@gitoleg gitoleg commented Jan 22, 2020

The stub resolver pass redirects on the IR level calls to stub functions to the calls to real functions if such could be identified. A function is a stub if it has the stub attribute present in the knowledge base. So far, we add only a knowledge provider for ELF stubs (that looks into the .plt section) others will be added later. The redirection algorithm establishes a one-to-one mapping between a stub and its prospective redirection. In case if a non-bijective mapping is detected (a stub has more than one implementation candidates or implementation has more than one stubs) no redirection is established.

The problem arises when we deal with libraries: even if a callee is
in the same library, a call will be done through a plt entry first, meaning that
we won't visit the target subroutine.

Our solution is based on the function name: we would better to call the
function itself, but not the corresponding plt entry, so since a
function and a plt entry has the same name, then any call to a plt entry
we substitute to a call to the function.

Also, this PR fixes SP in the `primus-x86-loader` (again).

The history.

PR BinaryAnalysisPlatform#993 introduced a proper initialization of PLT entries, but
unfortunately, introduced a bug as well.

Briefly, we didn't want primus to dive deep into PLT entries because
anyway, we would fail with unresolved call exception, but to that moment
a SP would be changed several times.
The solution was to raise an unresolved call exception as soon as
possible, right after a primus machine entered an entry and restore SP
by `unresolved-call` observation.
Unfortunately, the implementation was wrong: we linked an unresolved
call handler to the address of a jump in a plt entry, so actually we
just call a handler instead, and no actual unresolved call happens, no
observation made, and no steps to restore SP taken.

This PR fixes it and now we just unlink all the addresses of plt
entries, so we get a fair unresolved call exception.
Copy link
Member

@ivg ivg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I like the idea of unlinking PLT and relinking jumps to implementation. I am a little bit concerned that this is not a pass, like trivial-condition-form for example, and I am not quite happy that Primus run modifies the program. So we should consider moving this to a normal pass (in fact the abi pass), as what it does is called relocating a procedure that in general useful not only for Primus.

So, I'm suggesting to split the relocation part into an abi pass.

Concerning the rest of implementation, aka component Component. We will discuss it more in depth on the second round, I have suspicions that your initial problem evaluation has some flaws. I'm currently revisiting the old PRs.

plugins/primus_x86/primus_x86_loader.ml Outdated Show resolved Hide resolved
plugins/primus_x86/primus_x86_loader.ml Outdated Show resolved Hide resolved
plugins/primus_x86/primus_x86_loader.ml Outdated Show resolved Hide resolved
plugins/primus_x86/primus_x86_loader.ml Outdated Show resolved Hide resolved
plugins/primus_x86/primus_x86_loader.ml Outdated Show resolved Hide resolved
@@ -3,7 +3,8 @@ open Bap_primus.Std
include Self()

let init _ =
Primus.Machine.add_component (module Primus_x86_loader.Component);;
Primus.Machine.add_component (module Primus_x86_loader.Component);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a component named component?

@ivg
Copy link
Member

ivg commented Jan 24, 2020

Unfortunately, the implementation was wrong: we linked an unresolved call handler to the address of a jump in a plt entry, so actually we
just call a handler instead, and no actual unresolved call happens, no observation made, and no steps to restore SP taken.

That is not true :) Observations had happened by that time, therefore the code was correct. Weneed to reevaluate the problem and confirm that observations indeed do not happen and if they not identify why. My personal bet is that they do happen :)

@gitoleg gitoleg force-pushed the rectifies-x86-loader branch from b5c5121 to 4824890 Compare February 13, 2020 00:30
ivg and others added 3 commits February 19, 2020 12:56
In BinaryAnalysisPlatform#1051 we begun inserting `start` and `exit` pseudonodes and
introduced a small bug, when the graph is singleton and has a node
with both indegree and outdegree equal to zero. For that graphs,
instead of
```
digraph {
start -> n -> exit
}
```
we got
```
digraph {
start -> n;
start -> exit;
}
```

since we didn't insert an edge from `n` to `exit`.
Copy link
Member

@ivg ivg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. The algorithm should do the following:
    For each call that has a destination that is a stub, s.t. there is a function that is not a stub that we can unambiguously link with that stub, relink the destination to that function.
    Keep in mind, that for each stub there should be at most one function that matches this stub. And for each non-stub there should be at most one stub that matches this function. In case of any ambiguity, i..e, a function that can satisfy more than one stubs should be removed from the linking process, the same a stub that satisfy more than one functions, shouldn't be relinked.
  1. write knowledge providers that will provide the stub attribute for known to you stubs.

plugins/stub_resolver/stub_resolver_main.ml Outdated Show resolved Hide resolved
plugins/stub_resolver/stub_resolver_main.ml Outdated Show resolved Hide resolved
plugins/stub_resolver/stub_resolver_main.ml Outdated Show resolved Hide resolved
plugins/stub_resolver/stub_resolver_main.ml Outdated Show resolved Hide resolved
plugins/stub_resolver/stub_resolver_main.ml Outdated Show resolved Hide resolved
plugins/stub_resolver/stub_resolver_main.ml Outdated Show resolved Hide resolved
plugins/stub_resolver/stub_resolver_main.ml Outdated Show resolved Hide resolved
plugins/stub_resolver/stub_resolver_main.ml Outdated Show resolved Hide resolved
plugins/stub_resolver/stub_resolver_main.ml Outdated Show resolved Hide resolved
plugins/stub_resolver/stub_resolver_main.ml Outdated Show resolved Hide resolved
plugins/primus_x86/primus_x86_loader.ml Outdated Show resolved Hide resolved
@gitoleg gitoleg changed the title rectifies x86 primus loader adds the stub resolver abi pass Apr 13, 2020
Copy link
Member

@ivg ivg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please address the issues that will be relevant after you will rewrite this to enable the announced property that the rewriting preserves one-to-one correspondence between stubs and implementation:

In case if a non-bijective mapping is detected (a stub has more than one implementation candidates or implementation has more than one stubs) no redirection is established.

For that, implement the relinker using the unification algorithm: start by partitioning the set of program subroutines by the equivalence relation E in which two functions are equivalent if they share the same alias. Then refine this quotient set by removing all sets that do not have exactly two elements. Finally, leave only those pairs in which one and only one element is a stub. Those pairs will give the most general unifier.

let update prog = relink prog (find_links prog)

let main proj =
Plt.provide proj;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be removed from the abi pass. In other words, move this line to line 148.

open KB.Syntax

type t = {
aliases : tid option String.Map.t;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type binding = Unique of tid | Ambiguous instead of tid option so that at the end you will have a tri-state binding - either None, Some (Unique x), or Some Ambiguous .

Maps with 'a option as a value is very confusing, as you have the None constructor overloaded to have two completely different meanings. And that in Some None, None meanys too many. Something is wrong in representing "too many" with None.

open KB.Syntax

type t = {
aliases : tid option String.Map.t;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, rename aliases to stubs and symbols to reals or impls to reflect the actual meaning of both fields.

| Second _ -> jmp
| First tid -> match Map.find links tid with
| Some (Some tid') ->
Jmp.reify
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't create a new jmp term as we will lose as a result al the attributes of the old term. Map the old term using the with_alt function. That is not exposed in the interface, so please expose it also (together with the with_dst function).

@ivg
Copy link
Member

ivg commented Apr 14, 2020

Also, please add unit tests and function tests (with a .so that has functions with external and static linkage calling each other).

Also, make sure that examples from draperlaboratory/cbat_tools#132 are linked correctly.


(** [with_dst jmp d] updates jump's intra-procedural destination *)
val with_dst : t -> dst option -> t

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you also add @since attributes? since 2.0.0 for with_{kind,cond} and 2.10 for the latter two.

@@ -1 +1,3 @@
int __libc_start_main(int (*main) (int, char **, char **), int, char **, void *auxv);

void __stack_chk_fail(void) __attribute__((noreturn));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is it here?

Stub_resolver_tests.suite;
]

let load_plugins () =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete this

type t = {
groups : groups;
names : names;
next : int;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a side note, a trick that I usually employ is that instead of maintaining the counter, we can use Map.max_elt instead of next. But you can keep next if you like.

]
~expected:["siera", "papa";
"tango", "quebec";
"uniform", "romeo";];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need cases where symbols have many aliases, some intersecting aliases and some not, but where there is more than two cases, e.g.,

test "several intersections" [
  stub, "a",  ["a1"; "a2"; "a3"];
  real, "a0", ["a"];
  stub, "b",  ["b1"; "b2"; "b3"];
  real, "b0", ["b"; "c"];
  real, "c",  ["d"];
  stub, "d",  [];
] ~expect:[
  "a", "a0";
]

and more tests like this.

plugins/stub_resolver/stub_resolver_tests.ml Outdated Show resolved Hide resolved
@ivg
Copy link
Member

ivg commented Apr 17, 2020

Ok, it's good to go. Merge as soon as Travis greenlights!

@gitoleg gitoleg merged commit a11d6b7 into BinaryAnalysisPlatform:master Apr 17, 2020
@gitoleg gitoleg deleted the rectifies-x86-loader branch May 13, 2020 21:08
|}

let () = Extension.declare @@ fun _ctxt ->
Stream.observe Project.Info.data @@ Plt.provide;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it Project.Info.data not Project.Info.code? The .plt entries are in the code segment and are executable, so marking something in the data section as valid code seems quite wrong.

@gitoleg, any comments?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants