-
Notifications
You must be signed in to change notification settings - Fork 274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Final 1.3 update #688
Final 1.3 update #688
Conversation
The main change is that now there is only one main state in the machine (aka project), not a separate copy per machine. This commit also fixes leave-{blk,sub} events and adds several new observations, such as exn-raised, that occurs every time a machine switches to an exceptional control flow. The commit also adds few common method to the monad interface.
Now it is possible to attach arbitrary values to a bitvector.
also fixes a bug in Eval.binop.
This bug leads to a type error, as a result of a byte load and a half word load is stored in a 32 bit variable.
a type checker didn't held correctly shifts and the normalizer didn't properly recurse
allow a backend to choose whether to propagate consts or not.
this is temporal cludge.
BIR becomes unreadable. We will apply normalization when needed. Later, we may add a memoization or stuff like that.
this enables better diagnostics of compilation and configuration errors on travis.
the plugin will mark all terms that were visited by Primus with the visited attribute.
Should do this long ago, but finally found some time. No more unnecessary parentheses! Note: operator precedences follow the C language, not OCaml. +-------------------------------------------+----+ | x[y], <cast>:<N>[x], extract:<N>:<M>[x] | 10 | +-------------------------------------------+----+ | ~x, -x | 9 | +-------------------------------------------+----+ | *,/,/$, %, %$ | 8 | +-------------------------------------------+----+ | +,- | 7 | +-------------------------------------------+----+ | <<,>>,~>> | 6 | +-------------------------------------------+----+ | <,>,<=,>= | 5 | +-------------------------------------------+----+ | =, <> | 4 | +-------------------------------------------+----+ | & | 3 | +-------------------------------------------+----+ | ^ | 2 | +-------------------------------------------+----+ | | | 1 | +-------------------------------------------+----+ | let.., x with y <- z, if.., . (dot) | 0 | +-------------------------------------------+----+ 1) Higher precedence means that operators binds tighter. 2) Any bil statement has precedence lower than an expression.
1. at some branches the simplified didn't recurce 2. if a cast casts to the same type it is removed
it was wrong.
The ssa just translates a program into the SSA form The dead-code-elimination is a conservative deadcode elimination plugin, that helps alot with x86 binaries by removing tons of unused flags calcualations.
The string_of_value function now emits the `0x` prefix for hexnumbers, unless instructed explicitly with the `prefix:false` flag. The reason for this change is to enable a consitent handling of signed and unsinged, negative and positive values.
long time ago we forgot to make it public.
1. Now it will run until a fix point is reached 2. A simple constant propagation is added, so that more virtuals are removed
there is no need to have two different names for memory.
OK, thanks for the clarifications. I now understand that BAP 1 can handle 1 thread and BAP 2 can handle 2 threads. :) |
Yeah, what you wrote there was (almost) exactly what I was suggesting. The main difference is that I'm not suggesting that you put all basic block addresses there, but instead the list you assumed it was possible to jump to when you did your dead code analysis, e.g. only the immediate successor nodes for the indirect jump. Hopefully this is usually a short list... We could even package it as a separate pass that you do before any analysis that assumes a complete CFG. This would get us two wins:
|
Yeah, @maurer, I think it would be a good idea to have this pass. The main concern is that it will by default clobber the output. We can alleviate this by rewriting the test like this:
It will be easier to analyse, and easier to eliminate clauses that are not feasible. Yeah, looks nice, we will add this cfi-or-abort pass. But let's it do later, after the release. |
Updated bap.mli with a documentation for new functions. Also, renamed Image.Scheme.reference to Image.Scheme.relocation.
Now it is enough just to type `make doc` and everything will work out of box, if a correct version of OCaml is used, i.e., 4.03. The Makefile recipe will pull the latest version of argot (git is required) and use it to build the documentation. BAP should be installed either from the source tree, or from opam, doesn't matter.
also removes some unnecessary stuff from it.
I've removed few unnecessary funciton, e.g., `State.modify` and also fixed the type of the call/cc function.
also fixes few bugs that were found as a result 1) a zero width word was created (should be a one bit width) 2) a bug in the typechecker that led to a stackoverflow So far no more type errors
also added parameters to the primus loader.
because my last update to primus broke it(((
The __primus_linker_unresolved_call function is called in case if a linker can resolved a call. Also adds the `pc` method to the interpreter interface and publish it as a primitive `get-current-program-counter`.
not actually in the random number generator, but rather in a casting it to a word.
just fail the machine.
Now we can specify a list of entry points, or a magic `all-subroutines` to start execution in parallel from all specified entry points. If no specified, then all subroutine terms marked with the `entry_point` attribute are entered.
The primus-limit plugin will terminate a machine after a certain amount of computations has happened.
This PR brings multiple new features from our development branches. In general this PR focuses on three main features:
Note for reviewers and readers: I wouldn't expect you guys to check the code thoroughly, but you are welcome to ask questions, request clarifications, and discuss changes. Most likely just reading this PR message would be enough for the review.
Note: the Table of Contents doesn't work since Github doesn't emit anchors on the PR page. They will work (I hope so), when we will move this page to the release page.
Table of Contents
Detailed Description
BIL Typechecker
We added a type checker, and now force all BIL code to be well-typed. Lifters must produce only well-typed code. So far, we will issue a warning for each piece of code that is not, along the 1.4 we are planning to fix all issues if any. The
Type
module now have theinfer
function, that will infer type for an expression, andcheck
that will type check a BIL program.BIL Effect Analysis
A precise effect analysis was added to ensure correctness of program transformations. The effect analysis distinguishes between different effects and coeffects:
The raises effect includes the division by zero CPU exception and a conservative AI will infer a possible presence of the division by zero in a BIL expression.
A new module, called
Eff
is added, that provides functions for the effect analysis.BIL Simplification and Constant Propagation
The old constant propagation and simplification algorithm was buggy, and didn't take into account effects when removing statements and subexpressions. It was rewritten from scratch with an algorithm that is sound and parametrized by the set of effects, that may be ignored during the analysis. The new analysis treats operations accurately with respect to their algebraic properties and takes into account operations associativity. It is also now a big step analysis and no longer requires to be wrapped into the
fixpoint
function. Basically, we are moving from the open and semi-open recursion to the closed recursion wherever it is possible, as it is more efficient and intuitive approach. We will probably remove thefixpoint
function in BAP.2.0.BIL normal form and normalization utilities
We introduce a notion of a BIL Normal Form, that is a sub-language of BIL that disallows certain constructions, such as
let
andif/then/else
expressions, complex memory expressions, and complex conditionals. This doesn't impose, that those expressions are not allowed in general, and lifters can still emit code that is easier to write. All such expressions are removed, i.e., are rewritten during the normalization process. The normalization procedure is described in details in the documentation, as well as in comments. Here is a brief highlight:SSA transformation plugin
This is a simple plugin that just transfers all functions into SSA form. It was moved from the bap-plugins repository.
Dead code elimination plugin
A sound analysis that will remove all unused code. This plugin is very useful, as it will remove lots of unnecessary code that are emitted by lifters. In particular, on x86 it removes flag compuations that are not used anywhere. As a result, this reduces the size of the output to 50% on x86 and makes BIL much more readable. It also highly increases the speed of interpretation. Especially, since many flag computations are quite computationally hard.
New pretty printer for bitvectors
Printers from the Z library proved to be slow, buggy, and ugly. Thus we decided to provide our own pretty-printer that will suit all the needs. A generic printer is added, that allows a user to customize the look of the textual representation of a bitvector. We also provide 9 different instantiations of the generic printer, e.g.,
pp_hex
,pp_dec
,pp_oct
andpp_bin
that will print a bitvector in the corresponding format. We also remove suffixes from a bitvector to make output more readable and concise. The full version is available viapp_hex_full
,pp_dec_full
, ... printers. We also change a default textual representation (i.e., to_string and pp) functions. The new representation now includes the signedness suffix, and moves away from usingtrue
andfalse
for the one-bit words.New functions for bitvectors
We added the
to_int[XX]_exn
family of functions, that will efficiently cast a bitvector into a corresponding OCaml type without extra allocations whenever possible. For example,Word.to_int_exn
is usually an identity function w.r.t. the GC (i.e., it still has a check that value fits, and if it is, the it is an identity).We also added the
Word.unsigned
function that will cast away the signedness of a bitvector.The
Unsafe
module was added to the bitvector interface that provides arithmetic instructions that do not check that operands have correct widths. The module can be used for interpreting a well-typed BIL code. Moreover, these operations are included by default in the Bitvector top-level interface, thus, aWord.(a + b)
will result in the unsafe(+)
operator.New pretty printer for BIL
We implemented a new pretty printer that (i) doesn't emit unnecessary parentheses anymore, and uses more concise syntax whenever it is possible. We followed the C language for the operator precedence (not OCaml). Here is the precedence table:
The load and store expressions are also simplifed, now it is not necessary to put the endiannes and size if an expression is byte sized.
Identifiable values for the Primus interpreter
We used the word type for values in the Primus interpreter, however, it showed that it is very hard to write a data tracking analysis without being able to identify values. For example, in the taint propagation plugin, we implemented a stack machine for tracking the taint. It is not only hard to maintain and understand, but also is inefficient. This defeats the main reason why we decided to use the word type as a value representation in the interpreter -- the efficiency.
We now represent a computation result as a pair of a word and a unique identifier. Basically the same as with the Bil.Result.t except that we are not using a variant type to represent storages and bottom values. In the Primus model, there are no such values, and everything is represented with a machine word, coined with an identifier.
To make it easier to work with the new values, we lifted most of the bitvector interface into the value interface.
New Primus interpreter
The Primus Interpreter is reimplemented and doesn't use the eval or microx code anymore (though there is still lots of code sharing since we factorized a lot of code from microx to the BIL library). The interpreter now tracks data much more closely, and provides an interface that allows code summaries, that are implemented in Lisp or in OCaml or in any other language to describe all code effect precisely. We also revisited observations that are made by the interpreter, and moved memory and environment observations into the interpreter.
New Primus Mark Visitied plugin
This plugin will mark each term that is visited (evaluated) by the Primus interpreter with the
visited
attribute. It is useful for tracking the coverage.Diagnostic messages with better backtraces
There is an impedance mismatch between OCaml exceptions and the error monad. Sometimes we switch from one to another and loose the backtracing information. Now the backtraces are always stored in
Or_error.t
values and are printed in case of failure.Caveats, Limitations, and Regressions
The normalized form is not represented in a type system, thus it maybe applied more than once. In future we are planning to add phantom types for that.
The dead-code elimination plugin is computationally intensive but is not cached. Thus the startup time of each binary is actually increased. We will later and the caching layer to alleviate this.
Fixes and updates for the x86 lifter
The shift operation representation was slightly incorrect, as it was using
x := x
instruction to represent that the value ofx
is unchanged. From the analysis perspective this is not entirely true, as it is an update of a variable, an unnecessary usage (that keeps DCE from removing this statement) and, in many cases, it left the right hand side left variable upper exposed, that was introducing a source of undefined values, as well as prohibited certain kind of optimizations on the interprocedural values (we were forced to treatx
as an implicit input parameter of a function, where it is upper exposed).The shift lifter also introduced unnecessary temporal variables, as well as used lots of ite. The former is removed, the latter is translated into one if statement (basically all ite were guarded by the same condition, so we can gather them, that will lead to much better CFG).
Bug fixes in ARM lifter
The ARM lifter was emitting badly typed code. Now it is fixed.
Support for tracing and other minor fixes
We added a handler for SIGINT so that gprof and other utilites, that requires a correct exit call, can work properly. We also added BAP_DEBUG=true to the Travis environment to ease the debugging on it.
TODO
Note, this is a work in progress, there are still a few minor issues that should be resolved before this branch would be ready for the merge. Here comes a list
@since
tag