Skip to content

Commit

Permalink
Update UIDF docs to reflect new capabilities
Browse files Browse the repository at this point in the history
  • Loading branch information
negasora committed Nov 30, 2023
1 parent 7d908fc commit 45d67ae
Showing 1 changed file with 9 additions and 11 deletions.
20 changes: 9 additions & 11 deletions docs/dev/uidf.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,19 +6,17 @@ Binary Ninja now implements User-Informed DataFlow (UIDF) to improve the static

For the purpose of demonstration, we are going to use a simple [crackme](https://github.com/Vector35/uidf-example).

It is a fairly simple challenge but we can use it to demonstrate how UIDF can be used to progressively simplify the problem through branch elimination. First, we'll show how to use this new feature and then we'll discuss some of the design decisions that went into the implementation.
It is a fairly simple challenge but we can use it to demonstrate how UIDF can be used to progressively simplify the problem through branch elimination. First, we'll show how to use this new feature and then we'll discuss some of the design decisions that went into the implementation.

Let's start by looking at the `main` function. Note that the `rax_*` variables contain values returned from the computation of `check_*` functions. These values are used for determining whether control would flow to the satisfying or failure conditions of the crackme.

Let's first look at `rax_2`. We see that the conditional at Medium Level IL (MLIL) instruction 8 uses a field access into `rax` and checks its value against 0. If it evaluates to true, control-flow jumps to `0x004010a4` and outputs "Incorrect". However if it evaluates to false, we continue down the chain of conditionals which determine if the output is "Correct". To confirm our understanding and simplify the process further, we can inform the dataflow analysis engine with the value for `rax_2` and see how it impacts the control flow of the binary.

First, note the definition site of `rax_2` is instruction 7 and then right click there to see "Set User Variable Value...". Users can also bind the action to a hotkey or access it via the command palette.

**Please note that the option to set the value for a variable is only available at its definition site in the function.**

![Right click options](../img/right-click-uidf.png)

The dialog box to set the variable value contains two fields:
The dialog box to set the variable value contains two fields:

* `PossibleValueSet` type: `PossibleValueSet` is used to represent the values that a variable can take during analysis. This can also be representative of the values that the variable would take during concrete execution of the program. For instance if the variable value is `<const 0x0>`, it implies that the mapped register would always contain the value `0x0` during the lifetime of the variable. `UndeterminedValue` on the other hand suggests that the analysis could not figure out an appropriate value and representation in the available `PossibleValueSet`s. For the program analysis folks, this is analogous to the top of the dataflow lattice. An `EntryValue` represents the bottom. `PossibleValueSet`s provide complex container types such as `InSetOfValues`, `SignedRangeValue`, `UnsignedRangeValue`, etc.
* Value: Represents the concrete value which is to be informed to the analysis. Please refer to the parse_possiblevalueset [documentation](https://api.binary.ninja/binaryninja.binaryview-module.html#binaryninja.binaryview.BinaryView.parse_possiblevalueset) for a detailed description of the parser rules for the various `PossibleValueSet`s.
Expand All @@ -39,13 +37,13 @@ And of course, like everything else in Binary Ninja you can accomplish this usin
<range: 0xa to 0x14, step 0x2>
>>> PossibleValueSet.unsigned_range_value([v_r])
<unsigned ranges: [<range: 0xa to 0x14, step 0x2>]>
>>>
>>>
>>> # Creating PossibleValueSet through parsing of value string
>>> state = RegisterValueType.UnsignedRangeValue
>>> # bv.parse_possiblevalueset(value, state, here=0)
>>> bv.parse_possiblevalueset("10:20:2", state)
<unsigned ranges: [<range: 0x10 to 0x20, step 0x2>]>
>>>
>>>
```

Let's set the value of the variable `rax_2` to `ConstantValue <0x0>`. Here's how we can do it using the API:
Expand All @@ -59,7 +57,7 @@ Let's set the value of the variable `rax_2` to `ConstantValue <0x0>`. Here's how
<const 0x0>
>>> # current_function.set_user_var_value(var, definition_site, value)
>>> current_function.set_user_var_value(v, 0x40108d, value)
>>>
>>>
```

Looking at the binary this input should lead to "Incorrect" output. Setting the variable value automatically triggers a re-analysis of the function. This leads to a dataflow analysis pass over the function seeded with the user-informed value for that specific variable. On hovering over the variable token, we see the value that it takes is the one we informed.
Expand All @@ -73,7 +71,7 @@ The API makes it easy to query for the value of the conditional to check if it h
<const 0x1>
>>> current_function.get_all_user_var_values()
{<var uint64_t rax_2>: {archandaddr <x86_64 @ 0x40108d>: <const 0x0>}}
>>>
>>>
```

Let us see the High Level IL (HLIL) representation of `main` before UIDF:
Expand All @@ -84,14 +82,14 @@ Below is the HLIL of `main` after UIDF. If constant propagation leads to conditi

![HLIL after UIDF](../img/hlil-after-uidf.png)

We believe that UIDF is great for experimenting around with during the initial phase of the RE life cycle. If the value for a variable is already informed, right-clicking on the variable at its definition site will show an option to "Clear User Variable Value". This clears the informed value and triggers a reanalysis. And as always, you can just use undo if setting the user variable value was the last action (using CMD/CTRL+Z).
We believe that UIDF is great for experimenting around with during the initial phase of the RE life cycle. If the value for a variable at a location is already informed, right-clicking on the variable at any usage site where the variable has the same SSA version will show an option to "Clear User Variable Value". This clears the informed value and triggers reanalysis. And as always, you can just use undo if setting the user variable value was the last action (using CMD/CTRL+Z).

```python
>>> # current_function.clear_user_var_value(var, definition_site)
>>> current_function.clear_user_var_value(v, 0x40108d)
>>> # Clear all informed user variable values
>>> current_function.clear_all_user_var_values()
>>>
>>>
```

![Clear Variable Value](../img/clear-var-value.png)
Expand Down Expand Up @@ -123,4 +121,4 @@ Static-only reverse engineering limits the ability to test hypothesis formulated

UIDF primarily operates on the MLIL layer. Binary Ninja performs constant propagation, resolves call parameters and lifts stack load/stores into a variables. This combined with the SSA form enables an effective dataflow analysis pipeline. MLIL also provides control over dead-code elimination and allows us to prevent removal of redundant instructions (eg. variable definitions for informed variables) and basic blocks (removed due to the resulting dataflow). Binary Ninja's tiered IL representation allows the higher layers to benefit from simplifying transformations at the lower layers. Through fixing jump tables or branch elimination, UIDF influences HLIL output as a user would expect without having drastic changes to the MLIL representation.

Looking towards the future, we agree that UIDF would be even more usable if available at the HLIL layer instead. Our users are more likely to open HLIL first when looking at a binary due to its C-like code structure and the lack of dead-code. Also, improvements to the dataflow solver engine would inherently increase the efficacy of UIDF. This could include support for more operations on the complex [`PossibleValueSet`](https://api.binary.ninja/binaryninja.function.PossibleValueSet.html) containers or the ability to play well with program structure such as loops.
Looking towards the future, improvements to the dataflow solver engine would inherently increase the efficacy of UIDF. This could include support for more operations on the complex [`PossibleValueSet`](https://api.binary.ninja/binaryninja.function.PossibleValueSet.html) containers or the ability to play well with program structure such as loops.

0 comments on commit 45d67ae

Please sign in to comment.