makes the disassembler more strict #1381
Merged
In #1375, instead of terminating the program on a knowledge base conflict during instruction lifting, we decided to treat such an instruction as invalid and retract it, together with the whole path that led to it, from the set of valid instructions. It turned out that the retraction mechanism was incomplete: in certain cases an invalid instruction was still reachable, which triggered conflicts downstream, e.g., during CFG reconstruction.
The problem mostly arises in interworked code, where we have to guess whether an instruction is in A32 or T32 mode using heuristics such as byte patterns, which inevitably leads to conflicts. So the first place where we have to enforce agreement is encoding detection. Before this change, the information provided by the knowledge base took precedence over the natural rules of encodings, i.e., that a fall or a regular jump can't change the encoding; only an encoding-changing jump can.
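The precedence rule can be illustrated with a minimal Python sketch. This is not the BAP implementation; the names `Encoding`, `Edge`, and `destination_encoding` are hypothetical, and the choice to trust the knowledge-base hint on an encoding-changing jump (falling back to the opposite encoding when there is no hint) is an assumption made for the example.

```python
from enum import Enum
from typing import Optional


class Encoding(Enum):
    A32 = "A32"
    T32 = "T32"


class Edge(Enum):
    FALL = "fall"                # fall to the next instruction
    JUMP = "jump"                # regular jump
    SWITCHING_JUMP = "switch"    # encoding-changing jump


def opposite(enc: Encoding) -> Encoding:
    return Encoding.T32 if enc is Encoding.A32 else Encoding.A32


def destination_encoding(src: Encoding, edge: Edge,
                         hint: Optional[Encoding] = None) -> Encoding:
    """Pick the encoding of an edge destination (hypothetical helper).

    A fall or a regular jump always inherits the source encoding, even
    when the heuristic hint disagrees. Only an encoding-changing jump
    is allowed to consult the hint (an assumption of this sketch).
    """
    if edge in (Edge.FALL, Edge.JUMP):
        return src
    return hint if hint is not None else opposite(src)
```

For instance, `destination_encoding(Encoding.A32, Edge.FALL, hint=Encoding.T32)` yields `Encoding.A32`: the structural rule wins over the heuristic guess.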
In addition, whenever we discover a fall or a jump to an already disassembled instruction, we have to check whether the encodings agree and discard that fall or jump if they do not.
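A rough Python sketch of this agreement check follows. The `known` table, the `accept_edge` name, and the use of plain strings for encodings are all assumptions made for illustration; they do not mirror BAP's data structures.

```python
from typing import Dict


def accept_edge(known: Dict[int, str], dst: int, expected: str) -> bool:
    """Decide whether a fall or jump to `dst` can be kept.

    `known` maps the address of each already disassembled instruction
    to the encoding it was decoded with (hypothetical representation).
    `expected` is the encoding implied by the source of the edge.
    """
    seen = known.get(dst)
    if seen is None:
        # First visit: record the encoding implied by this edge.
        known[dst] = expected
        return True
    # Revisit: the edge is valid only if both encodings agree.
    return seen == expected


known = {0x1000: "A32"}
ok_same = accept_edge(known, 0x1000, "A32")   # encodings agree
ok_diff = accept_edge(known, 0x1000, "T32")   # conflict, edge discarded
ok_new = accept_edge(known, 0x2000, "T32")    # new destination
```

A rejected edge would then feed the retraction mechanism, since the instruction that produced it can no longer be considered valid.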
Finally, there were some missing cases in which invalid code wasn't retracted. First, a jump could remain in the code set even though its destination was invalid. Second, the dual problem: when a basic block entry point was canceled, not all incoming edges were canceled, only the path through which the block was discovered. Both issues are now fixed, and the fixes affect even targets that do not use interworking, e.g., x86. This is good, as more code is discarded as invalid, which gives us a better CFG.
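The complete retraction can be pictured as backward reachability over all incoming edges, not just the discovery path. The sketch below is a simplified model in Python under that assumption; `retract` and the edge-list representation are hypothetical and are not taken from the BAP sources.

```python
from collections import defaultdict, deque
from typing import Iterable, Set, Tuple


def retract(edges: Iterable[Tuple[int, int]], invalid: int) -> Set[int]:
    """Return every address that must be dropped once `invalid` is
    found to be invalid: the address itself plus everything that
    reaches it through falls or jumps (a simplified model)."""
    preds = defaultdict(set)
    for src, dst in edges:
        preds[dst].add(src)

    retracted: Set[int] = set()
    queue = deque([invalid])
    while queue:
        addr = queue.popleft()
        if addr in retracted:
            continue
        retracted.add(addr)
        # Follow ALL incoming edges, not only the one through which
        # the address was first discovered.
        queue.extend(preds[addr] - retracted)
    return retracted


# 0 -> 4 -> 8 is the discovery path; 12 also jumps to 8; 16 -> 20 is unrelated.
edges = [(0, 4), (4, 8), (12, 8), (16, 20)]
dropped = retract(edges, 8)
```

Here `dropped` contains 0, 4, 8, and 12: the jump at 12 is removed even though the block at 8 was discovered through 4, while the unrelated code at 16 and 20 is kept.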