-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Python: remove the imprecise container taint steps #17030
Draft
yoff
wants to merge
17
commits into
github:main
Choose a base branch
from
yoff:python/no-imprecise-container-step
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
yoff
force-pushed
the
python/no-imprecise-container-step
branch
from
July 24, 2024 15:34
34e04ff
to
07e3829
Compare
Comment on lines
+4848
to
+4850
/** | ||
* Flow summaries for string manipulation methods. | ||
*/ |
Check warning
Code scanning / CodeQL
Class QLDoc style. Warning
The QLDoc for a class should start with 'A', 'An', or 'The'.
* global (inter-procedural) taint-tracking analyses. | ||
*/ | ||
module TaintTracking { | ||
import semmle.python.dataflow.new.internal.tainttracking1.TaintTrackingParameter::Public |
Check warning
Code scanning / CodeQL
Redundant import Warning test
Redundant import, the module is already imported inside .
semmle.python.dataflow.new.internal.tainttracking1.TaintTrackingImpl
Error loading related location
Loading
yoff
force-pushed
the
python/no-imprecise-container-step
branch
from
August 7, 2024 11:50
73a33bf
to
c891057
Compare
for collections where one could read out a different element due to precise content
- fixes fully/partial SSRF confusion
Default implicit read steps changed the semantics of our taint tracking tests. This resets that semantics. We include two new annotations to allow testing with implicit reads, as well as a consistency query to prevent spurious implicit read steps.
expect empty query predicates
yoff
force-pushed
the
python/no-imprecise-container-step
branch
from
September 9, 2024 13:02
efcb918
to
243f656
Compare
Consider how to make this maintainable. Could we explicitly disallow implicit reads at specific sinks instead of rebuilding the config without them?
This is probably what we want, if we can get support for it.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
We used to have taint steps from any element of a collection to the entire collection (see here).
These are very imprecise, leading to false positives (e.g. seen here and here).
They are also at odds with how other languages treat collections, see our issue about this.
We wish to keep the semantics, that if a collection is tainted, then all elements are considered tainted. Therefor we now try to not taint collections, if we have precise information about which elements are tainted.
For a list, if an element is tainted, we do not know which one, so any read is potentially reading tainted information.
There is not much difference between the list having content and the list being tainted.
But for a dictionary, if an entry is tainted and we know which one, only reads of the appropriate key is reading tainted information. All other reads should ideally be considered safe (they used to not be). If we do not know that other keys are safe, e.g. if the collection came from an untrusted source, we can taint the collection itself, and all reads will be considered dangerous. So for collections with precise content, there is a big difference between having content and the collection being tainted.
Thus we wish to remove these imprecise taint steps for tuples and dictionaries, where we track content precisely (we keep them for lists and sets, where content is imprecise anyway).
This PR now seems to demonstrate that we can achieve this, although with some caveats:
The issue with conversions is as follows: We are moving away from tainting collections when only precise content is tainted. But some operations may read any of the collection elements, for instance decoders. A call like
used to transfer taint from
tainted_obj
toencoded
, via an additional taint step. But now there is no taint to transfer becausetainted_obj
itself is not tainted. Instead, it has to make a read step. Adding read steps is not trivial, though, the best mechanism we have is that of flow summaries, but it is awkward to use here, or two reasons:We might also be able to get the converter reads as implicit reads, but there is currently no mechanism for doing that for all taint flow configurations at once.