const-eval: detect more pointers as definitely not-null #133700

RalfJung · 2024-12-01T12:07:30Z

This fixes #133523 by making the scalar_may_be_null check smarter: for instance, an odd offset in any 2-aligned allocation can never be null, even if it is out-of-bounds.

More generally, if an allocation with unknown base address B is aligned to alignment N, and a pointer is at offset X inside that allocation, then we know that (B + X) mod N = ((B mod N) + (X mod N)) mod N = X mod N, since B mod N = 0. Null is address 0, and 0 mod N is definitely 0, so if we learn that X mod N is not 0, we can deduce that B + X is not 0.
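The reasoning above can be sketched as a standalone Rust function. This is a hypothetical model, not the interpreter's actual code: the names `may_be_null`, `offset`, `size`, and `align` are illustrative. The sketch also assumes that in-bounds offsets, including one past the end, were already treated as non-null before this change, as the phrase "even if it is out-of-bounds" suggests, and that `align` is a power of two.

```rust
/// Hypothetical model of "may this pointer be null?" for a pointer at byte
/// `offset` into an allocation of `size` bytes whose unknown base address
/// is a multiple of `align`. Illustrative only; not rustc's implementation.
fn may_be_null(offset: u64, size: u64, align: u64) -> bool {
    assert!(align.is_power_of_two(), "alignments are powers of two");
    // Assumption: in-bounds pointers (including one past the end) were
    // already known to be non-null before this change.
    if offset <= size {
        return false;
    }
    // base % align == 0, so (base + offset) % align == offset % align.
    // This also holds with wrapping address arithmetic, because `align`
    // divides 2^64. A non-zero remainder rules out address 0.
    if offset % align != 0 {
        return false;
    }
    // Not enough information: the absolute address might be 0.
    true
}

fn main() {
    // A 2-byte, 2-aligned allocation, e.g. a single `u16`.
    assert!(!may_be_null(2, 2, 2)); // one past the end
    assert!(!may_be_null(5, 2, 2)); // odd offset, out of bounds
    assert!(may_be_null(4, 2, 2)); // even offset, out of bounds: unknown
}
```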

This is immediately visible on stable, via ptr.is_null() (and, more subtly, by not raising a UB error when such a pointer is used somewhere that a non-null pointer is required). Therefore nominating for @rust-lang/lang.

rustbot · 2024-12-01T12:07:38Z

r? @davidtwco

rustbot has assigned @davidtwco.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

rustbot · 2024-12-01T12:07:40Z

Some changes occurred to the CTFE machinery

cc @rust-lang/wg-const-eval

Some changes occurred to the CTFE / Miri interpreter

cc @rust-lang/miri

RalfJung · 2024-12-02T08:54:31Z

r? @lcnr

lcnr · 2024-12-02T09:35:59Z

r=me after lang approval (idk if it needs a full FCP, it is observable by users after all)

tmandry · 2024-12-04T17:06:34Z

@RalfJung @JakobDegen Could you elaborate on the motivation? I agree it would be nice if #133523 compiled but I find myself asking "how clever is clever enough" for these checks. Do you feel comfortable with writing this behavior into the language spec?

nikomatsakis · 2024-12-04T17:07:14Z

Oh dear. @rfcbot cancel

@labels -T-compiler

nikomatsakis · 2024-12-04T17:07:23Z

what am I doing :)

nikomatsakis · 2024-12-04T17:07:35Z

@rustbot labels -T-compiler

@rfcbot fcp cancel

rfcbot · 2024-12-04T17:07:59Z

@nikomatsakis proposal cancelled.

nikomatsakis · 2024-12-04T17:08:02Z

@rfcbot fcp merge

Based on discussion in the lang-team meeting, we felt this needed an FCP. We discussed a few points we'd like to see clarified:

Overall, this doesn't seem to be an undue complication of the model itself, i.e., it's not adding new information to the abstracted form of values that CTFE thinks about, only making better use of the data it already has.

But it is still complicating the spec, and it's not obvious when this function (or any other) will be "smart enough", so @tmandry was looking for better motivation than an issue (does this represent a real-world pattern?). The other question came from @pnkfelix, who was wondering if the logic could be invalidated by people casting unaligned pointers or doing other things that don't respect alignment.

rfcbot · 2024-12-04T17:08:49Z

Team member @nikomatsakis has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

cc @rust-lang/lang-advisors: FCP proposed for lang, please feel free to register concerns.
See this document for info about what commands tagged team members can give me.

nikomatsakis · 2024-12-04T17:20:52Z

@rustbot labels +T-compiler

RalfJung · 2024-12-04T17:29:47Z

Yeah, this does complicate the spec, but not unduly so, I would say. As you say, the change is entirely local, only making better use of information we already have. So it seemed like an easy win to me. We can of course also wait until someone shows up with a compelling real-world use case. (@JakobDegen maybe you already have one?)

If/when we ever get more clever niches for references (i.e., based on their alignment), we'll need to add CTFE logic of this sort (probably more complicated) to ensure that CTFE can still determine the active enum variant.

> wondering if the logic could be invalidated by people casting unaligned pointers or doing other things that don't respect alignment.

I don't see how. All we do is look at the pointer value. We don't "trust" how the value got computed or anything. Given a pointer value of offset X into an allocation with alignment A, we know that the absolute address will be B+X for some B divisible by A, and that's all we are making use of here. This is fundamentally the pointer model of CTFE, and there is no way for code to "bypass" it.

The one thing we do trust is that CTFE allocations (turning into LLVM globals) truly end up with the alignment that they are declared with. But we already rely on that anyway (and we have some open soundness issues because some platforms don't always get this right for alignments on the order of a page or larger).

JakobDegen · 2024-12-04T18:31:52Z

So I do think the issue is representative of a somewhat plausible real-world pattern: storing a one-past-the-end pointer which additionally packs extra data into the low bits, in a const context. Wanting to do all three of those at once is probably somewhat rare, but none of them are unusual on their own.
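For illustration, the pattern might look like the following when evaluated at runtime. This is only a sketch: `tagged_end` and `flag_of` are hypothetical helpers, and the issue itself concerns doing the same thing inside a `const` initializer, which this example does not attempt.

```rust
/// Hypothetical helper: take the one-past-the-end pointer of a `u16` slice
/// and store a 1-bit flag in its lowest address bit. Because `u16` is
/// 2-aligned here, bit 0 of that pointer is otherwise always 0.
fn tagged_end(slice: &[u16], flag: bool) -> *const u8 {
    let end: *const u16 = slice.as_ptr_range().end; // one past the end
    (end as *const u8).wrapping_add(flag as usize)
}

/// Hypothetical helper: read the flag back out of the low bit.
fn flag_of(p: *const u8) -> bool {
    (p as usize) & 1 == 1
}

fn main() {
    let buf = [0u16; 4];
    let p = tagged_end(&buf, true);
    assert!(!p.is_null());
    assert!(flag_of(p));
    assert!(!flag_of(tagged_end(&buf, false)));
}
```

In CTFE terms, `buf` is an 8-byte, 2-aligned allocation and the tagged pointer sits at offset 9: out of bounds, but odd, which is exactly the case the new check can classify as not null.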

With regards to this complicating the spec, I think it'd be good to write down what this check even really is (if someone has a different mental model, please share). It looks to me like it stems from the observation that the following code:

let x = 0_u16;
// offset 4 into the 2-byte, 2-aligned allocation backing `x`
unsafe { NonNull::new_unchecked((&raw const x).wrapping_byte_add(4).cast_mut()) }

is unconditionally UB, both at runtime and at const time (albeit only with suitable non-deterministic choices). This check appears to me to be an attempt to shift detection of that programming error left, and as a result is somewhat similar in spirit to alignment checks on raw pointer derefs in debug mode. To me, this indicates that our basic operating principle here should be that, to the extent that this check is imprecise in detecting UB, it should have false negatives, not false positives. A couple of consequences of, and reasons for, that:

- Surprising users when we reject what they correctly think of as valid code is worse than sometimes failing to emit helpful diagnostics when there's UB. Consider also the parallel with alignment checks in debug mode; clearly we would not be ok with those having false positives either.
- This PR is not actually a complication of the spec, primarily because if we restrict ourselves to false negatives, the check now doesn't need to appear in the spec at all: it's only a diagnostic improvement in those cases in which there was UB anyway.
  - This was not true before; the previous version of the check does fire on cases that are otherwise defined behavior.
- Restricting ourselves to false negatives also means that there's little risk associated with this accidentally changing behavior at some point: if some change causes us to detect slightly more or less UB, maybe rustc is a bit better or worse, but it's not breaking.

Regardless, I don't actually think the new version of the check is imprecise in either direction in detecting the UB it intends to catch, and I agree with Ralf that I don't see a reason we can't maintain that property going forward.

traviscross · 2024-12-04T20:11:49Z

What @RalfJung and @JakobDegen said makes sense to me.

@rfcbot reviewed

RalfJung · 2024-12-04T21:30:30Z

> So I do think the issue is representative of a somewhat plausible real-world pattern: storing a one-past-the-end pointer which additionally packs extra data into the low bits, in a const context. Wanting to do all three of those at once is probably somewhat rare, but none of them are unusual on their own.

That's a good example, thanks.

> It looks to me like it stems from the observation that the following code:

I can't follow this part of your post. I don't see what this proposal has to do with your example, nor with the arguments that follow. (I wouldn't say it is unconditional UB; it is UB if the address of `x` happens to be `-4isize as usize`. We can now reason with non-determinism to say that the compiler can make it so that the address is that value, but at that point I am not sure how that's still about this PR.)
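The arithmetic behind the parenthetical can be checked directly. This sketch only restates the numbers: with wrapping pointer-sized arithmetic, `-4isize as usize` equals `usize::MAX - 3`, it is the unique address for which adding 4 wraps to 0, and it is even.

```rust
fn main() {
    // The address `x` would need for `addr_of_x + 4` to wrap around to null.
    let base = -4isize as usize;
    assert_eq!(base, usize::MAX - 3);
    assert_eq!(base.wrapping_add(4), 0);
    // The address is even, so it satisfies a 2-byte alignment
    // (assuming a target where `u16` is 2-aligned).
    assert_eq!(base % 2, 0);
    println!("{base:#x}");
}
```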

This is about the function that CTFE uses to determine whether a pointer may be null. This function is used in multiple situations:

- `ptr.is_null()`: if the pointer is definitely not null, we return `false`. If we don't know, we abort execution.
- To determine whether we raise UB when a reference has that pointer value: if the pointer may be null, we report UB.
- When reading the discriminant of `Option<&T>`: if we see a pointer, and we are sure it cannot be null, we return `Some`. If it may be null, we error.
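A hypothetical sketch of how those three consumers react to the same "may be null" answer for a pointer into an allocation. The function and error names are illustrative, not rustc's, and the error kind used for the `Option<&T>` case is an assumption. The point is that when `may_be_null` becomes smarter and returns `false` in more cases, all three places turn an error into a result at once.

```rust
/// Illustrative error kinds; not rustc's actual error types.
#[derive(Debug, PartialEq)]
enum ConstEvalError {
    /// CTFE cannot answer the question, so evaluation is aborted.
    Unknown,
    /// The pointer may be null where a non-null pointer is required.
    Ub,
}

/// `ptr.is_null()`: definitely not null yields `false`; otherwise abort.
fn is_null(may_be_null: bool) -> Result<bool, ConstEvalError> {
    if may_be_null { Err(ConstEvalError::Unknown) } else { Ok(false) }
}

/// Validity of a reference holding the pointer: "may be null" is UB.
fn check_reference(may_be_null: bool) -> Result<(), ConstEvalError> {
    if may_be_null { Err(ConstEvalError::Ub) } else { Ok(()) }
}

/// Discriminant of `Option<&T>` holding the pointer: `Ok(true)` means `Some`.
/// (Which error is raised here is an assumption of this sketch.)
fn option_ref_is_some(may_be_null: bool) -> Result<bool, ConstEvalError> {
    if may_be_null { Err(ConstEvalError::Unknown) } else { Ok(true) }
}

fn main() {
    // `true` models a pointer the old check could not classify;
    // `false` models the same pointer once the check is smarter.
    for may_be_null in [true, false] {
        println!(
            "{:?} {:?} {:?}",
            is_null(may_be_null),
            check_reference(may_be_null),
            option_ref_is_some(may_be_null)
        );
    }
}
```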

This is a "may be null" since we don't know the absolute address of the pointer, so we can only do an approximation based on incomplete information. This PR makes the logic determining that approximation a bit smarter. This is basically a standard symbolic evaluation / abstract interpretation situation: we have partial knowledge about the address the pointer will have, and have to determine whether "null" is in the set of possible values.
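The "set of possible values" view can be checked by brute force in a toy setting. The sketch below assumes an 8-bit address space (real targets use 32 or 64 bits) and ignores every constraint on the base address except alignment. Under those assumptions, null is a possible value of `base + offset` exactly when `offset` is a multiple of `align`, which is the divisibility test this PR adds for out-of-bounds offsets.

```rust
/// Enumerate every base address in a toy 8-bit address space that is a
/// multiple of `align`, and report whether `base + offset` (wrapping) can be 0.
fn null_is_possible(offset: u8, align: u8) -> bool {
    (0..=u8::MAX)
        .filter(|base| base % align == 0)
        .any(|base| base.wrapping_add(offset) == 0)
}

fn main() {
    for align in [1u8, 2, 4, 8, 16, 32, 64, 128] {
        for offset in 0..=u8::MAX {
            // The divisibility rule agrees with exhaustive enumeration.
            assert_eq!(null_is_possible(offset, align), offset % align == 0);
        }
    }
    println!("divisibility rule matches brute force");
}
```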

JakobDegen · 2024-12-04T23:52:37Z

> This function is used in multiple situations:

Ah, sorry, I hadn't looked at the code change in much detail, so I missed this. Hopefully what I said makes more sense if we imagine this was only about const validation or other removable UB checks. Given that this is also about `.is_null()`, you're obviously right that this is user-facing. In any case, though, I agree with your argument and think that ought to be strong enough on its own.

RalfJung · 2024-12-05T06:40:03Z

In terms of just the UB checks, one could argue that we are overeager reporting UB when a pointer might be null or might be misaligned. But that's pre-existing before this PR.

In your example, raising UB seems justified if we go with the "compiler resolves non-determinism" interpretation, agreed. I am not sure if UB is justified in all cases that "may be null" returns true, as that was not the goal.


RalfJung · 2024-12-20T10:02:59Z

@tmandry @pnkfelix we did our best to answer your questions above; please let me know if there are still any remaining concerns or if FCP can proceed here. :)

RalfJung · 2024-12-20T10:03:42Z

Oh wait Felix isn't on the team any more oops. I'll tick their box, then.



Rollup merge of rust-lang#134325 - theemathas:is_null-docs, r=RalfJung Correctly document CTFE behavior of is_null and methods that call is_null. The "panic in const if CTFE doesn't know the answer" behavior was discussed to be the desired behavior in rust-lang#74939, and is currently how the function actually behaves. I intentionally wrote this documentation to allow for the possibility that a panic might not occur even if the pointer is out of bounds, because of rust-lang#133700 and other potential changes in the future. This is beta-nominated since `const fn is_null` stabilization is in beta already but the docs there are wrong, and it seems better to have the docs be correct at the time of stabilization.

rustbot assigned davidtwco Dec 1, 2024

rustbot added S-waiting-on-review and T-compiler labels Dec 1, 2024

RalfJung mentioned this pull request Dec 1, 2024

False positive const-UB check on NonNull pointer #133523

Open

RalfJung added T-lang and I-lang-nominated labels Dec 1, 2024

lcnr approved these changes Dec 2, 2024


rustbot assigned lcnr and unassigned davidtwco Dec 2, 2024


rfcbot added proposed-final-comment-period and disposition-merge labels Dec 4, 2024

rustbot removed the T-compiler label Dec 4, 2024

rfcbot removed proposed-final-comment-period and disposition-merge labels Dec 4, 2024

rfcbot added proposed-final-comment-period and disposition-merge labels Dec 4, 2024

rustbot added the T-compiler label Dec 4, 2024

const-eval: detect more pointers as definitely not-null

b438e46

RalfJung force-pushed the const-non-null branch from 4df3570 to b438e46 December 5, 2024 21:22

theemathas mentioned this pull request Dec 15, 2024

Correctly document CTFE behavior of is_null and methods that call is_null. #134325

Merged

RalfJung added the S-waiting-on-team label Dec 21, 2024
