Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use CLOCK_BOOTTIME for Instant in Fuchsia/Android #132331

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

mathukumillia
Copy link

Fuchsia and Android both want Instants to progress during periods of suspension, and thus must use CLOCK_BOOTTIME as the backing reference clock.

r? @tmandry

Fuchsia and Android both want Instants to progress during periods of
suspension, and thus must use CLOCK_BOOTTIME as the backing reference
clock.
@rustbot
Copy link
Collaborator

rustbot commented Oct 29, 2024

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @tmandry (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added O-unix Operating system: Unix-like S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Oct 29, 2024
@tmandry
Copy link
Member

tmandry commented Oct 29, 2024

cc @maurer

@rustbot

This comment was marked as off-topic.

@maurer
Copy link
Contributor

maurer commented Oct 29, 2024

Don't have official review powers, but LGTM as the Android maintainer.

@tmandry
Copy link
Member

tmandry commented Oct 29, 2024

Could you also update this table in the docs?

/// | Platform | System call |
/// |-----------|----------------------------------------------------------------------|
/// | SGX | [`insecure_time` usercall]. More information on [timekeeping in SGX] |
/// | UNIX | [clock_gettime (Realtime Clock)] |
/// | Darwin | [clock_gettime (Realtime Clock)] |
/// | VXWorks | [clock_gettime (Realtime Clock)] |
/// | SOLID | `SOLID_RTC_ReadTime` |
/// | WASI | [__wasi_clock_time_get (Realtime Clock)] |
/// | Windows | [GetSystemTimePreciseAsFileTime] / [GetSystemTimeAsFileTime] |

Copy link
Member

@tmandry tmandry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once docs are updated.

@tmandry
Copy link
Member

tmandry commented Oct 29, 2024

r? rust-lang/libs - Both platform owners approve this change, but I would like your team to be in the loop also. This change is allowed by my reading of the Instant docs.

CLOCK_BOOTTIME always progresses monotonically; the only difference is that it progresses during suspend, unlike CLOCK_MONOTONIC. The platform owners have determined that this is a better default for applications on their platform, especially for use cases that interact with outside services. Android has long been patching their platform toolchain to do this. This change allows us to converge on one upstream implementation.

@rustbot rustbot assigned jhpratt and unassigned tmandry Oct 29, 2024
@rust-lang rust-lang deleted a comment from rustbot Oct 29, 2024
@the8472
Copy link
Member

the8472 commented Oct 30, 2024

Fuchsia and Android both want Instants to progress during periods of suspension,

They cannot rely on this because the documentation explicitly does not guarantee for this behavior:

As part of this non-guarantee it is also not specified whether system suspends count as elapsed time or not. The behavior varies across platforms and Rust versions.

Until #87906 is settled we shouldn't introduce platform-specific behavior that makes code non-portable.

Also who is this "android wants" person that attempts to define how the Rust standard library should behave? And where did that discussion happen?
Edit: Ah I guess that was also #87906, though android has been mentioned only once deep in that thread as a code-reference.

@maurer
Copy link
Contributor

maurer commented Oct 30, 2024

As a brief summary of the situation resulting from the other thread:

  1. Android currently has a patch carried against the Rust compiler to make the above true (though our carried patch is broader, and does it for all unixy OSes).
  2. Even without being able to rely on it, ecosystem crates assume that Instant ticks in suspend. We have a library we use in first party code to make sure time reliably works correctly, but as long as ecosystem crates assume Instant ticks in suspend, we need to keep this patched.

@the8472
Copy link
Member

the8472 commented Oct 30, 2024

Even without being able to rely on it, ecosystem crates assume that Instant ticks in suspend.

Then, at the moment, the crates have a bug. It would be great if we could offer a portable solution here, but currently we can't because not all OSes provide such a clock.
Shipping a custom toolchain for a specific target just makes even more crates fall for that trap because it works on the platform they mainly care about, essentially relying on implementation details.

So I don't see this as a path to actually solving the problem, possibly even making it worse.

The options I see are

  • fixing those crates
  • implementing a clock source API
  • poking POSIX to standardize such a clock and getting it into the relevant OSes, but I'm skeptical that this is workable, embedded platforms might not have anything to measure elapsed time during sleep
  • giving up on portability and saying things like "tier 1 platforms must have these properties, others might not" or something along those lines

Fuchsia and Android both want Instants to progress during periods of
suspension, and thus must use CLOCK_BOOTTIME as the backing reference
clock.
@mathukumillia
Copy link
Author

I've updated the docs to reflect the usage of boot time in Android and Fuchsia.

Until #87906 is settled we shouldn't introduce platform-specific behavior that makes code non-portable.

As @maurer has pointed out, the Instant library already has platform specific behavior - it behaves differently on Windows vs Linux. I do not see how having it behave differently on Fuchsia and Android is any different.

@the8472
Copy link
Member

the8472 commented Oct 30, 2024

Instant library already has platform specific behavior - it behaves differently on Windows vs Linux.

The current situation does not exist to enable specific crates to keep their broken implementations forever. But this PR would. Enabling crates in their reliance on implementation details on one platform makes this into a problem for other platforms. In other words you would be amplifying hyrum, and you're shifting the burden to other OSes.
This is... myopic, bad incentives/policy/game theory or however you want to call it.

It goes roughly like this:

  • someone writes broken code that doesn't work with suspend
  • someone else encounters this code
  • "wouldn't it be nice if MY platform used a boottime clock instead?"
  • "here's a PR that only fixes it on MY platform,"
  • repeated by different people for all platforms that have a boottime clock
  • we are now left with a majority of platforms on one clock source and a minority on another

std aims to provide portable abstractions. That goal is not served by an incremental, myopic steps. The minority platforms are left out in the cold with subtly broken code. std can still not make any additional guarantees about universal behavior. People ignore the docs even more than before and code to observed behavior instead because it works "almost everywhere" anyway. That's an undesirable outcome.

Instead the fact that such clocks aren't available on all platforms should be made explicit and the crates then will have to think about fallbacks or other ways to guard against whatever undesirable behavior we're talking about here in the abstract. Or at least they could make the error upfront and fail loudly when an essential building block is missing.

Additionally this ignores the issue that there also are uses where not counting suspend can also be useful, so it just trades one benefit for another rather than enabling both.


Yes the current situation isn't great. And you're feeling pain due to it. But imo that pain should be incentive to either fix the broken crates or help with a proper solution in std. Not papering over bugs for just one platform.

@the8472
Copy link
Member

the8472 commented Oct 30, 2024

Also, just because which clock is used on particular platforms is documented does not mean you're allowed to rely on it. It says

 The following system calls are [currently](crate::io#platform-specific-behavior) being used by `now()`

so even if this PR were accepted today, someone could come along tomorrow and change it back.

@CryZe
Copy link
Contributor

CryZe commented Nov 3, 2024

Is this even correct for Fuchsia in the first place? CLOCK_BOOTTIME is the same thing as CLOCK_MONOTONIC on Fuchsia: https://cs.opensource.google/fuchsia/fuchsia/+/main:zircon/third_party/ulib/musl/src/time/clock_gettime.c;l=40-41

@mathukumillia
Copy link
Author

I have a patch to fix libc to use the right syscall in Fuchsia. Fuchsia only recently implemented CLOCK_BOOTTIME, and we're still working through updating everything that needs it to use it.

@tmandry
Copy link
Member

tmandry commented Nov 5, 2024

The justification for this PR is in #88714 (comment):

It seems best for std's Instant to just use whatever is the 'standard' monotonic clock on each platform.

The owners of a platform get to define what that standard monotonic clock is for their platform. In this case, both Fuchsia and Android want the standard monotonic clock to be the boottime clock, and this PR reflects that.


std aims to provide portable abstractions. That goal is not served by an incremental, myopic steps. The minority platforms are left out in the cold with subtly broken code.

Which minority platforms are you worrying about here? This is two minority platforms deciding that they'd rather diverge from the majority (Linux and macOS) because they think it will work better in the majority of cases for the application classes that their platforms are designed for.

Yes we want portable code, but in the absence of a well-defined semantics the precise details of what that code does may be impacted by the platform it is running on, which is relevant insofar as it informs us what kind of behavior is expected for applications running on that platform.

There is precedent on either side; Windows being the main example of using boot time. It seems like you are arguing against the past decision to make the behavior of this API platform-specific in the first place. Arguments about past decisions should not block PRs like this.

Additionally this ignores the issue that there also are uses where not counting suspend can also be useful, so it just trades one benefit for another rather than enabling both.

Yes, and it would be great if std had two APIs to enable both cases, but it does not. In the absence of a perfect solution we are forced to make tradeoffs. The particular tradeoff here is what should most code on this platform do most of the time.

Yes the current situation isn't great. And you're feeling pain due to it. But imo that pain should be incentive to either fix the broken crates or help with a proper solution in std. Not papering over bugs for just one platform.

We'll have to disagree here – platform maintainers are not fungible assets who can take on a much larger project of defining new std APIs and migrating the ecosystem to it. That's orders of magnitude more work that requires a different set of skills and social connections, and it's letting the perfect be the enemy of the good.

Pushing back on incremental improvements like this does no one any good, including the Rust project. If someone is interested in contributing to Rust in a small way, even in a way that's initially scoped to their platform, let them. That's how we will grow contributors with more impact over time.

@erickt
Copy link
Contributor

erickt commented Dec 12, 2024

Would it simplify things if we refactored this PR to just focus on Fuchsia, and a separate one for Android? Right now all the known rust code that's targetting Fuchsia is owned by the Fuchsia team, whereas Android may have users that could be impacted by this change.

@the8472
Copy link
Member

the8472 commented Dec 12, 2024

Yes we want portable code, but in the absence of a well-defined semantics the precise details of what that code does may be impacted by the platform it is running on, which is relevant insofar as it informs us what kind of behavior is expected for applications running on that platform.

That sounds like you intend to rely on implementation details. Is this correct?

There is precedent on either side; Windows being the main example of using boot time. It seems like you are arguing against the past decision to make the behavior of this API platform-specific in the first place. Arguments about past decisions should not block PRs like this.

No. The I'm arguing that guaranteed behavior is not provided here. When it comes to suspend it is unspecified, NOT specified-per-platform. The documentation is very explicit about this (even though it does not have to be, anything not explicitly specified is, well, unspecified):

As part of this non-guarantee it is also not specified whether system suspends count as elapsed time or not. The behavior varies across platforms and Rust versions.

Which means we could flip a coin at each program startup to choose between CLOCK_BOOTTIME and CLOCK_MONOTONIC on any platform that has both.

Therefore any program that relies on this property has a potential correctness problem on future Rust versions and a portability problem. Changing the behavior to aid crates that want to rely on non-guaranteed behavior is encouraging the wrong thing.

Pushing back on incremental improvements like this does no one any good, including the Rust project.

I don't see how it's an improvement. If we just let platform-maintainers do uncoordinated decisions, e.g. if platform A says "we think accounting for suspend should be the default" and platform B says "suspend is not time spent working on anything, therefore shouldn't be included", then Instant is forced to guarantee neither.

To get to an endpoint where Instant can provide a guarantee that suspend is either included or excluded everyone must move in the same direction.
If that is not possible then we need to explore alternatives.

But this PR doesn't move in any of those directions. As far as I can tell it's aiming to encourage or enable crate authors to ignore the API contract.


Would it simplify things if we refactored this PR to just focus on Fuchsia, and a separate one for Android? Right now all the known rust code that's targetting Fuchsia is owned by the Fuchsia team

Fuchsia is Tier 2, the target tier policy says that it must not be exclusively useful to a closed group. So we should assume external users do exist.

@the8472
Copy link
Member

the8472 commented Dec 12, 2024

My interpretation of

The justification for this PR is in #88714 (comment):

It seems best for std's Instant to just use whatever is the 'standard' monotonic clock on each platform.

is that was referring to the OS developers changing whatever their main/preferred clock API used for timing, waiting, sleeping etc. and Instant following those changes as we upgrade to new major OS versions.

I.e. I understand it as an argument for implementation flexibility, enabled by the current absence of guarantees. Not as an argument for someone wanting to make Hyrum more powerful (more guarantees, less implementation flexibility).

So if you want to guarantee more behavior then I think you have to request an API change from the libs-API team.

@mathukumillia
Copy link
Author

Which means we could flip a coin at each program startup to choose between CLOCK_BOOTTIME and CLOCK_MONOTONIC on > any platform that has both.

Therefore any program that relies on this property has a potential correctness problem on future Rust versions and a portability > problem. Changing the behavior to aid crates that want to rely on non-guaranteed behavior is encouraging the wrong thing.

This feels rather idealistic. I guarantee you that if you were to actually implement that change to flip a coin on startup and randomly choose a clock, a lot of rust programs in the wild would break. Calling every single one of those programs incorrect is, is in my opinion, an extremely impractical position. Telling contributors that they must modify/fix the standard library whenever they want to make existing software that uses those libraries work on their platform is pretty frustrating.

In any case, I think this argumentation is kind of getting us nowhere. How am I expected to make progress here? I have a very simple problem to solve: a large amount of code on Fuchsia that uses Instant expects it to progress during periods of suspension. I have four paths:

  1. Make Instant use boot time, as I'm doing here.
  2. Keep a local patch to the std lib to use boot time like Android does.
  3. Modify the standard library API to make Instant's behavior during suspend more explicit. How does one even start doing something like this?
  4. Implement a local wrapper around Instant that has the semantics I want. This works in Fuchsia code, but does not work well when using third party crates that use upstream instant. The inconsistencies in those Instants may cause problems.

How would the rust library maintainers like us to proceed? (side note: who makes these decisions for the rust community?)

@the8472
Copy link
Member

the8472 commented Dec 13, 2024

a large amount of code on Fuchsia that uses Instant expects it to progress during periods of suspension. I have four paths:

  1. fix those crates

Make Instant use boot time, as I'm doing here.

This option is worse than hyrum's law. Currently this is not even an implementation detail. You want to make this an implementation detail and then turn around and write code that relies on that. So you want to create de-facto API guarantees without using those words.
I fear a few months later someone will then come and say "look at all this code we have written that relies on this detail [which we put ourselves into std], now it should be guaranteed", effectively backdooring the API design process.

I hope you see how shady this looks.

API design does not happen by the process of writing crates that want something, then changing the std implementation and then presenting us with a fait accompli.


How would the rust library maintainers like us to proceed? (side note: who makes these decisions for the rust community?)

Usually platform maintainers are not the same people writing most of the downstream crates for a platform, so they're not in a position where they want to change an std implementation to suit their own crates' needs.

If your code needs a novel API guarantee then that is a feature request, one that will require discussion and design work. It's the libs-api team that approves API changes. See https://std-dev-guide.rust-lang.org/development/feature-lifecycle.html

Alternatively you can ask the libs-team if this could be considered a bug-fix or quality-of-implementation improvement, though I must say I would oppose that for the reasons I explained above.


If you're saying there are lots of programs that are broken unless you use a custom std which changes behavior that sounds like a rather dire situation and I feel that this has not been accurately reflected in previous discussion.

It would be helpful to have more context to understand the issue (perhaps not in this PR though). The kind and extent of breakage are we talking about, what are some big offending crates, how are they used, etc.

@the8472 the8472 added the I-libs-nominated Nominated for discussion during a libs team meeting. label Dec 13, 2024
@Amanieu
Copy link
Member

Amanieu commented Dec 17, 2024

We discussed this in the libs-api meeting today. One concern that was raised is that the sleep/timeout APIs all use CLOCK_MONOTONIC internally and this change would make Instant inconsistent with them.

On a more general note, there isn't really a suitable global policy for whether to include time suspended or not: different call site may want different behavior depending on what is being measured.

As such we want to ask: would it be acceptable to leave Instant on CLOCK_MONOTONIC since it is the system default, but instead provide a clock source API which allows users to explicitly select whether to include suspend time? Here is a quick draft of what it could look like:

struct Instant<C: Clock = DefaultClock>;

struct DefaultClock; // Clock used for OS sleep/timeout
struct ContinuousClock; // Clock which counts time in sleep
struct SuspendingClock; // Clock which doesn't count time in sleep

trait Clock: Sealed {
    fn now() -> Instant<Self>;
}

@erickt
Copy link
Contributor

erickt commented Dec 17, 2024

@Amanieu that's along the lines of what @mathukumillia and I were playing around with offline. I've started an experimental port of the standard library over to parameterize the Clock to see how it feels on a large codebase. I'm pretty sure we'd also need to parameterize Duration as well on the clock, since we'd have flow the duration from std::thread::sleep(Duration<Clock>) down to an API call like clock_nanosleep on Linux.

We also need to figure out how an external library like tokio would map these types to the various syscalls. In tokio-rs/tokio#3185 (comment), on Linux only some of the networking APIs support different clocks.

@the8472 the8472 removed the I-libs-nominated Nominated for discussion during a libs team meeting. label Dec 17, 2024
@Amanieu
Copy link
Member

Amanieu commented Dec 17, 2024

We specifically talked about Duration and the conclusion was not to parameterize it. Fundamentally, a Duration represents a time period (in seconds) which is independent of the clock on which it is measured. Different clocks may advance through time at different rates, but they all share the same unit of time (seconds).

All timeouts/sleeps are only defined against DefaultClock, unless you specifically opt-in to a different clock. This opt-in should be in the form of separate functions that take an additional Clock argument.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
O-unix Operating system: Unix-like S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants