-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
time: time.Sleep and time.NewTimer: Fix duration parameter error #17696
Comments
Not sure this is a documentation issue. On darwin and linux, we sleep more than duration (at least, i cannot reproduce the other case). The Windows behavior looks broken to me. |
Interesting. Were you able to run the test program? I'd be curious to see your results. If it is just a WIndows problem then perhaps adding 1 to whatever d is, as long as it isn't 0, might be all it takes to solve it. |
What I see on darwin:
|
/cc @alexbrainman |
Wow! I was thinking that perhaps the Linux time source would be good to 1us. 48ns! This is what I see for the verifications on my Windows machine.
I see what you mean about the resulting durations being a bit (about 25%) too long. That makes no sense at all to me. It makes me wonder if there is a math error somewhere (using 4 instead of 5, or vice versa). Perhaps it is worth creating an issue for this. 25% is too much to ignore. |
@johnrs, what output do you get from your test program on Windows? And which version of Windows are you running? I tried it on our Windows testing systems and I get:
A 976.5µs interval is 1024 Hz. In order to sleep 1ms, we wait for two ticks to elapse (one is not enough), giving 1.953ms. I'm curious what resolution you are seeing. The documentation is a correct statement of the semantics we intend to guarantee. The implementation is that on entry to time.Sleep, the runtime calls runtime.nanotime, adds the duration, records that sum, and puts the goroutine into a list of pending timers. At some point in the future, the timer-handling goroutine calls runtime.nanotime, sees that the current time is now at or beyond the time at which the original goroutine should be woken up, and wakes it up. There are always going to be rounding errors possible depending on the actual precision of the underlying system clock. I'm not sure what there is to do about that. In general we probably don't want to make all sleeps take longer than they should. FWIW, on Windows runtime.nanotime is using "Interrupt Time", which is at least recorded in 100ns units. It may only be updated at 1024 Hz. Similarly, time.Now uses "System Time", which again is 100ns precision but maybe only 1024 Hz accuracy. |
When I run the test on my system this is what I get:
I'm running Windows 7. I believe that most Windows systems are 1.001ms (minimum) and 15.600ms (default). This is what I've seen on various versions and systems over the years. It's also what I see most people refer to when discussing it. The ClockRes program, part of the SysInternals suite (www.sysinternals.com), shows this "system clock resolution" when I'm running a Go program:
So it seems that our systems differ in 2 ways: 976.5us vs 1.0001ms (no big deal), and two ticks vs 1 tick for the Sleep and After timing. Strange! Using 1 extra clock tick, as you describe, sounds like a good idea to me. At first I thought this might be a tacky solution, but upon further consideration I like it. It makes is "safe" (no 20 times too short intervals) and that's the most important thing. But that's not what seems to be taking place on my system. btw - If I run the test under Go 1.6.3 I see the same results, but based on 15.6ms instead of 1ms. I first noticed this problem under Go 1.5.2 and posted a note about it in go-nuts on 2015/12/24, so it's nothing new. I just forgot to follow up on it then. |
I have nothing to add to @rsc explanation. My Windows 7 outputs:
You are correct that Windows 7 have 2 timer resolutions: 1ms or 16ms. When system starts, it defaults to 16ms. But once any program calls timeBeginPeriod(1), the wide system timer resolution goes to 1ms, and stays with 1ms until the program exits. Initially every Go program called timeBeginPeriod(1) at the start. But then ab4c929 stopped that. And then we discovered some problems with our change, and 0435e88 started calling timeBeginPeriod(1) again. Alex |
@johnrs you could also debug what is happening on your computer. As far as I remember most code will be in runtime\time.go. Alex |
I did a safe-boot, just to see if any software on my system might be affecting it. I got the same result. I also tried a 32-bit version. Still the same. I sent a copy of the exe's to a few friends locally and asked them to try it. I'll let you know what they find. What type of debugging did you have in mind? |
@rsc & @alexbrainman - I have found something interesting. The problem seems to be related somehow to the Windows system's minimum timer interval (which is 0.5ms on newer systems but 1ms on older systems, from what I read). Simple way to tell. Grab this program and run it. Here's what I see.
Go switches you to 1ms. The timeBeginPeriod call is hard coded to use only 1ms. But if you run something else which switches to 0.5ms (using ntdll.dll - NtSetTimerResolution) at the same time then the nature of the problem I'm seeing changes. It goes away! If anyone wants to duplicate this, here's a handy tool to make the change and to show what the system is current running.
I've upgraded my test progam by adding more test periods, and mde them longer for better precision. Here it is: https://play.golang.org/p/7AwpUwZSDa I'll follow up with the exact info on what I saw today shortly. |
Here are my results using the new version of the test program. The first is the default situation, letting Go switch the system timer to 1ms. The second is with another program running in the background which has set the system to 0.5ms. In the second case the problem goes away!
Also, I have some feedback from two friends. One does not see the problem. The other does - and his system timer is running at 0.5ms (which is what seems to solve the problem for me). I've sent them the new version of the test program to try. Update: 2nd friend just sent me his results from the revised test. His system timer is indeed running at 0.5ms, but he sees the problem regardless - except for the 1.5ms and 2ms tests which do show the extra time due to the work. Strange! |
it seems that in windows 10 the following command (windows.h) sets the max timer interval to 1ms: |
I believe that Go does link to it. |
I do not have any particular idea. But when things look odd to me (just like something bothering you here), I try to add fmt.Printf statements around code in questions (you would have to use println, because you will be debugging runtime package), and see what it outputs. Sorry I did not spend enough time looking at this issue, so I cannot give anything particular to investigate.
Here https://github.com/golang/go/blob/master/src/runtime/os_windows.go#L282 runtime calls timeBeginPeriod(1) during startup. Alex |
Here are the results of some runtime testing I did today. Note: I edited/compressed the results listings to save a little verticle space. In doing so, I left out the time.After() data since it mirrored the time.Sleep data. Note: You might ask why I included the Windows 0.5ms tests. Go only sets it to 1ms. Because the fastest current program running wins, so if another program is using 0.5ms then that is what all programs will see. Also, it not having one problem, but having another instead is probably important.
For reference, using the regular test, here is what I see.
Summary: I don't see a pattern. Unfortunately I wasn't able to get both 1ms and 0.5ms results for all machines, but even so many of the results contradict each other. Of the 3 identified problems, various machines (depending on the Windows timer setting) have 0, 1, 2, or all 3 problems. Anyone have any ideas of what to do next? Thanks, John |
@johnrs I do not know what your problem is. But, I suspect, you expect Once you start playing with intervals close to your clock resolution, unusual things will happen. Alex |
So your system has the problem, too. I started off thinking that this was just a documentation error. But now we know that some machines yield the correct result (something "at least" 1ms + work of ~500us --- which turns out to be 2ms because Go adds a tick to guarantee the "at least"). Even a simple loop just doing a 1ms wait (when the system timer is also 1ms) should take 2ms per pass - like Russ and others see on their machines. Please read above where Russ states that the doc's are correct. Read where he says: "In order to sleep 1ms, we wait for two ticks to elapse (one is not enough)". Your example of waiting for a train is flawed. The interval you have to wait for the first train to arrive can be anything from 0 to 1ms. Only after you are in sync with the timer will the additional trains require waiting 1ms each. But as Russ confirmed, to deliver "at least the duration" Go waits for an additional tick. Using your train example again, this means that you would ignore the next train - regardless of how long (0 - 1ms) it takes to arrive, then you would count the train after it as the "first" train (1 - 2ms). I see nothing "unusual" about this. It's a bug. Many programming languages don't bother to ensure the minimum wait time, thus exposing you to a potentially serious problem. After all, if you ask for 1ms and end up with something much, much less this can cause faults. Unfortunately, the "extra tick" (which prevents this problem at the cost of a slightly longer wait than requested) doesn't work on some (yours and mine!) Windows systems. The usleep function in runtime seems to exhibit the same problem. A quick check shows 7 places where usleep is called in runtime expecting between 1 and 1000us to pass. Any of these can fault to an almost immediate return. This is not a boundary bug. The error is 1 tick, hence is more apparent at or below the boundary. But that doesn't make it unimportant. The 7 cases in runtime prove this. If you had run the revised test I posted you could also see the results at 2ms. |
If you are referring to the Alex |
@rsc
I looked inside a running timer. The period seemed to be 0 all the time (perhaps causing the problem I am seeing). But even if I set it (prior to the "when" time) it didn't seem to change anything. So I must be misunderstanding something. Side question: I'm curious why wait to add the extra tick at the end of the duration instead of doing it at the beginning? It seems to me that doing it at the beginning is extremely easy and would save a considerable amount of processing. Perhaps I'm missing something? usleep: From what I gather usleep uses a totally different mechanism. I see 7 places in runtime where it is called for waits of 1ms or less. Is it a problem if the wait is actually much less than requested? |
@johnrs My apologies if I misunderstand. As far as I know, the Go runtime does not add any extra tick period. Why would it? When
|
@ianlancetaylor The doc's imply that it does. Russ said that it did. Some systems perform as if it does. I see some code which seems to implement it partially. Why should it? If you do then worst case is your interval is 1 tick too long. If not then worst case is your interval is near 0. Thanks for the explanation of t.period. That explains why it would continue to keep adding time. I was totally focused on just the +1 tick and I was thinking this was it, but not so. :) Regardless of whether there should be +1 tick or not, I'm seeing too many inconsistent results. At a bare minimum I think that it's fair to say that the doc's don't agree with about half of what I'm seeing. I can't help but think that this is a problem. |
I think that most would agree that 1ms timing resolution is more desirable than 15ms resolution (except when trying to save power). Which to use has changed a few times in Go. The most current solution coming in v1.9 will hopefully be optimum for the majority of uses. Allowing for extreme cases is something which is still on the table, as per #20937. I plan to open an issue about it after v1.9 is released. This issue is about the time.Sleep's (and other timers) documentation saying:
As far as I can tell, Go has never performed this way (including the upcoming (v1.9). The situation is this. Let's say that you are running with the system timer set to 1ms. If you do a time.Sleep(1ms) there are two possible results (ignoring the statistically unlikely case of exactly a 1ms sleep). You will either sleep for less than 1ms or more than 1ms (generally between 1ms and 2ms). The documentation says it should be "at least" 1ms. But testing shows that it is generally "at most" 1ms. Hence the issue. I've seen time.Sleep(1ms) take anywhere from as little as 10us to 1ms. A more accurate description of the current behavior would be something like:
The fix can be either to change the documentation to match the code, or to change the code to match the documentation, but leaving it as is is simply wrong. Personally, I prefer changing the code to match the documentation. I feel that it is the safer, most useful solution. In other words, I think that time.Sleep(1ms) resulting in a 10us sleep is much worse than a a 2ms sleep. The QueryPerformanceTimer is interesting for high precision (550ns resolution on my machine) interval measurement but it comes at the cost of higher overhead, so I wouldn't want it to become the default but perhaps an option instead. Also, I don't think that it would help in the case of time.sleep() since it doesn't provide a "wait" feature. Discussion here dropped off since there were other changes being made to time (wall clock vs monotonic) with the hope that perhaps the issue would be resolved as a side effect. I don't believe that it has, hence I plan to ignite the discussion again after the v1.9 release. |
I'd say that's debatable. I'd like to think most wouldn't agree if they understand the costs. Most general application have really no need for the higher resolution. The fact that, to this date, there is no way baked into the .NET ecosystem (a ecosystem built for Windows) to do this without resorting to P/Invoke and calling the mm APIs stand on that very rationale as well. Only the application that really really need it, do so - and they can still do so with a very simple call. But other than that, I am on the same page with you on everything else. I also suppose while a lot of my points had contextual relevance, it isn't directly of immediate relevance to this issue. I shall wait for the discussion to ignite again on it, as you said. :) |
I did leave out one other consideration: It also allows programs (your and others) to run a slight bit faster due to the reduction in overhead. I don't think it's a major factor, but seems to matter to some. Did you have any other costs in mind? OK. Looking forward to your participation. :) |
I don't understand your explanation, sorry.
I won't argue with that. Alex |
I think you have an incorrect understanding of the windows scheduler? When you increase the resolution, programs don't run faster - they run slower, with more overhead. Back to some fundamentals - when you increase the res to 1ms, what happens is the kernel interrupts every 1ms instead of 15ms, which means there's a lot more context switches, and more book-keeping to do. So, it While I generally see a lot of complaints when directly translating UNIX philosophies on to Windows - it's simple because you're solving it against the platform's way. Windows kernel scheduler is actually far more intelligent than most realize. Unless, there's a lot of contention for threads, and it sees a compute heavy thread, it sets the affinity of the thread to a specific CPU (on multi-cores), and pins it for multiple quantum slice. So, what this means is, the scheduler completely gets out of the way, and your program gets all the CPU it needs. Now with 1ms, you're basically forcing the kernel to do more book-keeping. So, your programs actually get slower. You can very easily test this. Just run a simple C (or even a C#) program, and let it run a heavy computation. It will complete faster on 15ms than when running on a higher res. I don't say "Go" program, because I don't yet know the internals of the Go scheduler to understand where it interferes with this, in order to workaround it. The only place where it could potentially be a little faster (but even here, in my naive tests, the kernel seems to work it's magic), is when you have a lot of "active" threads contending for the CPU constantly, and your program is not in the foreground. (This is a decade old optimization for foreground programs to get better quantum slice priority). Here's a simple C# program to illustrate the same: using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Runtime.InteropServices;
namespace dtest
{
class Program
{
static List<TimeSpan> s_results = new List<TimeSpan>();
static Stopwatch s_timer = new Stopwatch();
static void Main(string[] args)
{
NtQueryTimerResolution(out int minRes, out int maxRes, out int currentRes);
Console.WriteLine("Resolutions - current: {0}, {1}, {2}", currentRes, minRes, maxRes);
NtSetTimerResolution(maxRes, true, out currentRes);
Console.WriteLine("Setting to highest - current: {0}, {1}, {2}", currentRes, minRes, maxRes);
for (int i = 0; i < 20; i++)
Run();
NtSetTimerResolution(maxRes, false, out currentRes);
Console.WriteLine("Setting back to normal - current: {0}, {1}, {2}", currentRes, minRes, maxRes);
for (int i = 0; i < 20; i++)
Run();
s_results.ForEach(x => Console.WriteLine(x.TotalMilliseconds / 1000 * 1000 + "ns"));
}
static void Run()
{
s_timer.Reset();
s_timer.Start();
var i = 0;
for (; i < int.MaxValue; i++) { }
s_timer.Stop();
s_results.Add(s_timer.Elapsed);
}
[DllImport("ntdll.dll", SetLastError = true)]
static extern int NtSetTimerResolution(int DesiredResolution, bool SetResolution, out int CurrentResolution);
[DllImport("ntdll.dll", SetLastError = true)]
static extern int NtQueryTimerResolution(out int MinimumResolution, out int MaximumResolution, out int CurrentResolution);
}
} It goes a bit further and sets the timer res to 0.5ms on systems that support it for the sake of the argument. According to my tests:
When resolution is 15ms vs 0.5ms - the former is about 30ns faster to run one full loop from 0 to int.MaxValue Now, the same program when with almost no thread contention (Previously there were a bunch of processes in the background running in like Chrome, ProcessHacker, VSCode etc. - which is why the difference was much greater. Kernel's doing a lot more book-keeping. Note though: they are still in background. Foreground process is always given higher quantum slices.)
Here the difference is much smaller. Though, lower resolution is still faster with less overhead to deal with. Feel free to run the tests on difference scenarios to understand it's implications. Windows uses a timer resolution of 15ms, because it's a good compromise between stealing programs time, and resolution. It's not very difficult to auto tune this resolution in the kernel, and emulate it like UNIX as much as the interrupt timer resolution allows (typically 0.5-1ms). But it's trade-off that has to be understood explicitly by each program before it does this. I like to believe that's possibly a reason Microsoft simply chose not to do it, and to provide make 15ms the default resolution and program that knows that it needs more can explicitly go ask for it - which is also why I think it's far better to stick to the consistent model that the platform recommends than letting every random Go programs take extra interrupts without knowing the consequences. The unfortunate thing here, that saddens me however, is that the runtime itself makes these bad decisions. |
Hi Alex.
Sorry I'm not being clear. Perhaps it would help if we backed up a step. Do you agree that there is a problem? If so, what solution do you prefer? |
Yes, that's what I meant. When I referred to another consideration I meant in addition to saving power, when running at 15ms.
Better in v1.9 than it was, I think. And can be improved as per issue #20937. |
Ah! I mistook it the other way, and was confused why you meant it would increase the perf increasing the timers. Guess, I could have saved myself a few minutes not having to write out that naive test. Haha. Anyways, yes. 1.9 is better on idling. 👍 |
No sweat. Save the test results. They would support the case for using 15ms even when the a program isn't idling. :) |
I don't agree there is a problem here. I think I already explained my view. I think that you expect your program to work in a particular way, but that is not how it works. I agree that Go documentation is not correct for some scenarios you've described. But, I submit, we should leave documentation alone, because these scenarios do not happen often enough. More documentation will just make things even more confusing. Alex |
Incorrect behavior. time.Sleep doesn't work as documented:
A more accurate description of the actual behavior would be something like:
Note: I only make this claim for Windows. I do suspect that the problem exists on all platforms, but I do not claim that at this time. The difference in behavior (documented vs actual) can result in problems and can make a program harder to reason about. Severity: Depending on the code, it can cause loss of data (high priority), bogus retries (medium priority), or not matter (low priority). The symptoms are typically non-deterministic and intermittent thus making the problem hard to troubleshoot. It caused a program of mine to fail and that's why I decided to report it. I believe it happens more frequently than people realize because it generally doesn't cause data loss,just retries, or some other fatal effect. There was some confusion due to problems with early versions of the test program: Testing at the tick boundary, not allowing for the Windows clocks smaller than 1ms (ex: 0.976ms), and the "sleep + work" test results being hard to interpret. So let's refer to a simple, clear test result. From my post on May 26 Hit Enter for a Sleep(900us), Ctrl-C to exit:
I can't think of a scenario in which it is correct. Please describe one.
I believe that the goal should be:
I believe that both Rob and Russ have said that the documentation is correct as is. So the remaining option is to change the code to match it.
In light of the early confusion, and to help me understand your position, could you please restate it in the context of this message? |
So, you want documentation changed. Thank you for explaining.
Perhaps you are correct. Perhaps it is impossible to describe one. But I consider myself a practical person. The problem you are describing does not bother me.
I am not convinced we need to change code or documentation. Alex |
No, that is not what I said.
I understand. Would it being fixed bother you? |
Fixes are not free. Recently we made changes related to this (see CL 47832 and 47833). Austin had to spend his time implementing this changes. Austin could have improved something else instead. And now (after CLs 47832 and 47833) we have more code in runtime that all runtime developers have to look at and consider. And the code can break in the future. Same with "improved" version of documentation. More complicated documentation means that every Gopher (even the ones who are not affected by this issue) must read and understand all these new details. So the problem must be worth fixing. But personally I am not convinced. Alex |
Very little in life is. :)
Those were related to CL 38403 and issue 20937, not this issue.
Then let me propose an alternate change to the documentation:
This is both simpler and more correct. Win-win! |
I agree this is very small change. But I still cannot make this decision. I did not design / document that API, so I cannot make that call. You have to convince others. Sorry. Alex |
@johnrs, just been thinking about this again - I think there's a problem with your time.Now() resolution measurement. You'll get different time.Now() resolution by doing this during different times depending on the how the OS schedules your threads - which also implies your estimation of dowork's time is incorrect. It can or cannot be different each time you run it, depending on when and where the OS thread gets interrupted. I think this very fragile way to test this. |
@prasannavl - You are correct, of course. But if care is taken to run the test on a system that is lightly loaded and has multiple CPUs then the chances that the test's thread will get paused are pretty small. Dowork was an attempt to introduce a delay which didn't depend on the system clock. It was messy and the result (sleep for slightly less than 1 tick + work for half a tick) was hard to explain to people. Some assumed that the result should be the sum of the two. Work should have caused rounding to the next clock tick, for a total of 2 ticks. A result of 1 tick showed that the sleep was less than 1 tick. It can be hard to follow the logic. Oh, and don't forget to disable turbo (speed-stepping) mode if your BIOS has it on, else it can makes the "work" loop pretty inaccurate. On my PC with only a single core running the CPU clock is about 40% faster than when all 4 cores are running. I finally just used the QPC timer to measure the intervals. The results were clearer since they didn't need any interpretation. In later versions I didn't loop at all and just displayed a few samples. That was enough to show if the results were "steady" or not. But I think that the Manual test is the easiest to understand. It simply times a sleep which is started at a random time (relative to the system clock's phase). If I wanted to determine the tick resolution in a production program I'd probably use a loop (~10 passes, well under the 1ms scheduling interval) and I'd use the minimum value, not the average value. I think it's fair to say that the thread would not be paused at least once during those 10 passes. Also, Ian suggested that locking the goroutine to the thread might help. |
I understand the premise. But just wrapping my head around to see if there's a better way to test this while dealing with such low intervals.
My concern is the definition of what's 1 tick - and how that's measured. QPC can measure intervals with the highest possible accurately, but not how many ticks a particular program has gone through. I think the most accurate (and Microsoft recommended way) to do this will be to run xperf so we can visualize it right down to the thread quantum of the program in PerfView). But this gives just accuracy, not the precision needed, since the overheads might be way too high. The only possible way to measure that with some sense of reasonably reliability would be to write a kernel driver, which is just madness for this particular problem. I'm trying to wrap my head around it, to understand if there's any way to do a better analysis without doing that. May be xperf and perfview might just be able to provide something indicative here. Not sure. |
The intervals for Windows aren't that low, so QPC works fine. It has nearly 2000 times the resolution of a tick on my machine. From what I read this is typical.
My scheme seems to produce repeatable, correct results for the QPC, time.Now, and time.Sleep(1) ticks. Indeed, that the QPC tick measurement works (admittedly it does double or otherwise fail from time to time) showing 550ns says something. :)
It's going to continue running even if your program is paused. Is this what you mean? For testing I don't see this as a problem since we know that running the test multiple times, the smallest result is correct. I'm in the middle of making a few improvements to my test program, including the "loop and use the minimum" idea, now. I will post it on the Play site hopefully in an hour or so. I'll let you know when I do. Also, if needed, using keyboard input to de-synchronize the time.Sleep can be replaced by some other long running, but randomized runtime, routine. So I think that the Windows case is well covered. Platforms with faster ticks are another matter, however. I'd like to have a test for high resolution (<= 1us) ticks. Actually, 1us isn't hard to measure and I think that my current scheme should probably work down to about 100ns, but if it gets down to 10ns then yuck! On Linux, if the time.Now tick and the time.Sleep(1) (1 = minimum sleep time) ticks are the same, as in Windows, then something with even more resolution is needed to make measurements on time.Sleep when used in a non-synchronized mode. Ouch! Do you happen to know if they are the same? Any idea what they are? Or I could revert to the "work + sleep" test. It's sloppy but I eliminated about half of the inaccuracy in later versions by doing an auto-calibrate. I certainly wouldn't want to use it in any production project, however! Finally, if the ticks get below 100ns you have to ask if it's even worth worrying about any +/- 1 error. :) |
Done. https://play.golang.org/p/nH3LnBNQtj This is what I see.
|
- remove unnecessary control - change sleep - change stop execution - increase timeouts golang/go#17696 Co-authored-by: @erhanakp
What version of Go are you using (
go version
)?go version go1.7.3 windows/amd64
What operating system and processor architecture are you using (
go env
)?set GOARCH=amd64
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOOS=windows
What did you do?
I wrote a test which is basically a loop consisting of two things: Some busy work for about 500us and a call to time.Sleep(1 * time.Millisecond).
Here's a link to the test program: https://play.golang.org/p/VYGvIbTPo3
It can't run on the playground, so you will have to grab it and run it yourself.
What did you expect to see?
What did you see instead?
Discussion
I believe that there are errors in the doc's for time.Sleep and time.NewTimer. They both make similar statements regarding the duration.
Sleep pauses the current goroutine for at least the duration d.
NewTimer creates a new Timer that will send the current time on its channel after at least duration d.
If these are correct, then the test loops should take ~1.5ms or longer per pass. But they don't.
I believe that
at least the duration d
was an attempt to convey that overhead might result in a longer duration than requested. But there are two other factors which might result in a shortened duration, sometime MUCH shorter. These factors are the resolution and phase of the time source used for these functions.In Windows, the resolution is normally 1ms for Go programs. I'll refer to this value as r, short for resolution. I'm not sure what it is on other platforms. If you run the test program it will measure and display it. Please let me know what you find.
I believe that in general the duration can be as short as d-r (rather than d). How short simply depends on what the phase of the time source when you use it. Thus if you ask for a 100ms delay you will end up with something between 99 and 100 (plus overhead). Thus as you use smaller values of d (approaching r) the problem gets worse. By itself, this isn't too bad. But now the other shoe drops.
In the specific case that you ask for a delay of r (or anything between 1 and r), then the delay can be anything between 0 and r. Actually, for a reason I don't quite understand, the lowest delay I've actually seen is about 50us, not 0. Even so, that is 20 times less than the 1ms requested. This was drastic enough to break some of my code. I changed d from 1ms to 2ms. The resulting delay is between 1 and 2ms (plus overhead).
Can this cause a problem? Generally, in a loop only the first pass is affected, because the loop gets in phase with the timing source. So this might or might not be bad for you. If it is, put a sleep 1ms call just before the loop.
But imagine that you are using time.After to set up a timeout, but it occurs too quickly. Perhaps 20 times too quickly. This can cause intermittent, non-deterministic, bogus timeouts, even on a single-threaded program. Ouch! This is what bit me.
Another gotcha: Using d of 10ms now, but Go Windows switches r from 1ms to 15.6ms. Instead of a delay of 9 - 10ms you now get 0 - 15.6ms.
Possible Solutions
The really bad case when d is between 1 and r could be detected causing d to be changed to r+1. Tacky, in my opinion. Alternatively, vet could look for this case.
Most programming languages I've worked in allow this to happen, but they don't document it incorrectly. This seems like the best solution to me. Simply fix the doc's to reflect reality.
Perhaps a paragraph at the top describing what the "resolution of the time source" means, what it is for some typical platforms, and an example of a simple routine to determine what yours is. Maybe even a system call to retrieve it.
Then, change the current time.Sleep description from
Sleep pauses the current goroutine for at least the duration d. A negative or zero duration causes Sleep to return immediately.
to something like
Sleep pauses the current goroutine for at least the duration d - 1. A duration less than or equal to the time source resolution can cause Sleep to return quickly. A negative or zero duration causes Sleep to return immediately.
And similar for time.NewTimer's description.
The text was updated successfully, but these errors were encountered: