Closed
Bug 1449739
Opened 7 years ago
Closed 6 years ago
Investigate possible bug in Profile Creation Date calculation
Categories
(Toolkit :: Telemetry, enhancement, P1)
Toolkit
Telemetry
Tracking
()
RESOLVED
FIXED
People
(Reporter: frank, Assigned: janerik)
References
Details
Attachments
(1 obsolete file)
We're seeing profile creation dates that differ significantly from min(submission_date_s3) for clients. Essentially, 10% of clients are showing more than a year of difference between the two; 5% are showing 3.5 years. This seems more extreme than clock skew alone. Note this is all done on post-pingsender data, so latency should be low as well. Detailed analysis by Ben Miroglio here: https://gist.github.com/benmiroglio/91ade009ddc81aee687870f3d67af24f Take note especially of the last two plots, which show the CDF of (min(submission_date_s3) - PCD).
Reporter | ||
Comment 1•7 years ago
|
||
Running the numbers in STMO: https://sql.telemetry.mozilla.org/queries/52284/source#139394 For new profile pings since 2018-02-01: - 10% of clients reported a profile_creation_date 150 days or more before the submission_date - 5% of clients reported a profile_creation_date 1126 days or more before the submission_date Notice that it is heavily biased for before submission_date.
Reporter | ||
Comment 2•7 years ago
|
||
Addendum (pings not clients): For new profile pings since 2018-02-01: - 10% of pings reported a profile_creation_date 150 days or more before the submission_date - 5% of pings reported a profile_creation_date 1126 days or more before the submission_date
Comment 3•7 years ago
|
||
We use ProfileAge.jsm (which, aside from broad refactors, has been unchanged for years) to calculate the age of the profile. It uses the creation date (or modification date if creation date is not present) of files present in the profile dir.
Updated•7 years ago
|
Priority: -- → P2
Comment 4•7 years ago
|
||
I wonder how much this might interact with clock skew... Investigation required. Clocks are hard.
Reporter | ||
Comment 5•7 years ago
|
||
(In reply to Chris H-C :chutten from comment #4) > I wonder how much this might interact with clock skew... > > Investigation required. Clocks are hard. It's not Clock Skew. 99.29% of clients report a clock skew of less than a day [0], and 99.6% within 2 days [1]. That makes sense, because Clock skew would (presumably) be evenly distributed over the interval [2], but the PCD/submission_date offset is heavily skewed. [0] https://sql.telemetry.mozilla.org/queries/52348/source#139551 [1] https://sql.telemetry.mozilla.org/queries/52348/source#139535 [2] Citation Needed, though the above plots do seem to support that
Reporter | ||
Comment 6•7 years ago
|
||
Ugh, Fix: "99.29% of new-profile pings" _NOT_ clients.
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → jrediger
Reporter | ||
Comment 7•7 years ago
|
||
Bit of follow-up now that this is assigned: this does seem to be affecting every OS, but not equally. For example, Darwin and Linux both report 85% with 0 or 1 days of difference, but Windows is 36 days to get to 85%. The key here is that the initial break-point is different for each OS, with a different height, but the long tail looks relatively similar. https://sql.telemetry.mozilla.org/queries/52416/source#139660
Assignee | ||
Comment 8•7 years ago
|
||
I have looked into that trying to find an explanation or any bugs in the code. So far I did not come up with a root cause for this. I will list the things I found out: * ProfileAge.jsm. The code tries to read a timestamp from `times.json` or, if not present, iterate through all files in the profile directory to find the oldest (by creation date/modification date). This date is stored in `times.json`. * I looked for the first occurence of this problem and it looks like it's always been present since the introduction of the `new-profile` ping a year ago * My first hunch was that the `new-profile` ping is sent when users upgrade their browser to a version that enables that ping. However, that case explicitely does not trigger a `new-profile` ping[1]. I verified that we have a test for that[2] and that the test works as expected (though it does block, not fail when the wrong behaviour is encountered). * :chutten brought up the issue that ProfileAge might start from the wrong directory (i.e. the parent of the current profile). I checked that in the code and ProfileAge uses the current profile's directory as expected [3] (TelemetryEnvironment invokes that with `profile = null`). I also verified that locally, logging which files are taken into account. [1]: https://searchfox.org/mozilla-central/source/toolkit/components/telemetry/TelemetrySession.jsm#1967-1969 [2]: https://searchfox.org/mozilla-central/source/toolkit/components/telemetry/tests/unit/test_TelemetryController.js#575-576 [3]: https://searchfox.org/mozilla-central/source/toolkit/modules/ProfileAge.jsm#19 As listed above, I did not find the root cause. The code looks correct in those places. The only current explanation would be that somehow very old files end up in the profile directory (or the filesystem reports old stuff). However, there would be a simple fix going on from here: Instead of scanning all profile files for the oldest one and then storing that state we can fill the `times.json` when a new profile is created directly.
Updated•6 years ago
|
Priority: P2 → P1
Comment hidden (mozreview-request) |
Assignee | ||
Comment 10•6 years ago
|
||
We now have some data [1] showing that we indeed encounter really old files during a profile directory scan. I finally found some time to trace how Firefox create profiles. We already have code writing a times.json file with the current timestamp[2]. This code is not triggered if the profile directory already exists (or `-profile path/to/profile` was used on the command line). We can avoid scanning if we just store a timestamp early. Choosing the current time during startup of Firefox seems to be as good as any other date (and that only happens if there's not already a times.json). I attached a patch implementing this (We probably need review from some module owner of xre and profile if we want to go forward with this). [1]: https://mzl.la/2H8eibn [2]: https://searchfox.org/mozilla-central/rev/38bcf897f1fa19c1eba441a611cf309482e0d6e5/toolkit/profile/nsToolkitProfileService.cpp#803-807
Comment 11•6 years ago
|
||
mozreview-review |
Comment on attachment 8982519 [details] Bug 1449739 - Write times.json (if missing) after selecting a profile. https://reviewboard.mozilla.org/r/248488/#review254802 f+ On every start of Firefox, write (if it does not exist) a times.json file to the current profile's directory using the current time as the content? Seems legit. Weird clients can still break this: maybe a client doesn't have write access to their own profile directory and so times.json will continue to not exist? Maybe the profile dir is overwritten frequently and the times.json will march forward in time? But weird clients will always be able to break things. How do the number of profileage scans compare to the number of new profiles? Will you remove non-times.json logic from ProfileAge in this bug as well? (and its associated data collection)
Attachment #8982519 -
Flags: review?(chutten)
Assignee | ||
Updated•6 years ago
|
Attachment #8982519 -
Attachment is obsolete: true
Attachment #8982519 -
Flags: review?(gfritzsche)
Assignee | ||
Comment 12•6 years ago
|
||
After a discussion with :gfritzsche I move over the implementation to bug 1466518, looping in the pipeline team once I have this written up more clearly.
Assignee | ||
Comment 13•6 years ago
|
||
W
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → INCOMPLETE
Assignee | ||
Updated•6 years ago
|
Resolution: INCOMPLETE → FIXED
Assignee | ||
Comment 14•6 years ago
|
||
The underlying issue is not fully fixed yet, but the investigation of it is. We now have another probe as well as documentation coming. As a follow-up we should make a decision on how to deal with the out-of-date data eventually.
Blocks: 1495792
You need to log in
before you can comment on or make changes to this bug.
Description
•