{"id":428,"date":"2023-04-27T19:59:48","date_gmt":"2023-04-27T19:59:48","guid":{"rendered":"https:\/\/blog.mozilla.org\/data\/?p=428"},"modified":"2023-04-27T19:59:48","modified_gmt":"2023-04-27T19:59:48","slug":"never-look-at-the-data-why-did-we-start-getting-so-many-pings-from-korea","status":"publish","type":"post","link":"https:\/\/blog.mozilla.org\/data\/2023\/04\/27\/never-look-at-the-data-why-did-we-start-getting-so-many-pings-from-korea\/","title":{"rendered":"Never Look at the Data: Why did we start getting so many pings from Korea?"},"content":{"rendered":"\r\n

Something happened on January 5, 2023. All of a sudden, we started receiving pings from Firefox Desktop clients in Korea in numbers equal to twice the size of the entire Korean Firefox Desktop population.<\/p>\r\n

What happened? How did we notice it? What did we do about it?<\/p>\r\n

Let\u2019s back up.<\/p>\r\n

I can\u2019t remember where I learned it, but I\u2019d already started reciting it as dogma in my first year of university: \u201cThe most important part about any feature is the ability to turn it off\u201d. It\u2019s served me well through my studies and my career. I\u2019ve also found it to be especially true for data collection systems where, for whatever reason, as a user you might decide you no longer want the software you\u2019re using to continue to send data. In some places this is even enshrined in laws that let you request the deletion of data that has already been collected.<\/p>\r\n

Law or not, Mozilla has before, does now, and will always make it easy for you to decide whether to send data to Mozilla. We may not understand why you make that choice, and it definitely will make it harder for us to ensure our products meet your needs, but we\u2019ll respect the heck out of your choice in our processes and in our products.<\/p>\r\n

This is why, when Mozilla\u2019s data collection system Glean is told the user went from allowing data upload to forbidding it, we send one final \u201cdeletion-request\u201d ping<\/a> before shutting down. The \u201cdeletion-request\u201d ping contains all the internal identifiers we\u2019ve used to longitudinally group data (if we receive ten crash reports it\u2019s important to know whether it\u2019s the same Firefox crashing ten times or if it\u2019s ten Firefoxes crashing once), and we use those identifiers to (well) identify what data we\u2019ve collected that we\u2019re now going to delete.<\/p>\r\n

For the purposes of this story you\u2019ll need to know that there are two times when Glean notices the product\u2019s gone from \u201cdata upload: on\u201d to \u201cdata upload: off\u201d: while Glean is running, and during Glean startup. If Glean\u2019s running, then we just handle things \u2013 we were told the setting changed from \u201cdata upload: on\u201d to \u201cdata upload: off\u201d and away we go. But Glean knows that it isn\u2019t always listening to the data upload setting, so if it starts up with \u201cdata upload: off\u201d and the last time it shut down we were \u201cdata upload: on\u201d, we\u2019ll send a specific \u201cat_init\u201d-reason \u201cdeletion-request\u201d ping.<\/p>\r\n
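
To make those two detection points concrete, here is a minimal Python sketch. It is not Glean\u2019s actual code: the class, its fields, and the \u201cwhile_running\u201d reason label are all made up for illustration. Only the \u201cat_init\u201d reason and the \u201cdeletion-request\u201d ping name come from the description above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class UploadStateTracker:
    # Hypothetical stand-in for the setting persisted at last shutdown.
    last_known_upload_enabled: bool = True
    # Records the pings this sketch would have sent, for inspection.
    sent_pings: List[str] = field(default_factory=list)

    def initialize(self, upload_enabled: bool) -> None:
        # Startup path: compare the current setting with the persisted one.
        if self.last_known_upload_enabled and not upload_enabled:
            self._send_deletion_request('at_init')
        self.last_known_upload_enabled = upload_enabled

    def set_upload_enabled(self, upload_enabled: bool) -> None:
        # Running path: the setting changed while we were listening.
        # 'while_running' is a placeholder label, not a documented reason.
        if self.last_known_upload_enabled and not upload_enabled:
            self._send_deletion_request('while_running')
        self.last_known_upload_enabled = upload_enabled

    def _send_deletion_request(self, reason: str) -> None:
        self.sent_pings.append('deletion-request:' + reason)


# Upload was on at last shutdown, but is off at this startup.
tracker = UploadStateTracker(last_known_upload_enabled=True)
tracker.initialize(upload_enabled=False)
```

In this sketch a ping is sent only on an on-to-off transition, so starting up repeatedly with upload already off sends nothing further.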

We in the Data Org monitor how Glean is behaving. One thing we\u2019ve learned about how Glean behaves is that the number of \u201cdeletion-request\u201d pings is roughly constant over time. And the proportion of \u201cdeletion-request\u201d pings that have the \u201cat_init\u201d reason should stay fairly fixed, too.<\/p>\r\n
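
As a rough illustration of what watching for that kind of stability could look like, here is a small Python sketch. It is not how our monitoring is actually implemented: the function names, the median baseline, the threshold factor, and every count in it are invented for the example.

```python
from statistics import median
from typing import Dict, List


def at_init_share(at_init: int, total: int) -> float:
    # Proportion of deletion-request pings carrying the at_init reason.
    return at_init / total if total else 0.0


def flag_at_init_spikes(at_init_by_day: Dict[str, int], factor: float = 3.0) -> List[str]:
    # Flag days whose at_init count exceeds `factor` times the median day.
    if not at_init_by_day:
        return []
    baseline = median(at_init_by_day.values())
    return [day for day, count in at_init_by_day.items() if count > factor * baseline]


# Made-up daily counts, purely illustrative.
counts = {'day-1': 100, 'day-2': 104, 'day-3': 98, 'day-4': 950, 'day-5': 101}
flagged = flag_at_init_spikes(counts)  # ['day-4']
```

A median baseline is used here because a single huge day barely moves it, whereas a mean would be dragged upward by the very spike it is supposed to catch.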

What shouldn\u2019t happen is for Firefox Desktop-sent \u201cat_init\u201d-reason \u201cdeletion-request\u201d pings to spike like this on January 5:<\/p>\r\n

 <\/p>\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n\r\n

\"time-series<\/figure>\r\n\r\n

 <\/p>\r\n\r\n

What we do when we notice things like this is file a bug<\/a>. As the one responsible for Glean\u2019s integration in Firefox Desktop, and as someone with a long history of looking into anomalies<\/a>, I took a look. At first I was pretty sure it\u2019d be a single actor (a single user, a single company, a single internet cafe) doing something odd\u2026 but alas, the evidence was inconclusive:<\/p>\r\n

Evidence consistent with a single actor being responsible for it all:<\/p>\r\n