Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pinned browser tab fails to reload sometimes after computer wake #22721

Open
carols10cents opened this issue Aug 15, 2022 · 11 comments
Open

Pinned browser tab fails to reload sometimes after computer wake #22721

carols10cents opened this issue Aug 15, 2022 · 11 comments
Labels
area: misc difficult Issues which we expect to be quite difficult needs reproducer priority: high rust community request Issues of interest to the Rust community

Comments

@carols10cents
Copy link

What happens

Sometimes when I open up my laptop after it's been asleep for a while, and I switch to the pinned Firefox tab where I have Zulip, I see this screen that only says "Loading..." with a broken image:

Screen Shot of the broken loading screen

And it seems to change to this with an additional message saying "If this message does not go away, try reloading the page" at the bottom, not sure if the message appears after an amount of time or due to some action I took:

Screen Shot of the broken loading screen with additional message

I do have to use the browser reload to reload the page to get Zulip back.

If I don't check this tab soon after waking up my laptop to notice and reload the page, then I miss notifications.

Here are the logs I see in the console, they don't look particularly enlightening... I don't see anything interesting in the Network tab, but if there's something I can investigate next time this happens, please let me know.

Console tab logs
Cookie “django_language” does not have a proper “SameSite” attribute value. Soon, cookies without the “SameSite” attribute or with an invalid value will be treated as “Lax”. This means that the cookie will no longer be sent in third-party contexts. If your application depends on this cookie being available in such contexts, please add the “SameSite=None“ attribute to it. To know more about the “SameSite“ attribute, read https://developer.mozilla.org/docs/Web/HTTP/Headers/Set-Cookie/SameSite rust-lang.zulipchat.com
Loading failed for the <script> with source “https://zulipchat.com/static/webpack-bundles/998.2d149fb8aab718ecae0b.js”. rust-lang.zulipchat.com:25:1
Loading failed for the <script> with source “https://zulipchat.com/static/webpack-bundles/952.24f993263fbbf8c8dd22.js”. rust-lang.zulipchat.com:29:1
Loading failed for the <script> with source “https://zulipchat.com/static/webpack-bundles/app.161f6701fa5e49112340.js”. rust-lang.zulipchat.com:34:1
Loading failed for the <script> with source “https://zulipchat.com/static/webpack-bundles/617.1b58ad8c9d23b3fe888c.js”. rust-lang.zulipchat.com:26:1
Loading failed for the <script> with source “https://zulipchat.com/static/webpack-bundles/14.dd32f3b050e534938fee.js”. rust-lang.zulipchat.com:27:1
Loading failed for the <script> with source “https://zulipchat.com/static/webpack-bundles/126.c03e7f21cbd5eec6dc8a.js”. rust-lang.zulipchat.com:24:1
<source> element has no “src” attribute. Media resource load failed. 4 rust-lang.zulipchat.com
All candidate resources failed to load. Media load paused. 2 rust-lang.zulipchat.com

What I expected to happen

Sometimes everything works great, meaning I open up my laptop and switch to the Zulip tab and it has the latest chat messages there-- it was able to reload itself without any errors. This is what I expect to happen all the time.

I haven't been able to figure out any correlation between actions I take or situations and when the reload works or not. I've been trying to catch this issue to report it for about 5 days, and this is the first time it's happened in that time period. I can try and keep track of anything that would be helpful if there are suspicions.

My somewhat-unreliable memories are that it seems to happen more often when I open my computer in the morning after sleeping all night (US Eastern time) rather than during the day... it would make sense to me if it had something to do with server restart overnight? Knowing little to nothing about Zulip's code or operations, this is just a wild guess based on my gut instincts, feel free to ignore me if that doesn't sound likely :)

System information

  • macOS Monterey, MacBook Pro (14-inch, 2021), Apple M1 Max
  • Firefox 103.0.1 (64-bit)
  • No network proxies or firewalls
  • Add-ons: Facebook Container, LastPass, Refined GitHub, uBlock Origin

I know this is rough because I have no way to reproduce this at will. Please please please let me know if there's anything I can log or investigate to help track this down!

@davidtwco
Copy link
Contributor

Could we tag this as rust community request Issues of interest to the Rust community ? A few members of the Rust community have been running into this :)

@carols10cents carols10cents changed the title Browser tab fails to reload sometimes after computer wake Pinned browser tab fails to reload sometimes after computer wake Aug 15, 2022
@shepmaster
Copy link

I see a very similar issue without a pinned tab and using Safari. In my case, the images and CSS appear to be loaded (I neglected to grab a screenshot of this page, unfortunately). Like @carols10cents, this feels like an issue that occurs when I put my laptop to sleep and open it later. Today, I believe I left my laptop closed for about four days.

My console logs are similarly not very expressive:

image

  • Safari Version 15.6 (17613.3.9.1.5)

@carols10cents
Copy link
Author

Full disclosure: @shepmaster and I are on the same network 99% of the time. But unless it's DNS or something like that, I don't think it's relevant... (famous last words)

@Urgau
Copy link

Urgau commented Aug 15, 2022

I have also had this issue multiple times from multiple locations and networks and always after resuming my laptop.

@timabbott
Copy link
Member

Thanks for the report, folks! It's likely that the issue is fairly subtle. What is happening here is that Zulip is asking the browser to reload, which it does if you've been disconnected from the server for more than 10 minutes (and we've thus garbage-collected your real-time push queue). And the browser is not processing our request to reload. (And the little textual message appears after N seconds of the browser having not changed state after the request to reload).

When we've had this class of issue in the past, it's largely been browser energy saving features that are choosing to ignore Zulip's request to reload the browser window entirely. They have sometimes disappeared without changes in Zulip due to browsers improving their algorithms.

However, a common cause that could be fixed in Zulip is if the Zulip app is consuming more CPU than would be ideal in steady state, since that can impact the browser's model for how expensive the Zulip tab is and thus whether it should not run its code until a user clicks into it. I'd consider running a browser profiler to see if Zulip is using CPU in the background for you; this can be caused by animated GIFs/emoji being visible as that's fundamentally how animated GIFs work (which we're trying to work around for Zulip in #20464 and related PRs), or sometimes by regressions in the highly performance-optimized right sidebar live update logic.

Another possibility, suggested by the HTTP 303 exception in your browser console, is that your VPN or network driver or something was still booting at the time that the Zulip window tried to reload, and rejected the network request in a way that caused retries to not occur.

It's possible to simulate a garbage-collection related reload on a browser by sending this API request with the browser's queue_id captured from the Developer Tools' Network tab.

Hopefully this background is enough for one of you to do a bit of debugging to narrow this down towards something that can be reproduced and fixed.

@timabbott timabbott added area: misc priority: high difficult Issues which we expect to be quite difficult needs reproducer labels Aug 19, 2022
@zulipbot
Copy link
Member

Hello @zulip/server-misc members, this issue was labeled with the "area: misc" label, so you may want to check it out!

@carols10cents
Copy link
Author

I'd consider running a browser profiler to see if Zulip is using CPU in the background for you

Hi @timabbott! Do you have a link to instructions on running a browser profiler in Firefox that would produce useful output for you?

Another possibility, suggested by the HTTP 303 exception in your browser console, is that your VPN or network driver or something was still booting at the time that the Zulip window tried to reload, and rejected the network request in a way that caused retries to not occur.

I'm not using a VPN... do you have a suggestion on how to get more information from browser logs to see if this is the case? I've turned on "persist logs" for the console and network tabs of the browser developer tools to try and capture everything that happens when I hit this problem, does that sound like it will be useful?

@carols10cents
Copy link
Author

Ok, so I left "persist logs" on in the browser console and I caught an instance of this bug. I've downloaded all the messages and I'm happy to share that if that would be helpful, but here's what looks important/interesting to me:

I closed my laptop right at 17:00 (quittin' time) and opened it again around 19:30.

Right at 17:00, I see these successful requests to /json/events and /me/presence in the logs:

17:00:14.905 XHR GET https://rust-lang.zulipchat.com/json/events?dont_block=false&queue_id=1659935141:140481&last_event_id=231&client_gravatar=true&slim_presence=true
[HTTP/2 200 OK 45025ms]

17:00:48.528 XHR POST https://rust-lang.zulipchat.com/json/users/me/presence
[HTTP/2 200 OK 52ms]

Then at 17:01 I see a request to /json/events that doesn't appear to get a response:

17:01:00.958 XHR GET https://rust-lang.zulipchat.com/json/events?dont_block=false&queue_id=1659935141:140481&last_event_id=232&client_gravatar=true&slim_presence=true

Transferred0 B (0 B size)

Then the next thing I see is this message in the logs at 17:17 and two more requests that don't appear to get responses:

17:17:10.088 Restarting get_events due to unsuspend blueslip.ts:34:27
17:17:10.097 XHR GET https://rust-lang.zulipchat.com/json/events?dont_block=true&queue_id=1659935141:140481&last_event_id=232&client_gravatar=true&slim_presence=true
Transferred0 B (0 B size)

17:17:26.632 XHR POST https://rust-lang.zulipchat.com/json/users/me/presence
Transferred911 B (0 B size)

Then the interesting one, a 400 bad request to /json/events:

17:17:56.595 XHR GET https://rust-lang.zulipchat.com/json/events?dont_block=true&queue_id=1659935141:140481&last_event_id=232&client_gravatar=true&slim_presence=true
[HTTP/2 400 Bad Request 122ms]

{"result":"error","msg":"Bad event queue id: 1659935141:140481","queue_id":"1659935141:140481","code":"BAD_EVENT_QUEUE_ID"}

The browser sending a bad event queue ID doesn't really sound like anything you were supposing, unless I'm misunderstanding? It seems like this request is making it to the server, so it's not a network problem or the browser refusing to reload...?

@alya alya added the rust community request Issues of interest to the Rust community label Aug 23, 2022
@timabbott
Copy link
Member

timabbott commented Aug 23, 2022

Thanks for the investigation!

The log output you've posted is I think all expected. The requests not getting responses are presumably networking timeouts due to the client not being online yet when trying to reach the server. Zulip's server=>client push event queues avoid leaking memory by being garbage-collected after 10 minutes of inactivity; the "Bad event queue id" API response is the expected one if your queue has been garbage collected.

When the client receives that error, it calls the following function. This does a few things:

  • Saves some state about your scroll position, compose box state, etc., via preserve_state, so that you end up in exactly the same place after the reload.
  • Does some things to try to work around bugs like what we're discussing, like configuring the browser to reload-on-focus (which is only relevant if it ignores the thing we're about to do ... which of course we do see in practice).
  • Concludes with window.location.reload(true);, aka asks the browser to reload the tab.

What we're seeing empirically here is that the window.location.reload step is not being processed by the browser, and additionally, the on-focus handler isn't being run when you click into the tab either. I will note that we discovered a bug a few months ago, fixed in 2d21836 / #22529, which would try to reload every time a focus every is triggered, not just once. So for next steps:

  • If this bug is NEW behavior starting in the last month, that could suggest that the change last month did introduce this issue for you.
  • The browser developer tools "Network" tab may have some useful information -- in particular, details like whether the browser attempted to make a request to the server when we requested the reload, or not.
  • If you can just use the Firefox profiler when doing nothing in Zulip for 10 seconds and post a screenshot of the result, that'd be helpful. The main thing we'd be looking for is whether the browser is idle in steady-state, or there's something like an animated emoji constantly burning CPU.
  • It's possible it'd be helpful for us if you started a thread in #issues in chat.zulip.org for this bug; there may be some things that benefit from an occasionally real-time debugging collaboration.
function do_reload_app(send_after_reload, save_pointer, save_narrow, save_compose, message_html) {                                     
    if (reload_state.is_in_progress()) {                                                                                               
        blueslip.log("do_reload_app: Doing nothing since reload_in_progress");                                                         
        return;                                                                                                                        
    }                                                                                                                                  
                                                                                                                                       
    // TODO: we should completely disable the UI here                                                                                  
    if (save_pointer || save_narrow || save_compose) {                                                                                 
        try {                                                                                                                          
            preserve_state(send_after_reload, save_pointer, save_narrow, save_compose);                                                
        } catch (error) {                                                                                                              
            blueslip.error("Failed to preserve state", undefined, error.stack);                                                        
        }                                                                                                                              
    }                                                                                                                                  
                                                                                                                                       
    // TODO: We need a better API for showing messages.                                                                                
    ui_report.message(message_html, $("#reloading-application"));                                                                      
    blueslip.log("Starting server requested page reload");                                                                             
    reload_state.set_state_to_in_progress();                                                                                           
                                                                                                                                       
    // Sometimes the window.location.reload that we attempt has no                                                                     
    // immediate effect (likely by browsers trying to save power by                                                                    
    // skipping requested reloads), which can leave the Zulip app in a                                                                 
    // broken state and cause lots of confusing tracebacks.  So, we                                                                    
    // set ourselves to try reloading a bit later, both periodically                                                                   
    // and when the user focuses the window.                                                                                           
    $(window).one("focus", () => {                                                                                                     
        blueslip.log("Retrying on-focus page reload");                                                                                 
        window.location.reload(true);                                                                                                  
    });                                                                                                                                
    setInterval(() => {                                                                                                                
        blueslip.log("Retrying page reload due to 30s timer");                                                                         
        window.location.reload(true);                                                                                                  
    }, 30000);                                                                                                                         
                                                                                                                                       
    try {                                                                                                                              
        server_events.cleanup_event_queue();                                                                                           
    } catch (error) {                                                                                                                  
        blueslip.error("Failed to clean up before reloading", undefined, error.stack);                                                 
    }                                                                                                                                  
                                                                                                                                       
    window.location.reload(true);                                                                                                      
}                                                                                                                                      

@carols10cents
Copy link
Author

carols10cents commented Sep 12, 2022

If this bug is NEW behavior starting in the last month, that could suggest that the change last month did introduce this issue for you.

No, I've had this issue for as long as I've used Zulip, since January 9, 2020.

The browser developer tools "Network" tab may have some useful information -- in particular, details like whether the browser attempted to make a request to the server when we requested the reload, or not.

I now have the network tab open and persisting logs and I will try to catch this bug happening.

If you can just use the Firefox profiler when doing nothing in Zulip for 10 seconds and post a screenshot of the result, that'd be helpful. The main thing we'd be looking for is whether the browser is idle in steady-state, or there's something like an animated emoji constantly burning CPU.

I've uploaded the profile to profiler.firefox.com, and here's a screenshot as well:

Screen Shot 2022-09-12 at 10 00 07 AM

It's possible it'd be helpful for us if you started a thread in #issues in chat.zulip.org for this bug; there may be some things that benefit from an occasionally real-time debugging collaboration.

I will go do that right now! EDIT: And done! https://chat.zulip.org/#narrow/stream/9-issues/topic/Pinned.20browser.20tab.20reload.20failing

@carols10cents
Copy link
Author

Ok, I captured an instance of this happening on chat.zulip.org and I had Firefox's Network tab open, so I exported it as .har. Only problem was I had it and the tab open for about 24 hours before the problem... I was going to try to pare down the logs, but it's in JSON, and I don't have much time right now to figure out the right jq filter to get the relevant info out, so.... here you go. It's a 383MB .tar file, expands to a 715MB .har file.

The different thing about this failed reload is that the loading images did load, as opposed to the screenshot of the beginning of this issue where the images were broken.

The relevant requests will be near the end of the file.. I did perform a successful manual reload after the failed reload.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: misc difficult Issues which we expect to be quite difficult needs reproducer priority: high rust community request Issues of interest to the Rust community
Projects
None yet
Development

No branches or pull requests

7 participants