Display loss recovery attempt for track-wlroots 0.18 branch #2458

Open
kode54 opened this issue Aug 29, 2024 · 11 comments

Comments

@kode54
Contributor

kode54 commented Aug 29, 2024

Here is my attempt at display loss recovery implementation for wlroots 0.18:

https://gist.github.com/kode54/58b9e30ed73f82e1cfb040fe84f36c66

It doesn't work so well.

Last attempt crashes with this backtrace:

#0  __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, 
    no_tid=no_tid@entry=0) at pthread_kill.c:44
44	     return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;

#0  __pthread_kill_implementation
    (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1  0x0000790b680a5463 in __pthread_kill_internal (threadid=<optimized out>, signo=6)
    at pthread_kill.c:78
#2  0x0000790b6804c120 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x0000790b680334c3 in __GI_abort () at abort.c:79
#4  0x0000790b680333df in __assert_fail_base
    (fmt=0x790b681c3c20 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x5aacbd5146cd "handle || is_shutting_down()", file=file@entry=0x5aacbd5143de "../src/core/output-layout.cpp", line=line@entry=1709, function=function@entry=0x5aacbd517ae0 "wf::output_t* wf::output_layout_t::impl::get_output_coords_at(const wf::pointf_t&, wf::pointf_t&)") at assert.c:94
#5  0x0000790b68044177 in __assert_fail
    (assertion=0x5aacbd5146cd "handle || is_shutting_down()", file=0x5aacbd5143de "../src/core/output-layout.cpp", line=1709, function=0x5aacbd517ae0 "wf::output_t* wf::output_layout_t::impl::get_output_coords_at(const wf::pointf_t&, wf::pointf_t&)") at assert.c:103
#6  0x00005aacbd45f7a8 in wf::output_layout_t::impl::get_output_coords_at(wf::pointf_t const&, wf::pointf_t&) [clone .part.0] [clone .lto_priv.0]
    (closest=<optimized out>, origin=<optimized out>, this=<optimized out>)
    at ../src/core/output-layout.cpp:1709
#7  0x00005aacbd4782c0 in wf::output_layout_t::impl::get_output_coords_at
    (origin=<synthetic pointer>..., this=0x5aace2698960, closest=...) at ../src/core/core.cpp:297
#8  wf::output_layout_t::get_output_coords_at (this=<optimized out>, origin=..., closest=...)
    at ../src/core/output-layout.cpp:1762
#9  wf::compositor_core_impl_t::reconfigure_outputs (this=0x5aace0cd5430)
    at ../src/core/core.cpp:239
#10 0x00005aacbd513796 in main::{lambda(void*)#1}::operator()(void*) const [clone .isra.0]
    (__closure=0x5aace18aca20, data=<optimized out>) at ../src/main.cpp:458
#11 0x00005aacbd442c82 in std::function<void(void*)>::operator()
    (this=<optimized out>, __args#0=<optimized out>)
    at /usr/include/c++/14.2.1/bits/std_function.h:591
#12 wf::wl_listener_wrapper::emit (this=<optimized out>, data=<optimized out>)
    at ../src/wl-listener-wrapper.tpp:57
#13 wf::handle_wrapped_listener (listener=<optimized out>, data=<optimized out>)
    at ../src/wl-listener-wrapper.tpp:10
#14 0x0000790b68a0342e in wl_signal_emit_mutable
    (signal=signal@entry=0x5aace0fc13b8, data=data@entry=0x0)
    at ../wayland-1.23.0/src/wayland-server.c:2314
#15 0x0000790b68913b5f in begin_gles2_buffer_pass
    (buffer=0x5aace1e82560, prev_ctx=0x7ffc170a37a0, timer=0x0)
    at ../wlroots-hidpi-xprop/render/gles2/pass.c:258
#16 gles2_begin_buffer_pass
    (wlr_renderer=<optimized out>, wlr_buffer=0x5aace1e4eb30, options=<optimized out>)
    at ../wlroots-hidpi-xprop/render/gles2/renderer.c:262
#17 0x0000790b6890ce35 in wlr_renderer_begin_buffer_pass
    (renderer=<optimized out>, buffer=<optimized out>, options=<optimized out>)
    at ../wlroots-hidpi-xprop/render/wlr_renderer.c:304
#18 0x00005aacbd4f71da in wf::swapchain_damage_manager_t::start_frame (this=0x5aace1745de0)
    at ../src/output/render-manager.cpp:331
#19 wf::render_manager::impl::paint (this=0x5aace1d6b1b0) at ../src/output/render-manager.cpp:1130
#20 0x00005aacbd442ce6 in std::function<void()>::operator() (this=<optimized out>)
    at /usr/include/c++/14.2.1/bits/std_function.h:591
#21 handle_timeout (data=<optimized out>) at ../src/util.cpp:31
#22 0x0000790b68a053a6 in wl_timer_heap_dispatch (timers=0x5aace0cd5388)
    at ../wayland-1.23.0/src/event-loop.c:527
#23 wl_event_loop_dispatch (loop=0x5aace0cd5330, timeout=<optimized out>, timeout@entry=-1)
    at ../wayland-1.23.0/src/event-loop.c:1098
#24 0x0000790b68a0710f in wl_display_run (display=0x5aace0cd5240)
    at ../wayland-1.23.0/src/wayland-server.c:1530
#25 0x00005aacbd4410bb in main (argc=<optimized out>, argv=<optimized out>) at ../src/main.cpp:509

And then it drops to a terminal and fails to restart cage as my login manager, and hangs the GPU completely.

@ammen99
Member

ammen99 commented Aug 29, 2024

If you're doing this, you need to make every plugin which has GL state (textures, framebuffers, programs) reload its state as well.

@kode54
Contributor Author

kode54 commented Aug 29, 2024

May as well reload them all, then.

To provoke a reset, at least on amdgpu:

# cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover

@ammen99
Member

ammen99 commented Sep 1, 2024

> May as well reload them all, then.
>
> To provoke a reset, at least on amdgpu:
>
> # cat /sys/kernel/debug/dri/0/amdgpu_gpu_recover

Unloading a plugin might lose a lot of temporary state, which is not what we want in the ideal case. Not to mention, some plugins cannot be unloaded safely.

@soreau
Member

soreau commented Sep 1, 2024

I can't really think of an 'unloadable' plugin that also does GL stuff. Are there any?

@ammen99
Member

ammen99 commented Sep 1, 2024

> I can't really think of an 'unloadable' plugin that also does GL stuff. Are there any?

I'd prefer to not make assumptions, maybe such a plugin will come in the future.

@soreau
Member

soreau commented Sep 1, 2024

Sure, but I was thinking more along the lines of having no plugins loaded when testing. If that works, then maybe hinge on the unloadable flag for now until recovery works, then consider adding a new flag.

@kode54
Contributor Author

kode54 commented Sep 2, 2024

Maybe instead a notification should be plumbed to plugins that need it, to notify them that they need to free and reallocate their GPU resources? Would be better than forcing a full unload.

@ammen99
Member

ammen99 commented Sep 2, 2024

> Maybe instead a notification should be plumbed to plugins that need it, to notify them that they need to free and reallocate their GPU resources? Would be better than forcing a full unload.

Yes that's the best solution.
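
For illustration, the notification idea discussed above could look roughly like the sketch below. It is not Wayfire or wlroots API: `gpu_reset_notifier_t`, `example_plugin_t`, and `next_fake_texture_name` are hypothetical names, and the integer "texture" only stands in for a real GL object. The point is that plugins keep their non-GPU state and only rebuild GPU resources when told to, instead of being fully unloaded.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <utility>

// Hypothetical registry the compositor core could own (not real Wayfire API).
class gpu_reset_notifier_t
{
  public:
    using callback_t = std::function<void()>;

    // Register a callback; the returned id is used to unregister later,
    // for example when the plugin is unloaded.
    int connect(callback_t cb)
    {
        assert(static_cast<bool>(cb)); // an empty callback would be a bug
        int id = next_id++;
        callbacks.emplace(id, std::move(cb));
        return id;
    }

    void disconnect(int id)
    {
        callbacks.erase(id);
    }

    // Assumed to be called by the core after the renderer was recreated.
    // Iterates over a copy so that a callback may disconnect itself.
    void emit_reset()
    {
        std::map<int, callback_t> snapshot = callbacks;
        for (auto& entry : snapshot)
        {
            if (callbacks.count(entry.first))
            {
                entry.second();
            }
        }
    }

  private:
    int next_id = 0;
    std::map<int, callback_t> callbacks;
};

// Stand-in for allocating a GL texture name; always returns a fresh value.
inline unsigned next_fake_texture_name()
{
    static unsigned counter = 0;
    return ++counter;
}

// Hypothetical plugin that owns one GPU resource.
struct example_plugin_t
{
    unsigned texture = 0;
    int reloads = 0;
    int connection = -1;

    void init(gpu_reset_notifier_t& notifier)
    {
        texture = next_fake_texture_name();
        connection = notifier.connect([this] { reload_gpu_state(); });
    }

    void fini(gpu_reset_notifier_t& notifier)
    {
        notifier.disconnect(connection);
        texture = 0;
    }

    // Drop the stale handle and recreate it; all other plugin state survives.
    void reload_gpu_state()
    {
        texture = next_fake_texture_name();
        ++reloads;
    }
};
```

In this sketch a plugin without GL state simply never connects, so no "unloadable" assumption is needed.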

@kode54
Contributor Author

kode54 commented Sep 14, 2024

New attempt without any plugins that would have GL, new backtrace:

#0  __pthread_kill_implementation
    (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0)
    at pthread_kill.c:44
#1  0x00007da0c01b6463 in __pthread_kill_internal (threadid=<optimized out>, signo=6)
    at pthread_kill.c:78
#2  0x00007da0c015d120 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3  0x00007da0c01444c3 in __GI_abort () at abort.c:79
#4  0x00007da0c01443df in __assert_fail_base
    (fmt=0x7da0c02d4c20 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x6362c73a2c88 "wlr_texture_is_gles2(texture)", file=file@entry=0x6362c73a291a "../src/core/opengl.cpp", line=line@entry=580, function=function@entry=0x6362c73a65c0 "wf::texture_t::texture_t(wlr_texture*, std::optional<wlr_fbox>)") at assert.c:94
#5  0x00007da0c0155177 in __assert_fail
    (assertion=0x6362c73a2c88 "wlr_texture_is_gles2(texture)", file=0x6362c73a291a "../src/core/opengl.cpp", line=580, function=0x6362c73a65c0 "wf::texture_t::texture_t(wlr_texture*, std::optional<wlr_fbox>)") at assert.c:103
#6  0x00006362c72fa240 in wf::texture_t::texture_t
    (this=this@entry=0x7fff1604d4a0, texture=0x6362f40c78f0, viewport=Python Exception <class 'gdb.error'>: value has been optimized out
..., this=<optimized out>, texture=<optimized out>, viewport=Python Exception <class 'gdb.error'>: value has been optimized out
...) at ../src/core/opengl.cpp:580
#7  0x00006362c7378926 in wf::scene::wlr_surface_node_t::wlr_surface_render_instance_t::render (this=0x6362f41f62f0, target=..., region=...) at ../src/view/wlr-surface-node.cpp:317
#8  0x00006362c73841b3 in wf::scene::render_instance_t::render
    (this=<optimized out>, target=..., region=..., custom_data=std::any [no contained value]) at ../src/api/wayfire/scene-render.hpp:121
#9  wf::scene::run_render_pass (params=..., flags=flags@entry=3)
    at ../src/output/render-manager.cpp:1227
#10 0x00006362c7385146 in wf::render_manager::impl::render_output (this=0x6362f39fb180)
    at ../src/output/render-manager.cpp:1092
#11 wf::render_manager::impl::paint (this=0x6362f39fb180)
    at ../src/output/render-manager.cpp:1141
#12 0x00006362c7386428 in wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}::operator()(void*) const (__closure=0x6362f39fb180) at ../src/output/render-manager.cpp:968
#13 std::__invoke_impl<void, wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}&, void*>(std::__invoke_other, wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}&, void*&&) (__f=...) at /usr/include/c++/14.2.1/bits/invoke.h:61
#14 std::__invoke_r<void, wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}&, void*>(wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}&, void*&&)
    (__fn=...) at /usr/include/c++/14.2.1/bits/invoke.h:111
#15 std::_Function_handler<void (void*), wf::render_manager::impl::impl(wf::output_t*)::{lambda(void*)#1}>::_M_invoke(std::_Any_data const&, void*&&)
    (__functor=..., __args#0=<optimized out>)
    at /usr/include/c++/14.2.1/bits/std_function.h:290
#16 0x00006362c72d0e82 in std::function<void(void*)>::operator()
    (this=<optimized out>, __args#0=<optimized out>)
    at /usr/include/c++/14.2.1/bits/std_function.h:591
#17 wf::wl_listener_wrapper::emit (this=<optimized out>, data=<optimized out>)
    at ../src/wl-listener-wrapper.tpp:57
#18 wf::handle_wrapped_listener (listener=<optimized out>, data=<optimized out>)
    at ../src/wl-listener-wrapper.tpp:10
#19 0x00007da0c0acc47e in wl_signal_emit_mutable
    (signal=<optimized out>, data=0x6362f3928f30)
    at ../wayland-1.23.1/src/wayland-server.c:2314
#20 0x00007da0c0acdefc in wl_event_loop_dispatch_idle (loop=loop@entry=0x6362f27e2330)
    at ../wayland-1.23.1/src/event-loop.c:970
#21 0x00007da0c0ace177 in wl_event_loop_dispatch
    (loop=0x6362f27e2330, timeout=<optimized out>, timeout@entry=-1)
    at ../wayland-1.23.1/src/event-loop.c:1110
#22 0x00007da0c0ad01f7 in wl_display_run (display=0x6362f27e2240)
    at ../wayland-1.23.1/src/wayland-server.c:1530
#23 0x00006362c72cf2db in main (argc=<optimized out>, argv=<optimized out>)
    at ../src/main.cpp:515

@ammen99
Member

ammen99 commented Sep 14, 2024

Some wild guesses based on the stacktrace - Wayfire keeps a reference of the surface's texture/buffer:

this->current_buffer = &surface->buffer->base;

Depending on how wlroots has implemented GPU reset handling, maybe they change the texture/buffer pointer? So after the reset, we still hold on to the old texture until a new buffer is committed, but the old texture isn't valid anymore because of the gpu reset?

@DemiMarie

> Some wild guesses based on the stacktrace - Wayfire keeps a reference of the surface's texture/buffer:
>
> this->current_buffer = &surface->buffer->base;
>
> Depending on how wlroots has implemented GPU reset handling, maybe they change the texture/buffer pointer? So after the reset, we still hold on to the old texture until a new buffer is committed, but the old texture isn't valid anymore because of the gpu reset?

That sounds about right to me. After a GPU reset, old textures are useless, whether the pointers are valid or not.
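
One way to guard against using a stale texture, sketched under the assumption that the guess above is correct, is to tag every cached reference with a renderer "generation" that is bumped on each reset. Everything here is hypothetical (`renderer_state_t`, `cached_texture_t`, `texture_for_render`); it is not how Wayfire or wlroots track textures, only an illustration of the idea that a pointer can stay valid while its contents are no longer usable.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical renderer state: the generation changes whenever the GPU
// context is rebuilt after a reset.
struct renderer_state_t
{
    std::uint64_t generation = 1;

    void on_gpu_reset()
    {
        ++generation;
    }
};

// Hypothetical cached reference, remembering which generation created it.
struct cached_texture_t
{
    const void *texture = nullptr;
    std::uint64_t generation = 0;

    void store(const void *tex, const renderer_state_t& renderer)
    {
        assert(tex != nullptr);
        texture = tex;
        generation = renderer.generation;
    }

    // The pointer alone is not enough: it must also belong to the
    // current generation.
    bool usable(const renderer_state_t& renderer) const
    {
        return texture != nullptr && generation == renderer.generation;
    }
};

// Returns the texture to draw, or nullptr to skip drawing this surface
// until the client commits a new buffer.
inline const void *texture_for_render(const cached_texture_t& cache,
    const renderer_state_t& renderer)
{
    return cache.usable(renderer) ? cache.texture : nullptr;
}
```

With a check like this, the render path would skip the surface after a reset rather than hit the `wlr_texture_is_gles2(texture)` assertion, at the cost of not drawing it until new content arrives.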
