Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix crash using Ref<T> as parameter #1045

Merged
merged 1 commit into from
Apr 4, 2023

Conversation

zhehangd
Copy link
Contributor

@zhehangd zhehangd commented Feb 16, 2023

Fixes #1021 and fixes #1024 which are exactly the same issue.

In GDScript VM we have

const void **argptrs = call_args_ptr;
for (int i = 0; i < argc; i++) {
  GET_INSTRUCTION_ARG(v, i); 
  argptrs[i] = VariantInternal::get_opaque_pointer((const Variant *)v);
} 
GET_INSTRUCTION_ARG(ret, argc + 1);
VariantInternal::initialize(ret, Variant::m_type); 
void *ret_opaque = VariantInternal::OP_GET_##m_type(ret);
method->ptrcall(base_obj, argptrs, ret_opaque);

It extracts the underlying data pointers from parameter variants -- , Object* for objects -- and call ptrcall.

In our GDExtension code, we may declare a method like this

void do_something(Ref<Foo> foo) const;

Godot-cpp will generate the code that binds it with GDExtension API.
For example, it could use (in binder_common.hpp)

template <class T, class... P, size_t... Is>
void call_with_ptr_argsc_helper(T *p_instance, void (T::*p_method)(P...) const, const GDExtensionConstTypePtr *p_args, IndexSequence<Is...>) {
	(p_instance->*p_method)(PtrToArg<P>::convert(p_args[Is])...);
}

And for Ref<Foo> it calls

_FORCE_INLINE_ static Ref<T> convert(const void *p_ptr) {

_FORCE_INLINE_ static Ref<T> convert(const void *p_ptr) {
  return Ref<T>(reinterpret_cast<T *>(godot::internal::gde_interface->object_get_instance_binding(*reinterpret_cast<GDExtensionObjectPtr *>(const_cast<void *>(p_ptr)), godot::internal::token, &T::___binding_callbacks)));
}

Now here is the problem.
The engine-provided p_ptr is Foo*, while godot-cpp assumes it is Ref<Foo>* and tries to get Foo* from Ref<Foo>::ptr(), which naturally causes a crash.

This issue comes from the attempt to fix Ref return issue. See #958.
It turns out that only encode needs to be changed to set Ref on the engine side, convert should remains the same working with raw object pointers.

Hilariously, the previous implementation of convert before #958 was also wrong. It treated p_ptr as Object**, same as the issue in #1044.

@zhehangd zhehangd requested a review from a team as a code owner February 16, 2023 06:51
@zhehangd zhehangd changed the title Fix PtrToArg<Ref<T>> crash Fix crash using Ref<T> as parameter Feb 16, 2023
@akien-mga akien-mga added bug This has been identified as a bug crash labels Feb 16, 2023
@mbrlabs
Copy link

mbrlabs commented Mar 4, 2023

I tested this locally and it fixes the issue on my end (well at least for Ref<Image>).

Hope this gets merged in time for the next maintenance release. GDExtension is basically unusable like this for non trivial tasks.

@hakuhan
Copy link

hakuhan commented Apr 4, 2023

@akien-mga Can you please also review this PR? It is simular with "Object*" PR which was approved.

@akien-mga akien-mga merged commit feaba55 into godotengine:master Apr 4, 2023
@akien-mga
Copy link
Member

Thanks!

@zhehangd
Copy link
Contributor Author

zhehangd commented Apr 4, 2023

Cherry-pick to 4.0 I suppose?

@saki7
Copy link
Contributor

saki7 commented Apr 20, 2023

This PR breaks the current master for overriding const Ref<T>& virtual methods.

For example, the code below crashes immediately when any event is received.

class MyContainer : public PanelContainer
{
    GDCLASS(MyContainer , PanelContainer);

public:
    static void _bind_methods() {}
    void _input(const Ref<InputEvent>& event) override;
};

I have confirmed that it won't crash if I revert L269 to the original code

return Ref<T>(reinterpret_cast<T *>(godot::internal::gde_interface->object_get_instance_binding(*reinterpret_cast<GDExtensionObjectPtr *>(const_cast<void *>(p_ptr)), godot::internal::token, &T::___binding_callbacks)));

@zhehangd @akien-mga Could you, please double-check this PR?

@zhehangd
Copy link
Contributor Author

Did a quick test and confirmed that the crash is real.

void call_with_arg_by_value(godot::Ref<godot::Image> image); // ok
void call_with_arg_by_ref(const godot::Ref<godot::Image>& image); // ok
void _input(const godot::Ref<godot::InputEvent>& event) override; // crash

I initially thought it was a problem between passing Ref by value and by reference but it does not seem to be the case.
It is related to virtual methods.
Sadly I don't have experience with Godot's virtual method in godot-cpp (I don't even know how to define a custom one), but I will take a look in the next few days.

I can think of two hypotheses:

  • Either godot or godot-cpp treats non-virtual/virtual methods differently and gives one Obj* and the other Ref*.
  • The lifetime of the newly constructed Ref ended before the calls somehow for virtual methods.

Let me know if there are other clues.

@saki7
Copy link
Contributor

saki7 commented Apr 20, 2023

The former version of core also had same sort of pointer indirection bug #677 and that was fixed by me.

I'm eager to learn the rationale of current master (what's going on inside the core?) so that we would hopefully never going to need to fix this every year...

@Faless
Copy link
Contributor

Faless commented Apr 20, 2023

  • Either godot or godot-cpp treats non-virtual/virtual methods differently and gives one Obj* and the other Ref*.

It does, as it was discussed in #958 (which this PR reverted) and godotengine/godot#69902 .
GDExtension virtuals, and regular methods use different encoding.
Virtuals passes Refs via pointer, which needs to be retrieved via the ref_get_object (or set with ref_set_object).

Not sure what the proper solution is, but as a note to extensions developers, if you are using this virtual methods which take or return refs (e.g. WebRTC, ImageLoaders, etc) you should revert this PR (and probably avoid using Refs in new extension-exposed methods).

@saki7
Copy link
Contributor

saki7 commented Apr 20, 2023

@Faless said:
GDExtension virtuals, and regular methods use different encoding.
Virtuals passes Refs via pointer, which needs to be retrieved via the ref_get_object (or set with ref_set_object).

@BastiaanOlij Why were the encodings diverged in the first place? Is there any technical reasons why virtual methods require Ref arguments to be encoded by pointer? I'm afraid that I'm not fully convinced by the description in godotengine/godot#69902. What is going on with the ownership of RefCounted when we mix virtual methods and const Ref<T>&?

PtrToArg is only aware of parameter types (it doesn't know about caller/callee) so it seems impossible to fix this issue by just altering PtrToArg::convert(). As I mentioned above, if we really need two encodings, we must introduce some extra layer on binder_common, so that we could detect the encoding at the outer scope.

@dsnopek
Copy link
Collaborator

dsnopek commented Apr 20, 2023

It does, as it was discussed in #958 (which this PR reverted) and godotengine/godot#69902.

I'll add this to the agenda for the next GDExtension meeting, but I think we should add a CI step to godot-cpp that builds and runs a test GDExtension where we try to excercise some of these tricky cases. It's hard to keep track of all the past issues and avoid reverting/regressing old fixes

@BastiaanOlij
Copy link
Collaborator

@saki7 thats a bit of a story :)

The root problem is that the Ref<T> class in Godot is not exposed, this is not possible either as we're dealing with a C based API. Thus the contained object is the object being passed between Godot and the extension, but on the extension side the pointer is now encapsulated in a wrapper object. The extension thus needs to implement its own Ref<T> approach but now for the wrapper object. Both Godot and extension implementation of Ref<T> will call the refcounting functions on T to increase or decrease its refcount.

Because we're passing pointers to the godot object we have a possibility that the Ref<T> on the source side goes out of scope before the destination has increased the refcount, leading to the object being destroyed and Godot crashing.

So there are a bunch of scenarios where we've got a Ref<T> object and passing a T*, and not all are implemented equally. Also most of this implementation uses macros and C++ type casting sometimes making it really hard to predict what the code will be, and opening up a can of worms if the same type casting is used in 2 places, that both need different ways of dealing with Ref<T>, so we end up fixing one, and breaking the other, and pingponging between the two.

When we look at calls that go through our script system, so the same system that GDScript uses, we actually use Variants as intermediates. All parameters are converted to Variants and placed in an arguments array, and then the call is made. Variants respect refcounting and thus the issue described above is solved. This approach is rock solid.

However Variants have limitations on the types they can hold and have a bunch of overhead in encapsulating values. Not a problem for GDScript as we're working solely with Variants there so every value is already encapsulated, but overhead for our GDExtensions.

When virtual function support was introduced we wanted to support any C types that are used in many cases, so instead of using Variants we used normal C types. That works for everything except Ref<T> because a pointer to Ref<T> is meaningless to the extension. Hence needing to jump through hoops in handling Ref<T> on virtual functions.

@saki7
Copy link
Contributor

saki7 commented Apr 20, 2023

Thanks for the explanation :)
It is my understanding that essential part of conversions can be described as pseudo-code like below:

GDExtension side The value on core How godot encodes it How godot-cpp decodes it
void func(Object* obj); Object* obj; ptr = obj;
return (void*)ptr;
obj = (Object*)ptr;
func(obj);
void func(Ref<T> ref); Ref<T> ref; ptr = ref.ptr();
return (void*)ptr;
obj = (T*)ptr;
func(Ref<T>(obj));
void func(const Ref<T>& ref); Ref<T> ref; ptr = ref.ptr();
return (void*)ptr;
obj = (T*)ptr;
func(Ref<T>(obj));
void _func(const Ref<T>& ref) override; Ref<T> ref; ptr = ref.ptr();
return (void**)&ptr;
pobj = (T**)ptr;
_func(Ref<T>(*pobj));

Note row 3 and row 4, in these cases GDExtension expects same parameter types, i.e. const Ref<T>&, but the conversion must be done in two different ways. However PtrToArg<const Ref<T>&>::convert(...) has no information whether the callee is virtual or not, so call_with_ptr_argsc_helper and its relatives must detect the actual type before dispatching pointers to PtrToArg. Currently there's no such detection implemented, so we should introduce a new code for that.

@BastiaanOlij Sorry to bother you, but could you clarify the proper solution for this problem?

@BastiaanOlij
Copy link
Collaborator

BastiaanOlij commented Apr 24, 2023

@saki7, Couple of things that do not seem right though you are close to it.

  1. this table only applies when calling virtual functions on the godot-cpp side from Godot which was added at a later time. Normal calls are handled through variants and don't have this issue. If there is a 3rd scenario I'm unaware off than we should discuss that further.

  2. When Ref is used it would be more accurate to state the code is:

obj = (T*)obtain_wrapper_object_for<T>(ptr)
func(Ref<T>(obj));

The pointer to an object on the Godot side is meaningless in godot-cpp, there is some magic happening that checks if we already have created a wrapper object on the godot-cpp side and it will use that, else it will create a wrapper object.

  1. your 4th scenario is wrong, this only happens when we have a Ref return value which is why we have a ** so we can set the value:
GDExtension side The value on core how godot decodes it How godot-cpp-encodes it
Ref<T> _func() override; Ref<T> ref; (void **)ptr = nullptr;
_func(ptr);
ref = Ref(ptr)
Ref<T> ref = _func();
*ptr = ref->_owner

(note that when you have return values, we actually sent that as a parameter to godot-cpp, I think it's added at the end but it could also be the first argument, I'm not sure)

@saki7
Copy link
Contributor

saki7 commented Apr 24, 2023

@BastiaanOlij Thanks again for the description 👍
@zhehangd I think we have a better insight now, can we discuss about how to fix this PR?

@saki7
Copy link
Contributor

saki7 commented Apr 24, 2023

@dsnopek

I'll add this to the agenda for the next GDExtension meeting, but I think we should add a CI step to godot-cpp that builds and runs a test GDExtension where we try to excercise some of these tricky cases.

For this specific issue, I propose a test matrix like this:

  • Callee is non-virtual / virtual
  • Parameter is Object* / Ref<T> / Scalar
  • Parameter qualifier is Value / const Value&
  • Return type is void / Object* / Ref<T> / Scalar

Total: 2 * 3 * 2 * 4 = 48 cases

@saki7
Copy link
Contributor

saki7 commented Apr 24, 2023

We discussed this issue on GDExtension meeting today.

Impacts

Possible solutions

  • @saki7 and many other people, even the contributors are still not sure how the patch can be implemented correctly.
  • The attendees have agreed that we really need an extensive support from @BastiaanOlij, since Bastiaan seems to be the best person who knows the current implementation details. So we might perhaps kindly going to ask for your support on Godot Contributors Chat.
  • @Faless suggested that the binding generator might be capable of generating the methods differently depending on the types.
  • Some people also suggested that a possible fix for this issue might involve an another breaking change for the core.

Documentation

  • The attendees have agreed that the current encoding specification must be documented somewhere, so that we could reference the information about how the core encodes the objects as of the time it has been documented.

Testing

  • The attendees have agreed that we should implement general tests; for godot-cpp the initial attempt was submitted here: Add proper test #922
  • @dsnopek is trying to figure out the CI solution for tracking these issues.
  • @saki7 Suggested that the unit tests should not be kept inside godot-cpp but instead we must apply those tests globally to core and other language bindings, too.
    • However, due to the lack of time, we could not reach a simple conclusion of how the tests can be implemented globally.

@BastiaanOlij
Copy link
Collaborator

We should probably move this discussion into a separate issue instead of keeping this going on a merged PR :)

@dsnopek
Copy link
Collaborator

dsnopek commented Apr 27, 2023

Perhaps this PR should be reverted while we come up with a better solution?

@saki7
Copy link
Contributor

saki7 commented Apr 28, 2023

Yes I think this PR should be reverted for now.

There's much more use-cases for overriding virtual methods. And if we can't override virtual methods then there's technically no workaround.

@zhehangd
Copy link
Contributor Author

I have been temporarily away from Godot to focus on some other stuff, and it isn't easy to closely catch up. Sorry for that.
I wonder if the old workaround #801 for the reference counting issue (the origin of all the subsequent troubles we have now) still works as a temporary solution.
Probably not directly I suppose since now there are two interpretations of the parameter of PtrToArg::convert(const void*) (T* and Ref<T>*) and it has to be unified first.

@zhehangd
Copy link
Contributor Author

Btw, if you guys decide to revert, remember that in the previous version PtrToArg<Ref<T>>::convert and PtrToArg<const Ref<T> &>::convert are also inconsistent. One interprets p_ptr as Ref<T>* and the other T*.
I am unsure what problems it may cause in practice.

@MarianoGnu
Copy link

@akien-mga i could reproduce this crash with both 4.0.1 and 4.0.2 tags, and in 4.0 branch (which right now matches 4.0.2 tag)
As a workaround for this crash i am temporarly using godot-cpp master with my godot 4.0.1 custom build until i can update my project, but i believe this should be cherry picked on 4.0 stable branch

@saki7
Copy link
Contributor

saki7 commented May 3, 2023

No, please don't cherry-pick.

As we mentioned above, this PR contains regression. I really appreciate the effort taken by the PR author, however it had been confirmed by the contributors that the root cause of this issue is pretty much non-trivial. No offense, but I suggest you patch your own repo until this is completely fixed.

dsnopek added a commit to dsnopek/godot-cpp that referenced this pull request May 23, 2023
dsnopek added a commit to dsnopek/godot-cpp that referenced this pull request May 25, 2023
dsnopek added a commit to dsnopek/godot-cpp that referenced this pull request May 25, 2023
dsnopek added a commit to dsnopek/godot-cpp that referenced this pull request May 25, 2023
dsnopek added a commit to dsnopek/godot-cpp that referenced this pull request May 25, 2023
dsnopek added a commit to dsnopek/godot-cpp that referenced this pull request May 25, 2023
dsnopek added a commit to dsnopek/godot-cpp that referenced this pull request May 25, 2023
dsnopek added a commit to dsnopek/godot-cpp that referenced this pull request May 26, 2023
dsnopek added a commit to dsnopek/godot-cpp that referenced this pull request May 26, 2023
dsnopek added a commit to dsnopek/godot-cpp that referenced this pull request May 26, 2023
dsnopek added a commit to dsnopek/godot-cpp that referenced this pull request Jun 7, 2023
…standardize on `Object **` encoding in ptrcall
akien-mga added a commit that referenced this pull request Jun 7, 2023
Revert the changes from PR #1044 and #1045 and standardize on `Object **` encoding in ptrcall
@zhehangd zhehangd deleted the fix_ref_crash branch October 7, 2024 13:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This has been identified as a bug crash
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash using Ref as parameter Crash when passing Image to a function
9 participants