Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation of 'com_record' as a subclassable Python type. #2437

Open
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

geppi
Copy link
Contributor

@geppi geppi commented Dec 13, 2024

A 'tp_new' method was added to the 'PyTypeObject' slots of 'com_record'. Replacement new is now used to create an instance of a 'com_record' type, i.e. in 'tp_new' as well as in the factory functions. The 'tp_dealloc' method explicitely calls the destructor before finally calling 'tp_free'.

Records returned from a call to a COM method do get the subclass type according to the GUID in the COM RecordInfo. If no subclass with that GUID is found, the generic 'com_record' base class is used.

The algorithm that retrieves the list of subclasses of 'com_record' only uses methods and data structures of the public Python API. This is important because in Python 3.12 the type of the 'tp_subclasses' slot of a 'PyTypeObject' was changed to 'void*' to indicate that for some types it does not hold a valid 'PyObject*'.

The 'PyRecord_Check' function was modified to test if the object is an instance of 'com_record' or a derived type.

The implementation does not break existing code.
It is not required to define 'com_record' subclasses, i.e. it is possible to work with the generic 'com_record' type as before using the factory function.

Resolves #2361

@geppi
Copy link
Contributor Author

geppi commented Dec 13, 2024

This PR addresses issue #2361 and will allow to create subclasses of com_record.

It does enable typing of interface methods that expect a particular structure as a parameter.

To define a particular subclass of com_record it is required to specify these class attributes:

  • TLBID : the UUID of the Type Library where the record is defined
  • MJVER : the major version number of the Type Library
  • MNVER : the minor version number of the Type Library
  • LCID : the LCID of the Type Library
  • GUID : the GUID of the record structure from the Type Library

As an example with the following Type Library IDL:

[
  uuid(E6F07342-C1F7-4E4E-B021-11BBD54B9F37),
  version(2.3),
  helpstring("Example Type Library"),
]
library ExampleLib
{
    // TLib : OLE Automation : {00020430-0000-0000-C000-000000000046}
    importlib("stdole2.tlb");

	
    typedef [uuid(9F2C4E2E-2C5C-4F39-9FDB-840A1E08B165)]
    struct tagT_ENTITY {
        long parent;
        BSTR name;
        BSTR description;
    } T_ENTITY;

	
    typedef [uuid(B1461DD4-8C86-4627-B444-3D833C980111)]
    struct tagT_LINE {            
        double startX;
        double startY;
        double startZ;
        double endX;
        double endY;
        double endZ;
    } T_LINE;

	....

    interface IExample : IDispatch {
	
	[id(0x00000001)]
        HRESULT AddLine(
                        [in, out] T_ENTITY* ent, 
                        [in, out] T_LINE* line, 
                        [out, retval] long* retval);
        [id(0x00000002)]
        HRESULT GetLine(
                        [in] long n, 
                        [in, out] T_ENTITY* ent, 
                        [in, out] T_LINE* line);
		....
	}
....
}

the record class definitions for T_ENTITY and T_LINE will take the following form:

import pythoncom

class SomeTlibRecordMeta(type):

	def __new__(cls, name, bases, namespace):
		namespace['TLBID'] = '{E6F07342-C1F7-4E4E-B021-11BBD54B9F37}'
		namespace['MJVER'] = 2
		namespace['MNVER'] = 3
		namespace['LCID'] = 0
		return type.__new__(cls, name, bases, namespace)


class T_ENTITY(pythoncom.com_record, metaclass=SomeTlibRecordMeta):

	GUID = '{9F2C4E2E-2C5C-4F39-9FDB-840A1E08B165}'
	__slots__ = tuple()
	parent: int
	name: str
	description: str


class T_LINE(pythoncom.com_record, metaclass=SomeTlibRecordMeta):
	
	GUID = '{B1461DD4-8C86-4627-B444-3D833C980111}'
	__slots__ = tuple()
	startX: float
	startY: float
	startZ: float
	endX: float
	endY: float
	endZ: float

The metaclass in the above code is only used to avoid retyping the TLBID, MJVER, MNVER and LCID of the Type Library in every class definition for records from the same Type Library.

The subclasses provide valuable information for type hints.
In the future it would be possible to extend makepy to automatically generate those type hints, like:

def AddLine(self, ent:T_ENTITY=defaultNamedNotOptArg, line:T_LINE=defaultNamedNotOptArg) -> tuple[int, T_ENTITY, T_LINE]:
	return self._ApplyTypes_(643, 1, (3, 0), ((16420, 3), (16420, 3)), 'AddLine', None, ent, line)

def GetLine(self, n:int=defaultNamedNotOptArg, ent:T_ENTITY=defaultNamedNotOptArg, line:T_LINE=defaultNamedNotOptArg) -> tuple[T_ENTITY, T_LINE]:
	return self._ApplyTypes_(645, 1, (24, 0), ((3, 1), (16420, 3), (16420, 3)), 'GetLine', None, n, ent, line)

With the above class definitions, new instances of a particular record type can simply be created with e.g. line = T_LINE() or as before this PR with:

from win32com.client import gencache, Record

app = gencache.EnsureDispatch('Example.Application')
line = Record("T_LINE", app)

This PR does not break any existing code because the com_record base type is always used as the fallback type, if no particular subclass with a matching COM record GUID has been defined.

On the other hand, COM records returned by a call to a COM interface method will receive the proper subclass type, if their RecordInfo does match the GUID of a com_record subclass.

A 'tp_new' method was added to the 'PyTypeObject' slots of 'com_record'.
Replacement new is now used to create an instance of a 'com_record' type,
i.e. in 'tp_new' as well as in the factory functions. The 'tp_dealloc'
method explicitely calls the destructor before finally calling 'tp_free'.

Records returned from a call to a COM method do get the subclass type
according to the GUID in the COM RecordInfo. If no subclass with that
GUID is found, the generic 'com_record' base class is used.

The algorithm that retrieves the list of subclasses of 'com_record'
only uses methods and data structures of the public Python API.
This is important because in Python 3.12 the type of the 'tp_subclasses'
slot of a 'PyTypeObject' was changed to 'void*' to indicate that for some
types it does not hold a valid 'PyObject*'.

The 'PyRecord_Check' function was modified to test if the object is an
instance of 'com_record' or a derived type.

The implementation does not break existing code.
It is not required to define 'com_record' subclasses, i.e. it is possible
to work with the generic 'com_record' type as before using the factory
function.
@geppi geppi force-pushed the record_subclasses branch from 0c591f8 to ebfaa62 Compare December 13, 2024 20:26
@geppi
Copy link
Contributor Author

geppi commented Jan 2, 2025

Happy New Year to everybody!

I'm wondering if this PR has a chance to get merged or if it is too special for my particular use case?
Please let me know your opinion before I go ahead to add a test case.

Copy link
Owner

@mhammond mhammond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any good reason to not take something like this once it's in good shape. LEt me know if I missed the point anywhere above, and I agree we'd want good tests for this.

com/win32com/src/PyRecord.cpp Outdated Show resolved Hide resolved
if (S_OK != StringFromCLSID(structguid, &guidString))
return NULL;
// Obtain a copy of the subclasses list to iterate over.
list = PyObject_CallMethod((PyObject *)type, "__subclasses__", NULL);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I like this as IIUC it makes the world non-deterministic, in that how this works will depends on what else happens to have been imported. Another alternative might be a function which allows you to register the subclass by GUID, and store them in a global map of some sort? I guess you could argue that's still non-deterministic, but at least requires explicit calls rather than side-effects)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please excuse my ignorance but I'm afraid I don't understand why this does depend on side effects.
Maybe you can point me on it with the help of the following examples.

Let's start with the plain vanilla pythoncom.com_record class:

import pythoncom

TL_CLSID = '{E6F07342-C1F7-4E4E-B021-11BBD54B9F37}'
TL_MJVER = 2
TL_MNVER = 3
TL_LCID = 0

T_ENTITY_CLSID = '{9F2C4E2E-2C5C-4F39-9FDB-840A1E08B165}'
T_LINE_CLSID = '{B1461DD4-8C86-4627-B444-3D833C980111}'

print(pythoncom.com_record.__subclasses__())

[]

and create an instance of the T_LINE COM Record:

line_1 = pythoncom.GetRecordFromGuids(TL_CLSID, TL_MJVER, TL_MNVER, TL_LCID, T_LINE_CLSID)
print(line_1)
print(type(line_1))

com_struct(startX=0.0, startY=0.0, startZ=0.0, endX=0.0, endY=0.0, endZ=0.0)
<class 'com_record'>

If we now define the subclasses of pythoncom.com_record with the appropriate GUIDs for the T_ENTITY and T_LINE COM Records from the TypeLibrary:

class T_ENTITY(pythoncom.com_record):
    __slots__ = tuple()
    TLBID = TL_CLSID
    MJVER = TL_MJVER
    MNVER = TL_MNVER
    LCID = TL_LCID
    GUID = T_ENTITY_CLSID

    
class T_LINE(pythoncom.com_record):
    __slots__ = tuple()
    TLBID = TL_CLSID
    MJVER = TL_MJVER
    MNVER = TL_MNVER
    LCID = TL_LCID
    GUID = T_LINE_CLSID

print(pythoncom.com_record.__subclasses__())

[<class 'main.T_ENTITY'>, <class 'main.T_LINE'>]

we can do:

line_2 = T_LINE()
print(line_2)
print(type(line_2))

com_struct(startX=0.0, startY=0.0, startZ=0.0, endX=0.0, endY=0.0, endZ=0.0)
<class 'main.T_LINE'>

The type of the COM Record instance line_1, which we had created before the subclasses were defined, of course didn't change:

print(type(line_1))

<class 'com_record'>

However, creating a new T_LINE COM Record instance with the factory function, does now also return an instance of the subclass.

line_3 = pythoncom.GetRecordFromGuids(TL_CLSID, TL_MJVER, TL_MNVER, TL_LCID, T_LINE_CLSID)
print(line_3)
print(type(line_3))

com_struct(startX=0.0, startY=0.0, startZ=0.0, endX=0.0, endY=0.0, endZ=0.0)
<class 'main.T_LINE'>

If we define an invalid COM Record subclass using a GUID that is not contained in the TypeLibrary:

T_BAD_CLSID = '{A50732E7-F546-498B-AF28-E58161F9CA5C}'

class T_BAD(pythoncom.com_record):
    __slots__ = tuple()
    TLBID = TL_CLSID
    MJVER = TL_MJVER
    MNVER = TL_MNVER
    LCID = TL_LCID
    GUID = T_BAD_CLSID

print(pythoncom.com_record.__subclasses__())

[<class 'main.T_ENTITY'>, <class 'main.T_LINE'>, <class 'main.T_BAD'>]

the attempt to create an instance fails with an exception:

bad = T_BAD()

---------------------------------------------------------------------------
com_error Traceback (most recent call last)
----> bad = T_BAD()
com_error: (-2147319765, 'Element nicht gefunden.', None, None)

Of course you are free to define all kinds of dysfunctional subclasses:

class T_STUPID(pythoncom.com_record):
    description = 'blabla'

print(pythoncom.com_record.__subclasses__())

[<class 'main.T_ENTITY'>, <class 'main.T_LINE'>, <class 'main.T_BAD'>, <class 'main.T_STUPID'>]

but you cannot instantiate them:

gaga = T_STUPID()

---------------------------------------------------------------------------
SystemError Traceback (most recent call last)
----> gaga = T_STUPID()
SystemError: <class 'main.T_STUPID'> returned NULL without setting an exception

So to define working subclasses of pythoncom.com_record you have to follow the recipe and provide the GUID of the TypeLibrary that containts the COM Record definition, as well as its major and minor version number, plus the LCID and last but not least the GUID of the COM Record type as class variables.

I would argue that the definition of such a subclass is a kind of global registration.

Please excuse if I completely missed the point with the above examples but hopefully they can help to identify where there are side effects or non-determinisms.

Copy link
Owner

@mhammond mhammond Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point is that these subclasses are presumably going to be in your code rather than in pywin32. Thus:

print(pythoncom.com_record.__subclasses__())

[<class 'main.T_ENTITY'>, <class 'main.T_LINE'>]

Assuming T_ENTITY/T_LINE etc are in modules you wrote, then the print above will only generate that output if those modules you wrote were actually imported. Thus, this new capability will implicitly depend on exactly what modules were previously imported and presumably what order they are in when multiple modules try and register the same type.

I would argue that the definition of such a subclass is a kind of global registration.

That's true, but it's implicit and hard to reason about. An explicit registration makes more sense to be because (a) it's easier to locate where these calls are and (b) there's an opportunity to fail if 2 callers try and register the same type. I'm saying that without (b), which type is used is largely non-deterministic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, understood.
Do you think it is necessary to implement the nuts and bolts of this registration in pythoncom or would it be sufficient to keep the implementation as is and just provide an 'official' class-factory function for pythoncom.com_record subclasses that does the registration and checks to prevent double registration?

I mean the whole class machinery of Python itself does not prevent from doing all kinds of weird things and in particular somehow related to your point (b):

class A():
    a = 1

class B(A):
    b = 2

class B(A):
    c = 3

print(A.__subclasses__())

[<class 'main.B'>, <class 'main.B'>]

instantiating B does return an instance of the class that was last defined with this name:

x = B()

print(x.b)

---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
x = B()
----> print(x.b)
AttributeError: 'B' object has no attribute 'b'

print(x.c)

3

The class registration function could easily be written in Python and use the machinery as implemented.
Of course users could then still exploit the machinery under the hood and invoke the implicit behavior.
However, if you're not using the 'official' method you should know what you do.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A Python implementation seems fine to me and I'm not bothered that people can arrange to avoid this - it's just the fact that a base class which mediates which subclass is instantiated just smells inverted to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's just the fact that a base class which mediates which subclass is instantiated just smells inverted to me.

The issue for which I couldn't come up with a better solution is how to instantiate the proper pythoncom.com_record subclass for Records returned by a COM method call. Without the 'smelly' part everything was fine as long as the Records were created by client code. However, all Records returned from a COM method call were just generic base class pythoncom.com_record instances. Therefore I introduced this 'identification' logic based on the GUID of the returned Record.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking get_registered_record_class or similar would hold classes, so the logic around here would be roughly the same - you are still making a call back into Python, but instead of subclasses you would be call the new function with some args. You would then expect the result to be a single class (or None) rather than iterating here - which you'd instantiate.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added the dictionary 'RecordClasses' to pythoncom.

The function 'PyRecord::new_record' does now check if the COM Record GUID is contained in this dict and uses the value returned for the GUID key as the subclass to instantiate.

The new Python function 'register_record_class' in the client module can be used to populate the dictionary with 'GUID' key / 'com_record subclass' value pairs. It checks that the value to be registered is a proper, instantiable com_record subclass and that the GUID is not already contained in the dictionary thus preventing accidential, potentially conflicting double registration.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While working on the tests, I recognized that it might be confusing that you could register a subclass with a name different from the name for that GUID in the TypeLibrary. In general it shouldn't matter what name you give to a struct because it's identity is based on the GUID. However, it might be confusing if you have to use a different name in 'win32com.client.Record' to create the struct and the returned instance is then of a type with the registered name.

There are pros and cons. On the one hand it could be helpful to give COM Record types a more meaningful name with respect to your application than the one from the TypeLibrary. On the other hand it might create inconsistencies when mixing subclass instance creation with the use of 'win32com.client.Record'.
Although one could argue that a user who defines and registers a COM Record type with a different name should be aware of the consequences.

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've given this a second thought and would value consistency over flexibility.

Therefore I've added a check for the record type name in the 'register_record_class' Python function.
To enable this in a straight forward way I've also added another dunder attribute '__record_type_name__' to the 'PyRecord::getattro' function that retrieves the record type name via 'IRecordInfo::GetName'.

If someone would really like to give the subclass a different name from the name in the TypeLibrary there is still the possibility to bypass the registration function and enter the class directly into the 'pythoncom.RecordClasses' dictionary.
However, using the "official" way everything should be consistent now.

geppi added 3 commits January 3, 2025 14:32
from which a COM Record instance will be created now looks up the Record GUID
in a global map, to only create instances of subclasses that have explicitely
been registered in this map.
@geppi geppi force-pushed the record_subclasses branch from 8c0bb0d to bb40a88 Compare January 5, 2025 21:16
'pythoncom.com_record' to the global map 'pythoncom.RecordClasses'.
The function does only add instantiable subclasses of 'pythoncom.com_record'
to the map if their GUID is not already contained in the map.
@geppi geppi force-pushed the record_subclasses branch from bb40a88 to 0f95786 Compare January 5, 2025 21:27
Python function.

Also added another dunder attribute '__record_type_name__' to the
'PyRecord::getattro' function that retrieves the record type name
via 'IRecordInfo::GetName'.
@geppi geppi requested a review from mhammond January 7, 2025 16:57
Co-authored-by: Avasam <samuel.06@hotmail.com>
Copy link
Owner

@mhammond mhammond left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking great! It should be fairly easy to get a couple of basic tests for this in PyCOMTest?

com/win32com/client/__init__.py Outdated Show resolved Hide resolved
@@ -91,7 +94,7 @@ PyObject *PyObject_FromSAFEARRAYRecordInfo(SAFEARRAY *psa)
hr = info->RecordCopy(source_data, this_dest_data);
if (FAILED(hr))
goto exit;
PyTuple_SET_ITEM(ret_tuple, i, new PyRecord(info, this_dest_data, owner));
PyTuple_SET_ITEM(ret_tuple, i, PyRecord::new_record(info, this_dest_data, owner));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably add a null check? It should have already been there for new, but there are many more error conditions now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. The question is where to put this null check?
There are other places where PyRecord::new_record is used without a null check.

Should we instead raise a Python exception in PyRecord::new_record itself in every place where it currently returns a null without raising an exception?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, it will only currently return null when OOM. In this patch we have a more complicated situation - sometimes when it returns null there will be an exception set, whereas OOM will not. Having those OOM cases call PyErr_NoMem() seems easy, and it looks to me like all of the callers of PyRecord::new_record will do the right thing if null/false is returned - but what the docs say will not be Ok is PyTuple_SET_ITEM.

Copy link
Contributor Author

@geppi geppi Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added commit c9aeec2 and commit 61662ab to resolve this.

com/win32com/client/__init__.py Outdated Show resolved Hide resolved
geppi added 5 commits January 8, 2025 10:16
that added a check for the record type name in the 'register_record_class'
Python function.
'PyRecord::getattro' function that retrieves the record
type GUID via 'IRecordInfo::GetGuid'.
'PyTuple_SET_ITEM' in the 'PyObject_FromSAFEARRAYRecordInfo' function.

Also enhanced the treatment of NULL return values in several other
places in 'PyRecord.cpp'.
@geppi geppi force-pushed the record_subclasses branch from ed5fe41 to cb675ab Compare January 9, 2025 18:56
@geppi
Copy link
Contributor Author

geppi commented Jan 9, 2025

Added my first shot on the tests to PyCOMTest.

@geppi geppi requested review from mhammond and Avasam January 9, 2025 19:14
@geppi geppi force-pushed the record_subclasses branch from 06105f2 to 0cef441 Compare January 12, 2025 13:46
'PyTuple_SET_ITEM' in the 'PyRecord::getattro' function.

Also 'PyRecord::new_record' will now raise a 'PyErr_NoMemory'
exception when the call to 'tp_alloc' returns NULL.
@geppi geppi force-pushed the record_subclasses branch from 0cef441 to 61662ab Compare January 12, 2025 17:51
@mhammond
Copy link
Owner

This is looking good to me - would you like to add a changelog entry (if not I'll make sure one gets added)?

@geppi
Copy link
Contributor Author

geppi commented Jan 13, 2025

Thank you Mark, I'll add one.

In the meantime I spent some time to think about how those subclasses could be initialized and discovered the following.

Initially I thought it would be a good idea to provide the constructor functionality also for the base class com_record in addition to the Record factory function. The result is that the constructor of the base class requires more parameters than the one of the subclasses which becomes a problem for implementing a method for the tp_init slot.

Moreover it is a little inconsistent that with the current implementation, the com_record constructor returns either an instance of the com_record base class or an instance of the matching subclass, in case the latter was registered.

Therefore I would like to propose, to keep the base class com_record only instantiable via the Record factory function and raise an exception if the constructor is called. In addition I would also raise an exception when the constructor of a subclass is used but this subclass has not been registered. Currently the constructor would return an instance of the base class com_record which is virtually the inverse of the behavior of com_record.

I think that this would implement a more deterministic behavior.

I'll add the proposed changes to the discussion of this PR shortly.

instance initialization.

Also changed the com_record base class to be only instantiable via
the Record factory function and subclasses of com_record only if
they are registered.

Modified the test in 'testPyComTest.py' to reflect this.

Added a test for the initialization of com_record subclasses.
@geppi geppi force-pushed the record_subclasses branch from 44fdad2 to aed5d91 Compare January 13, 2025 20:20
@geppi
Copy link
Contributor Author

geppi commented Jan 13, 2025

Added the initialization functionality for com_record subclasses and added a test.

@geppi
Copy link
Contributor Author

geppi commented Jan 14, 2025

I'm wondering if this functionality and what's required to successfully register com_record subclasses should be further documented?

I could write a short explanation but I'm not familiar what would be required to included it in the documentation.

@geppi
Copy link
Contributor Author

geppi commented Jan 17, 2025

Therefore I would like to propose, to keep the base class com_record only instantiable via the Record factory function and raise an exception if the constructor is called. In addition I would also raise an exception when the constructor of a subclass is used but this subclass has not been registered. Currently the constructor would return an instance of the base class com_record which is virtually the inverse of the behavior of com_record.

@mhammond Do you agree with these changes?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Enable creating subclasses of com_record
3 participants