[docs] py docstrings: better conversion of C++ types to python #2581

pmolodo · 2023-08-08T12:42:24Z

Description of Change(s)

Numerous fixes to translation of C++ types to python types

NOTE:
This PR is one of several targeting the python docstring generation process. To see all these PRs in a branch, see here.

I have verified that all unit tests pass with the proposed changes

I have submitted a signed Contributor License Agreement

jesschimein · 2023-08-10T00:55:33Z

Filed as internal issue #USD-8575

dsyu-pixar · 2023-09-28T22:10:52Z

docs/python/doxygenlib/cdWriterDocstring.py

+
+# we use the reversed sorted order of the module names, because this ensures that longer names (ie, UsdGeom) come
+# before shorter ones (Usd), so the more exact matches are preferred
+PXR_MODULES_OR_JOINED = "|".join(re.escape(m) for m in reversed(pxrModules))


@pmolodo in verifying this change, I'm running into an issue where reversed() isn't supported for dictionaries in Python 3.6 & 3.7 (but is supported in 3.8+). Additionally, it sounds like dictionaries didn't really support strict ordering at all until Python 3.6 (see https://groups.google.com/g/dev-python/c/76kIoAEKlew).

Would there be a way to make this work for earlier versions of Python 3?

Alternatively, it'd be nice to have some sort of "requires Python 3.8 or better" check or error message. The current error you get on 3.6 is something along the lines of "Error: 'dict' object is not reversible"

pmolodo · 2023-10-17T21:01:04Z

rebased to fix merge conflict...

pmolodo · 2023-10-17T21:51:45Z

Pushed changes to make it not rely on dicts being ordered or reversable:

f39826a

…or py < 3.6)

dsyu-pixar · 2023-10-19T21:28:46Z

Hi @pmolodo thanks for the update! Unfortunately, I now get the following issue (when running with Python 3.6 -- this does not occur when testing with Python 3.9):

  File "/home/dsyu/trees/dev-usd/docs/python/doxygenlib/cdWriterDocstring.py", line 552, in __convertTypeName
    tokens = [self.__convertTypeNameToken(x) for x in WORDBREAK_RE.split(ret) if x]
ValueError: split() requires a non-empty pattern match.

I think this might be due to an re.split() issue in 3.6, possibly fixed in 3.7 (see https://bugs.python.org/issue43222).

I realize trying to craft things to work in older versions of Python 3 (that have known issues) can turn into an exercise in futility, so I'm fine with making a min requirement of Python 3.8 or 3.9 -- I believe the VFX 2022 reference platform recommends Python 3.9, and we're trying to test/build with that in mind anyway. What do you think? Maybe just adding a python_version() check somewhere early in convertDoxygen.py would be enough to be transparent enough to anyone using older versions?

https://bugs.python.org/issue43222

pmolodo · 2023-11-16T21:52:11Z

I pushed a version which fixes that re.split issue on python-3.6. I tried to test with python-3.6 myself, but ran into another python-3.6 bug related to locale.getpreferredencoding() on the docker image we use for builds (that occurs earlier in the build process), so I can't confirm this is the last python-3.6 related issue.

If another one pops up, I say we just state that you need python-3.7+ to run this script...

pmolodo · 2023-11-16T21:52:52Z

Also, fyi - I confirmed that this regexp should give the same results once split with this script:

import inspect
import os
import re

WORDBREAK_RE = re.compile(r"\b")
NONWORD_RE = re.compile(r"(\W+)")

THIS_FILE = os.path.abspath(inspect.getsourcefile(lambda: None) or __file__)
THIS_DIR = os.path.dirname(THIS_FILE)

prim_cpp_path = os.path.join(THIS_DIR, "pxr", "usd", "usd", "prim.cpp")

with open(prim_cpp_path, "r", encoding="utf8") as reader:
    test_cpp_text = reader.read()

def tokenize_1(text: str):
    return [x for x in WORDBREAK_RE.split(text) if x]

def tokenize_2(text: str):
    return [x for x in NONWORD_RE.split(text) if x]

split1 = tokenize_1(test_cpp_text)
split2 = tokenize_2(test_cpp_text)

if split1 == split2:
    print("yay!")
else:
    raise Exception("oh noes")

dsyu-pixar reviewed Sep 28, 2023

View reviewed changes

[docs] py docstrings: better conversion of C++ types to python

f369155

pmolodo force-pushed the pr/improve-cpp-to-python-typenames branch from 8e19c6b to f369155 Compare October 17, 2023 21:00

[docs] py docstrings - don't rely on sorted or reversable py dicts (f…

e44fe6b

…or py < 3.6)

pmolodo force-pushed the pr/improve-cpp-to-python-typenames branch from f39826a to e44fe6b Compare October 17, 2023 21:55

[docs] fix for python-2.6 bug with zero-with re.split:

c9721fc

https://bugs.python.org/issue43222

pixar-oss merged commit 28974f1 into PixarAnimationStudios:dev Nov 28, 2023
5 checks passed

This pull request was closed.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[docs] py docstrings: better conversion of C++ types to python #2581

[docs] py docstrings: better conversion of C++ types to python #2581

pmolodo commented Aug 8, 2023

jesschimein commented Aug 10, 2023

dsyu-pixar Sep 28, 2023

dsyu-pixar Oct 2, 2023

pmolodo commented Oct 17, 2023

pmolodo commented Oct 17, 2023

dsyu-pixar commented Oct 19, 2023

pmolodo commented Nov 16, 2023

pmolodo commented Nov 16, 2023

[docs] py docstrings: better conversion of C++ types to python #2581

[docs] py docstrings: better conversion of C++ types to python #2581

Conversation

pmolodo commented Aug 8, 2023

Description of Change(s)

jesschimein commented Aug 10, 2023

dsyu-pixar Sep 28, 2023

Choose a reason for hiding this comment

dsyu-pixar Oct 2, 2023

Choose a reason for hiding this comment

pmolodo commented Oct 17, 2023

pmolodo commented Oct 17, 2023

dsyu-pixar commented Oct 19, 2023

pmolodo commented Nov 16, 2023

pmolodo commented Nov 16, 2023