Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[docs] py docstrings: better conversion of C++ types to python #2581

Merged

Conversation

pmolodo
Copy link
Contributor

@pmolodo pmolodo commented Aug 8, 2023

Description of Change(s)

Numerous fixes to translation of C++ types to python types

NOTE:
This PR is one of several targeting the python docstring generation process. To see all these PRs in a branch, see here.

  • I have verified that all unit tests pass with the proposed changes
  • I have submitted a signed Contributor License Agreement

@jesschimein
Copy link

Filed as internal issue #USD-8575


# we use the reversed sorted order of the module names, because this ensures that longer names (ie, UsdGeom) come
# before shorter ones (Usd), so the more exact matches are preferred
PXR_MODULES_OR_JOINED = "|".join(re.escape(m) for m in reversed(pxrModules))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmolodo in verifying this change, I'm running into an issue where reversed() isn't supported for dictionaries in Python 3.6 & 3.7 (but is supported in 3.8+). Additionally, it sounds like dictionaries didn't really support strict ordering at all until Python 3.6 (see https://groups.google.com/g/dev-python/c/76kIoAEKlew).

Would there be a way to make this work for earlier versions of Python 3?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively, it'd be nice to have some sort of "requires Python 3.8 or better" check or error message. The current error you get on 3.6 is something along the lines of "Error: 'dict' object is not reversible"

@pmolodo pmolodo force-pushed the pr/improve-cpp-to-python-typenames branch from 8e19c6b to f369155 Compare October 17, 2023 21:00
@pmolodo
Copy link
Contributor Author

pmolodo commented Oct 17, 2023

rebased to fix merge conflict...

@pmolodo
Copy link
Contributor Author

pmolodo commented Oct 17, 2023

Pushed changes to make it not rely on dicts being ordered or reversable:

f39826a

@pmolodo pmolodo force-pushed the pr/improve-cpp-to-python-typenames branch from f39826a to e44fe6b Compare October 17, 2023 21:55
@dsyu-pixar
Copy link

Hi @pmolodo thanks for the update! Unfortunately, I now get the following issue (when running with Python 3.6 -- this does not occur when testing with Python 3.9):

  File "/home/dsyu/trees/dev-usd/docs/python/doxygenlib/cdWriterDocstring.py", line 552, in __convertTypeName
    tokens = [self.__convertTypeNameToken(x) for x in WORDBREAK_RE.split(ret) if x]
ValueError: split() requires a non-empty pattern match.

I think this might be due to an re.split() issue in 3.6, possibly fixed in 3.7 (see https://bugs.python.org/issue43222).

I realize trying to craft things to work in older versions of Python 3 (that have known issues) can turn into an exercise in futility, so I'm fine with making a min requirement of Python 3.8 or 3.9 -- I believe the VFX 2022 reference platform recommends Python 3.9, and we're trying to test/build with that in mind anyway. What do you think? Maybe just adding a python_version() check somewhere early in convertDoxygen.py would be enough to be transparent enough to anyone using older versions?

@pmolodo
Copy link
Contributor Author

pmolodo commented Nov 16, 2023

I pushed a version which fixes that re.split issue on python-3.6. I tried to test with python-3.6 myself, but ran into another python-3.6 bug related to locale.getpreferredencoding() on the docker image we use for builds (that occurs earlier in the build process), so I can't confirm this is the last python-3.6 related issue.

If another one pops up, I say we just state that you need python-3.7+ to run this script...

@pmolodo
Copy link
Contributor Author

pmolodo commented Nov 16, 2023

Also, fyi - I confirmed that this regexp should give the same results once split with this script:

import inspect
import os
import re

WORDBREAK_RE = re.compile(r"\b")
NONWORD_RE = re.compile(r"(\W+)")

THIS_FILE = os.path.abspath(inspect.getsourcefile(lambda: None) or __file__)
THIS_DIR = os.path.dirname(THIS_FILE)

prim_cpp_path = os.path.join(THIS_DIR, "pxr", "usd", "usd", "prim.cpp")

with open(prim_cpp_path, "r", encoding="utf8") as reader:
    test_cpp_text = reader.read()

def tokenize_1(text: str):
    return [x for x in WORDBREAK_RE.split(text) if x]

def tokenize_2(text: str):
    return [x for x in NONWORD_RE.split(text) if x]

split1 = tokenize_1(test_cpp_text)
split2 = tokenize_2(test_cpp_text)

if split1 == split2:
    print("yay!")
else:
    raise Exception("oh noes")

@pixar-oss pixar-oss merged commit 28974f1 into PixarAnimationStudios:dev Nov 28, 2023
5 checks passed
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants