Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[mypyc] Merge more primitive ops #9110

Merged
merged 8 commits into from
Jul 9, 2020
Merged

[mypyc] Merge more primitive ops #9110

merged 8 commits into from
Jul 9, 2020

Conversation

TH3CHARLie
Copy link
Collaborator

@TH3CHARLie TH3CHARLie commented Jul 8, 2020

relates mypyc/mypyc#734. This PR completes ALL ops of dict, str, list, tuple, set that are supported using current design. The remaining ones would rather need to split into multiple ops(via specializers) or using pointers.

@TH3CHARLie
Copy link
Collaborator Author

oops, something broke mypyc. Had to do a binary search to find out....

@TH3CHARLie
Copy link
Collaborator Author

TH3CHARLie commented Jul 8, 2020

Turn out this is much more complicated than I assumed...

Here's what I find:

using mypyc to compile mypy itself, the first assertion fail would happen during codegen the function: SymbolTable.__str__ in mypy/nodes.py:

    def __str__(self) -> str:
        a = []  # type: List[str]
        for key, value in self.items():
            # Filter out the implicit import of builtins.
            if isinstance(value, SymbolTableNode):
                if (value.fullname != 'builtins' and
                        (value.fullname or '').split('.')[-1] not in
                        implicit_module_attrs):
                    a.append('  ' + str(key) + ' : ' + str(value))
            else:
                a.append('  <invalid item>')
        a = sorted(a)
        a.insert(0, 'SymbolTable(')
        a[-1] += ')'
        return '\n'.join(a)

The when building the CallC for the a.append(' <invalid item>') method call, it will have a result_type of object_rprimitive if you checked the generated IR, so it will coerce the CallC(with c_int return type) into object type and generate box op following the CallC in IR. However, if you write python code like this(it's basically mimicking the above code, simpler examples such as the ones in IR tests would work fine, as we may expected):

from typing import Dict, List
class Node:
    def __init__(self, x: int):
        self.x = x

class Table(Dict[str, Node]):
    def __str__(self) -> str:
        a = []  # type: List[str]
        for key, value in self.items():
            if value.x > 10:
                a.append('greater than 10')
            else:
                a.append('invalid item')
        a = sorted(a)
        a.insert(0, 'DummyNode')
        return '\n'.join(a)

There would be a result_type of none_rprimitive and therefore no box and everything works as expected.

So what causes such differences? It seems that the builder.node_type fail to find the method call's type in builder's types in the first case, therefore leading to an object_rprimitive as the expression result type. However, according to python's semantics, foo.append(x) should return none.

return builder.gen_method_call(obj,
callee.name,
args,
builder.node_type(expr),
expr.line,
expr.arg_kinds,
expr.arg_names)

the builder's types come from the mypy check, I am not sure why these two code snippets have different types on this expression. @JukkaL

Updates:
I tried the same config from mypy_bootstrap on the mockup example and it still works fine.

Updates 2:
I used the master branch version mypyc to compile mypy and checked the SymbolTable.__str__ function's generated C:

CPyL33: ;
    cpy_r_r46 = CPyStatic_unicode_6250; /* '  <invalid item>' */
    cpy_r_r47 = PyList_Append(cpy_r_a, cpy_r_r46) >= 0;
    if (unlikely(!cpy_r_r47)) {
        CPy_AddTraceback("mypy/nodes.py", "__str__", 3079, CPyStatic_nodes___globals);
        goto CPyL59;
    } else
        goto CPyL34;
CPyL34: ;
    cpy_r_r48 = cpy_r_r47 ? Py_True : Py_False;
    goto CPyL35;
CPyL59: ;
    CPy_DecRef(cpy_r_a);
    CPyTagged_DecRef(cpy_r_r3);
    CPy_DecRef(cpy_r_r4);
    goto CPyL46;

CPyL34 is the box operation from bool to object.

@JukkaL
Copy link
Collaborator

JukkaL commented Jul 8, 2020

I suspect that a.append(' <invalid item>') is considered unreachable by mypy, and it doesn't type check unreachable code, so types will be incomplete. You can verify that by adding something that should generate an error to it (e.g. 1 + 'x'). If there's no error, the code is treated as unreachable.

A reasonable way forward would be to ensure that the coercion to object works. I'm not sure why we are triggering an assertion, but it may be that we need to add an extra coercion somewhere, or that the coercion doesn't work for short integer types. Each of these should not be hard to fix.

Here's a possible simplified test case (didn't verify that it results in the same issue):

def f(x: List[str], y: str) -> None:
    if isinstance(y, int):  # Since int and str are distinct types, mypy things that this will always be false
        x.append('x')

If you are blocked on this, you can start working on some other issue while we investigate this, such as inlined integer operations.

@TH3CHARLie
Copy link
Collaborator Author

I suspect that a.append(' <invalid item>') is considered unreachable by mypy, and it doesn't type check unreachable code, so types will be incomplete. You can verify that by adding something that should generate an error to it (e.g. 1 + 'x'). If there's no error, the code is treated as unreachable.

A reasonable way forward would be to ensure that the coercion to object works. I'm not sure why we are triggering an assertion, but it may be that we need to add an extra coercion somewhere, or that the coercion doesn't work for short integer types. Each of these should not be hard to fix.

Here's a possible simplified test case (didn't verify that it results in the same issue):

def f(x: List[str], y: str) -> None:
    if isinstance(y, int):  # Since int and str are distinct types, mypy things that this will always be false
        x.append('x')

If you are blocked on this, you can start working on some other issue while we investigate this, such as inlined integer operations.

Yes exactly it is caused by mypy's report on unreachable code. I haven't checked the detail but in a Dict[str, SymbolTableNode] with if isinstance(value, SymbolTableNode), it makes sense mypy thinks the else branch is unreachable(I wonder if it gets called at all). I add a hack to box int32/int64, since they are not supposed to be boxed and the boxed result will not be used anywhere else, I used a NULL to achieve that.

# All ops returning int32/int64 should not be coerced into object.
# Since their result will not be used elsewhere, it's safe to use NULL here
elif is_int32_rprimitive(typ) or is_int64_rprimitive(typ):
self.emit_lines('{}{} = NULL;'.format(declaration, dest))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here's the hack

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine for now, but can you create a follow-up issue to properly box these as Python int objects? It should be easy enough to support, and it could be useful in the future.

Copy link
Collaborator

@JukkaL JukkaL left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, looks good! After this we don't have too many old-style primitives left.

Left one comment about follow-up task.

# All ops returning int32/int64 should not be coerced into object.
# Since their result will not be used elsewhere, it's safe to use NULL here
elif is_int32_rprimitive(typ) or is_int64_rprimitive(typ):
self.emit_lines('{}{} = NULL;'.format(declaration, dest))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine for now, but can you create a follow-up issue to properly box these as Python int objects? It should be easy enough to support, and it could be useful in the future.

@JukkaL JukkaL merged commit b1f5121 into python:master Jul 9, 2020
@TH3CHARLie TH3CHARLie deleted the merge-more-ops-2 branch July 9, 2020 13:21
JukkaL pushed a commit that referenced this pull request Jul 10, 2020
Follows #9110's comment.

Using CPython API to box int32/int64 into PyObject*.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants