
Multiline still not working with single parsable lines in statement #11

Closed
HallerPatrick opened this issue Nov 11, 2020 · 10 comments
Labels
bug Something isn't working help wanted Extra attention is needed
Milestone

Comments

@HallerPatrick
Owner

HallerPatrick commented Nov 11, 2020

from frosch import hook

hook()

x = (
    3 +
    "String")
  File "C:\Users\user\venv37\lib\site-packages\frosch\frosch.py", line 118, in get_whole_expression
    raise SyntaxError("SyntaxError in line:{}".format(stack.lineno)) from error
SyntaxError: SyntaxError in line:8

Original exception was:
Traceback (most recent call last):
  File "main.py", line 8, in <module>
    "Striong")
TypeError: unsupported operand type(s) for +: 'int' and 'str'

Tested in cmd

@HallerPatrick
Owner Author

Update:
This was not working because we only looked at the lines below the reported line to see whether we could complete the whole statement. Now it looks in both directions.

This still needs work, because multiline statements containing lines that are parsable (with ast) on their own still crash the program.
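
To illustrate the failure mode described above, here is a minimal sketch using only the standard-library ast module. The sample statement and the helper name parses_alone are made up for illustration and are not taken from frosch. It shows that some physical lines of a multiline statement parse on their own, so a search that stops as soon as the text parses can stop too early.

```python
import ast

# A hypothetical multiline statement, one physical line per entry.
SOURCE_LINES = [
    "x = foo(",
    "    a,",
    "    b",
    ")",
]


def parses_alone(line):
    """Return True if the stripped line is valid Python by itself."""
    try:
        ast.parse(line.strip())
    except SyntaxError:
        return False
    return True


# One flag per physical line of the statement.
results = [parses_alone(line) for line in SOURCE_LINES]
print(results)
```

The two middle lines are valid expressions by themselves, while the opening and closing lines are not, so "does this line parse?" is not a reliable test for "is this the whole statement?".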

@HallerPatrick HallerPatrick added the bug Something isn't working label Nov 11, 2020
@HallerPatrick HallerPatrick changed the title from "Multiline still not working" to "Multiline still not working with single parsable lines in statement" Nov 11, 2020
@HallerPatrick HallerPatrick added the help wanted Extra attention is needed label Nov 11, 2020
@HallerPatrick HallerPatrick added this to the 0.2.0 Release milestone Nov 11, 2020
@alexmojaki

Maybe this is covered by the last sentence above (I couldn't understand it) but keep in mind cases like this:

if (
    3 +
    "String"):
    foo()
    bar()

FYI, the way stack_data works is that it parses the entire file and then extracts what it needs from there, so it never parses partial broken code.

Are you still planning on collapsing multiple lines into one? That could lead to some very long lines.

@HallerPatrick
Owner Author

@alexmojaki I tried using stack_data to extract the code from the tracebacks, but didn't have much success. The problem was that the traceback I get contains frames of type FrameSummary, which either stack_data or other dependencies could not handle. Or maybe I am using the lib wrong.

To recap:

We have the scenario that the program is crashing in a multiline statement. We get the traceback and therefore get the current line (line of crash) and the line number.
We then try to parse this line:

ast.parse(line)

in order to inspect the variables it uses. This crashes because Python only gives us the one line at which the runtime error is reported, and that line alone is not a complete statement.
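
A minimal sketch of this scenario, using only the standard library. The helper name names_used and the sample strings are hypothetical and are not frosch's actual code. It collects the variable names from a complete statement and shows that a fragment of the same statement cannot be parsed at all.

```python
import ast


def names_used(source):
    """Return the sorted variable names appearing in a complete statement."""
    tree = ast.parse(source)
    return sorted({node.id for node in ast.walk(tree) if isinstance(node, ast.Name)})


# The complete statement, collapsed into one line, parses fine.
full_names = names_used("x = (a + b)")

# A single physical line of the same statement is only a fragment.
try:
    names_used("b)")
    fragment_ok = True
except SyntaxError:
    fragment_ok = False

print(full_names, fragment_ok)
```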

@alexmojaki can you provide me with a little example on how stack_data can be useful here? You say that it parses the entire file. Can we then deduce the full statement by a given single line?

And yes, the whole multiline statement is collapsed into one. The ConsoleWriter is not (yet) sophisticated enough to allow for more "freestyle" annotations.

@alexmojaki

@alexmojaki I tried using stack_data for extracting the code from the tracebacks, but didnt have much success. The problem was that the traceback I get contains frames of type FrameSummary, which either stack_data or other dependencies could not handle. Or maybe I am using the lib wrong.

Indeed, you need to pass an actual traceback object, e.g. the last argument of sys.excepthook. You don't use the traceback module.
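
The distinction can be seen with the standard library alone. In this sketch (the helper name innermost_traceback is hypothetical), error.__traceback__ is a real traceback object with live frames, the same kind of object that sys.excepthook receives as its last argument, while traceback.extract_tb only returns FrameSummary records.

```python
import traceback
import types


def innermost_traceback(tb):
    """Follow tb_next to the traceback entry where the exception was raised."""
    while tb.tb_next is not None:
        tb = tb.tb_next
    return tb


try:
    3 + "String"
except TypeError as error:
    tb = error.__traceback__              # a real traceback object
    summaries = traceback.extract_tb(tb)  # FrameSummary records, no live frames
    last_tb = innermost_traceback(tb)

print(type(tb).__name__, type(summaries[-1]).__name__, last_tb.tb_lineno)
```

The traceback object still carries tb_frame, which is what a library needs in order to look at the frame's code and variables. A FrameSummary holds only the extracted file name, line number, function name and source line.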

@alexmojaki can you provide me with a little example on how stack_data can be useful here? You say that it parses the entire file. Can we then deduce the full statement by a given single line?

Yes, that's basically it. We know where each node starts and ends so we look for the one containing the current line.
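
As a rough sketch of that idea, and not stack_data's actual implementation, the standard-library ast module is enough to parse a whole source text and pick the smallest statement whose line range contains a given line. It assumes Python 3.8+ for end_lineno; the helper name innermost_statement and the sample source are made up for illustration.

```python
import ast

SOURCE = '''\
x = 1
if (
    3 +
    "String"):
    foo()
    bar()
'''


def innermost_statement(source, lineno):
    """Return the smallest statement node whose line range contains lineno."""
    best = None
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.stmt):
            continue
        if node.lineno <= lineno <= node.end_lineno:
            span = node.end_lineno - node.lineno
            if best is None or span < best.end_lineno - best.lineno:
                best = node
    return best


stmt = innermost_statement(SOURCE, 4)   # the line holding "String"):
print(type(stmt).__name__, stmt.lineno, stmt.end_lineno)
```

Note that for the line inside the condition, the containing statement is the whole if block, body included, which is more than one would want to print.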

And yes the whole multiline statement is collapsed into one. The ConsoleWriter is not (yet) sophisticated enough to allow for more "freestyle" annotations.

Right, this isn't just a technical problem, it's a design problem that may have no solution. better-exceptions has the same look and faces the same problem. I tried to brainstorm solutions in https://github.com/Qix-/better-exceptions/issues/92 but didn't get anywhere. Your collapsing idea is a neat and clever solution but it can easily get out of hand.

@alexmojaki

There's a new Formatter class in stack_data that isn't documented but it's released and stable. Even if you don't use it directly the source code is well structured and very readable, it should help you understand the core of the library. It's basically the README implemented in solid code. Here's an example usage:

import stack_data

formatter = stack_data.Formatter(
    options=stack_data.Options(before=0, after=0),
)

formatter.set_hook()

# Trigger an exception
import json

print(
    json.loads(" s s")
)

I do need to improve the documentation and such, but motivation is limited because the library is intended for a very niche audience, mostly library writers like yourself. I'm happy to help you integrate it though.

@HallerPatrick
Owner Author

Okay after some trying out I think I got what I want.

frame_info = stack_data.FrameInfo(traceback_)
t = frame_info.executing.source.statements_at_line(last_stack.lineno)
t = t.pop()
print(ast.dump(t)) # Contains the statement I am looking for

I dug a little through stack_data and found out that your executing library is probably doing the job on its own. (Little side note: It took me a while to find out that executing is indeed a third-party package. I had to look through setup.py, pyproject.toml and finally setup.cfg.)

@alexmojaki Can you comment on the code snippet and tell me if there is a cleaner way to get the AST nodes of the statement?
Also, do you know a way to convert AST nodes to properly formatted Python code?

@alexmojaki

The problem is you don't really want a whole statement. If the statement is a for loop or something else massive you're in trouble. That's why stack_data has the concept of a piece. Here's some code:

import ast

import stack_data


class FlatFormatter(stack_data.Formatter):
    def format_frame(self, frame):
        yield self.format_frame_header(frame)
        collapsed = "".join(line.text for line in frame.lines)
        yield f"{collapsed}\n"
        cumulative_offset = 0
        for line in frame.lines:
            for var, node in frame.variables_by_lineno[line.lineno]:
                if isinstance(node, ast.Name):
                    token = node.first_token
                elif isinstance(node, ast.Attribute):
                    token = node.last_token
                else:
                    # Not clear how to point to subscripts
                    continue
                offset = cumulative_offset + token.start[1]
                yield " " * offset + f"^ = {var.value!r}\n"
            cumulative_offset += len(line.text)

formatter = FlatFormatter(
    options=stack_data.Options(before=0, after=0),
)

formatter.set_hook()

# Trigger an exception
import json

formatter.j = " s s"
try:
    for _ in (
        json.loads(formatter.j)
    ):
        pass
except:
    formatter.print_exception()

Result:

Traceback (most recent call last):
 File "/home/alex/.config/JetBrains/PyCharm2020.2/scratches/scratch_979.py", line 37, in <module>
    for _ in (        json.loads(formatter.j)    ):
                                 ^ = <__main__.FlatFormatter object at 0x7f7a09111a60>
                                           ^ = ' s s'
 File "/home/alex/.pyenv/versions/3.8.5/lib/python3.8/json/__init__.py", line 357, in loads
        return _default_decoder.decode(s)
                                       ^ = ' s s'
               ^ = <json.decoder.JSONDecoder object at 0x7f7a08e44d30>
 File "/home/alex/.pyenv/versions/3.8.5/lib/python3.8/json/decoder.py", line 337, in JSONDecoder.decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
                                   ^ = ' s s'
                                             ^ = ' s s'
                   ^ = <json.decoder.JSONDecoder object at 0x7f7a08e44d30>
                                          ^ = <built-in method match of re.Pattern object at 0x7f7a08828930>
 File "/home/alex/.pyenv/versions/3.8.5/lib/python3.8/json/decoder.py", line 355, in JSONDecoder.raw_decode
            raise JSONDecodeError("Expecting value", s, err.value) from None
                                                     ^ = ' s s'
json.decoder.JSONDecodeError: Expecting value: line 1 column 2 (char 1)
  1. Note that it contains the header of the for loop but not the whole thing.
  2. asttokens is great for getting source code corresponding to nodes. That's where first_token and last_token come from.
  3. Note that with the help of pure_eval stack_data can also pick up attributes like formatter.j.
  4. It can also get subscripts but it's not clear how to have arrows pointing simultaneously at foo, bar, and foo[bar].
  5. You should probably strip the text of each line, though the offset calculations will then need adjusting.
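
One possible way to handle the last point, as a sketch only: strip each line before joining and keep a per-line shift that maps an original column to its column in the collapsed text. The function name collapse, the single-space separator and the sample lines are assumptions for illustration and are not part of stack_data.

```python
def collapse(lines, sep=" "):
    """Join stripped lines and return (text, shifts).

    shifts[i] + original_column gives the column in the collapsed text
    for a token on line i.
    """
    parts = []
    shifts = []
    position = 0  # column in the collapsed text where the next line starts
    for text in lines:
        indent = len(text) - len(text.lstrip())
        shifts.append(position - indent)
        stripped = text.strip()
        parts.append(stripped)
        position += len(stripped) + len(sep)
    return sep.join(parts), shifts


lines = ["for _ in (", "    json.loads(formatter.j)", "):"]
text, shifts = collapse(lines)

# Map the column of "formatter" on the second line into the collapsed text.
column = shifts[1] + lines[1].index("formatter")
print(text)
print(" " * column + "^")
```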

@HallerPatrick
Owner Author

I could not really wrap my head around the formatters from stack_data, but I managed to use the "pieces".

def extract_statement_piece(traceback_: TracebackType, last_stack) -> List[List[Token]]:
    """Get frame infos and get code pieces (from stack_data) by line
    of crash """
    frame_info = stack_data.FrameInfo(traceback_)
    pieces = frame_info.executing.source.pieces
    tokens = frame_info.executing.source.tokens_by_lineno

    statement_piece_tokens = []
    for piece in pieces:
        if last_stack.lineno in list(piece):
            for line in list(piece):
                statement_piece_tokens.append(tokens[line])

    return statement_piece_tokens

This can probably be cleaned up, but it now works with multiline statements as well as for-loops, if-statements etc.

For now I will mark this issue as solved.
@alexmojaki you will be the first one I report to if there are new bugs coming up ;) and thanks for your nice libs!

@alexmojaki

It worries me that all the AST info has been lost and now you just have a list of tokens, especially if you're going to be using that to point to variables. Working with tokens doesn't go well. Some examples of my experience with this:

Here is the same code from before without formatters:

import ast

import stack_data


def format_frame(tb):
    frame = stack_data.FrameInfo(tb, stack_data.Options(before=0, after=0))
    collapsed = "".join(line.text for line in frame.lines)
    yield f"{collapsed}\n"
    cumulative_offset = 0
    for line in frame.lines:
        for var, node in frame.variables_by_lineno[line.lineno]:
            if isinstance(node, ast.Name):
                token = node.first_token
            elif isinstance(node, ast.Attribute):
                token = node.last_token
            else:
                # Not clear how to point to subscripts
                continue
            offset = cumulative_offset + token.start[1]
            yield " " * offset + f"^ = {var.value!r}\n"
        cumulative_offset += len(line.text)


# Trigger an exception
def main():
    import json

    json.j = " s s"
    try:
        for _ in (
                json.loads(json.j)
        ):
            pass
    except Exception as e:
        for line in format_frame(e.__traceback__):
            print(line, end='')


main()

By the way, the in operator works fine with ranges, so there is no need to convert to a list.
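
A small illustration of that remark, assuming (as the comment implies) that each piece is a range of line numbers. The values below are made up.

```python
piece = range(3, 6)   # stands in for one piece: lines 3, 4 and 5
crash_lineno = 4

in_piece = crash_lineno in piece        # direct membership test on the range
in_list = crash_lineno in list(piece)   # same answer, but builds an extra list

# Iterating over the range needs no list() either.
piece_lines = [lineno for lineno in piece]
print(in_piece, in_list, piece_lines)
```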

@HallerPatrick
Owner Author

I am indirectly pointing to variables; all I want are the tokens of the statement or of the piece of the statement. Those are then formatted, and afterwards we apply the value annotations. So there are no offset calculations on the actual tokens.

You are probably right that there will be some drawbacks, especially if we want to do more sophisticated things, but for now this should work.
