Skip to content

Latest commit

 

History

History

analysis

This code is responsible for answering two questions:

Where do definitions reach?

Traditionally reaching definitions, a classic dataflow-analysis, has been used to answer this question. To understand reaching definitions, watch this wonderful YouTube video and come back here. We use reaching definitions, with one small modification, a reassignment check.

# Reassignment check
if cfg_node.left_hand_side not in cfg_node.right_hand_side_variables:
    # Get previous assignments of cfg_node.left_hand_side and remove them from JOIN
    arrow_result = self.arrow(JOIN, cfg_node.left_hand_side)

We do this because, e.g.

image_name = request.args.get('image_name')
image_name = os.path.join(base_dir, image_name)
send_file(image_name)

we still want to know that something from a request reached send_file.

How does a definition reach?

After we know that a definition reaches a use that we are interested in, we use what are called definition-use chains to figure out how definitions reach their uses. This is necessary because there may be multiple paths from definition to use. Here is how we create definition_chains:

def build_def_use_chain(
    cfg_nodes,
    lattice
):
    def_use = defaultdict(list)
    # For every node
    for node in cfg_nodes:
        # That's a definition
        if isinstance(node, AssignmentNode):
            # Get the uses
            for variable in node.right_hand_side_variables:
                # Loop through most of the nodes before it
                for earlier_node in get_constraint_nodes(node, lattice):
                    # and add them to the 'uses list' of each earlier node, when applicable
                    # 'earlier node' here being a simplification
                    if variable in earlier_node.left_hand_side:
                        def_use[earlier_node].append(node)
    return def_use

Additional details

This folder will probably not change for the lifetime of the project, unless we were to implement more advanced analyses like solving string constraints or doing alias analysis. Right now there are more pressing concerns, like handling web frameworks and handling all AST node types in the CFG construction.

Stefan and Bruno like the Schwartzbach notes, as you will see in some comments. But looking up these two algorithms will yield countless results, my favorite is this amazing guy from YouTube.