Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: static variable analysis #770

Open
wants to merge 27 commits into
base: master
Choose a base branch
from

Conversation

jg-rp
Copy link
Contributor

@jg-rp jg-rp commented Nov 16, 2024

Statically analyze templates and report variable usage.

Usage

Retrieve the names of variables used in a template with Liquid.variables(template). It returns an array of strings, one string for each distinct variable, without its properties.

import { Liquid } from 'liquidjs'

const engine = new Liquid()

const template = engine.parse(`\
<p>
  {% assign title = user.title | capitalize %}
  {{ title }} {{ user.first_name | default: user.name }} {{ user.last_name }}
  {% if user.address %}
    {{ user.address.line1 }}
  {% else %}
    {{ user.email_addresses[0] }}
    {% for email in user.email_addresses %}
       - {{ email }}
    {% endfor %}
  {% endif %}
<p>
`)

console.log(engine.variablesSync(template))

Output

[ 'user', 'title', 'email' ]

Alternatively, use Liquid.fullVariables(template) to get a list of variables including their properties. Notice that variables from tag and filter arguments are included too.

// continued from above
engine.fullVariables(template).then(console.log)

Output

[
  'user.title',
  'user.first_name',
  'user.name',
  'user.last_name',
  'user.address',
  'user.address.line1',
  'user.email_addresses[0]',
  'user.email_addresses',
  'title',
  'email'
]

Or use Liquid.variableSegments(template) to get an array of strings and numbers that make up each variable's path.

// continued from above
engine.variableSegments(template).then(console.log)

Output

[
  [ 'user', 'title' ],
  [ 'user', 'first_name' ],
  [ 'user', 'name' ],
  [ 'user', 'last_name' ],
  [ 'user', 'address' ],
  [ 'user', 'address', 'line1' ],
  [ 'user', 'email_addresses', 0 ],
  [ 'user', 'email_addresses' ],
  [ 'title' ],
  [ 'email' ]
]

Global Variables

Notice, in the examples above, that title and email are included in the results. Often you'll want to exclude names that are in scope from {% assign %} tags, and temporary variables like those introduced by a {% for %} tag.

To get names that are expected to be global, that is, provided by application developers rather than template authors, use the globalVariables, globalFullVariables or globalVariableSegments methods (or their synchronous equivalents) of a Liquid class instance.

// continued from above
engine.globalVariableSegments(template).then(console.log)

Output

[
  [ 'user', 'title' ],
  [ 'user', 'first_name' ],
  [ 'user', 'name' ],
  [ 'user', 'last_name' ],
  [ 'user', 'address' ],
  [ 'user', 'address', 'line1' ],
  [ 'user', 'email_addresses', 0 ],
  [ 'user', 'email_addresses' ]
]

Partial Templates

By default, LiquidJS will try to load and analyze any included and rendered templates too.

import { Liquid } from 'liquidjs'

const footer = `\
<footer>
  <p>&copy; {{ "now" | date: "%Y" }} {{ site_name }}</p>
  <p>{{ site_description }}</p>
</footer>`

const engine = new Liquid({ templates: { footer } })

const template = engine.parse(`\
<body>
  <h1>Hi, {{ you | default: 'World' }}!</h1>
  {% assign some = 'thing' %}
  {% include 'footer' %}
</body>
`)

engine.globalVariables(template).then(console.log)

Output

[ 'you', 'site_name', 'site_description' ]

You can disable analysis of partial templates by setting the partials options to false.

// continue from above
engine.globalVariables(template, { partials: false }).then(console.log)

Output

[ 'you' ]

If an {% include %} tag uses a dynamic template name (one that can't be determined without rendering the template) it will be ignored, even if partials is set to true.

@coveralls
Copy link

coveralls commented Nov 16, 2024

Pull Request Test Coverage Report for Build 11894990314

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 168 of 216 (77.78%) changed or added relevant lines in 24 files are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-2.0%) to 97.97%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/tags/block.ts 0 1 0.0%
src/tags/for.ts 7 9 77.78%
src/tags/tablerow.ts 5 7 71.43%
src/tags/include.ts 1 8 12.5%
src/tags/layout.ts 1 8 12.5%
src/template/analysis.ts 100 112 89.29%
src/tags/render.ts 1 18 5.56%
Files with Coverage Reduction New Missed Lines %
src/drop/blank-drop.ts 1 90.91%
Totals Coverage Status
Change from base Build 11800996948: -2.0%
Covered Lines: 2700
Relevant Lines: 2749

💛 - Coveralls

src/tags/cycle.ts Outdated Show resolved Hide resolved

expect(analysis).toStrictEqual({
variables: { 'd[a[b.c]]': [d], 'a[b.c]': [a], 'b.c': [bc] },
globals: { 'd[a[b.c]]': [d], 'a[b.c]': [a], 'b.c': [bc] },
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if d, a are global while b is local? Not sure if we can implement it thoroughly. If not, would you consider to reduce this feature to a list of top level variables:

globals: ['d', 'a', 'b']

I guess this list is already useful for new comers, am I correct here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a test with a local variable nested within a global variable. Still not sure what's best here.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you consider my proposal in another comment? I assume dynamic index is not important as users won't figure out its value in runtime (not sure, do you actually have such a use case that I'm missing out?). Thus we can simplize the situation. For references:

a[b.c].d
a[0].c

We return

a[].d
b.c
a.0.c or a[0].c // considering `a.0.c` can be confusion when 0 is a string (which can contain dots), latter maybe better as you have a `variableSegments` for easier consumption anyway, this is only for inspection I guess

Note: static index like 0 or quoted strings can still be treated statically, like property access.

@jg-rp jg-rp mentioned this pull request Nov 17, 2024
2 tasks
@jg-rp
Copy link
Contributor Author

jg-rp commented Nov 18, 2024

With the latest commit, I've changed several of the built-in tags to fix their row and column numbers reported from Token.getPosition().

I guess this is a good point to decide if having correct row and column is important enough to warrant these changes.

Notice that when parsing if, elsif and unless tags, we're now avoiding some string slicing. Instead, when new Value() is given a TagToken, we pass the entire input string and a range to Tokenizer, which I'm hoping will result in a performance boost rather than a performance penalty.

src/template/value.ts Outdated Show resolved Hide resolved
src/template/value.ts Outdated Show resolved Hide resolved
@jg-rp
Copy link
Contributor Author

jg-rp commented Nov 23, 2024

I've provisionally implemented some convenience analysis methods on the Liquid class. Please consider these (along with any other features) as ideas that can be removed before merging.

Other ideas that I've yet to implement:

  • Report names of Liquid filters and their locations in analysis results (I've seen people ask for this before).
  • Report names of tags and their locations in analysis results.
  • Add options to control partial template analysis. At the moment we throw an error if a partial template can't be loaded, and silently ignore partial templates included/rendered with a dynamic name.

@jg-rp jg-rp marked this pull request as ready for review November 24, 2024 08:12
@jg-rp
Copy link
Contributor Author

jg-rp commented Dec 4, 2024

@harttle 👋 , I've not forgotten about this or your previous comments about keeping track of aliases. I'll get back to it soon.

]
```

Or use `Liquid.variableSegments(template)` to get an array of strings and numbers that make up each variable's path.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For fullVariables and variableSegments, can we also include an examle for nested variables, or we'll need to mention how nesting will be handled in return values of these 2.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reference a[b.c].d, I think we have another option, return 2 references without nesting them. As what's inside [] is not important because it's dynamic anyway:

a[].d
b.c

If you adopt this implementation, we'll need to decide how to represent [] in variableSegments return value. Otherwise these 2 will be the same:

arr[0].length
arr.length

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe differentiate with nesting like:

['arr', ['length']]
['arr', 'length']

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using your example, a[b.c].d, variableSegments was incorrectly producing something like this:

[
  [
    'a',
    Variable {
      segments: [ 'b', 'c' ],
      location: { row: 1, col: 6, file: undefined }
    },
    'd'
  ],
  [ 'b', 'c' ]
]

This was not my intention. Now we get the following.

variables

[ 'a', 'b' ]

fullVariables

[
  'a[b.c].d',
  'b.c'
]`

variableSegments

[
  [ 'a', [ 'b', 'c' ], 'd' ],
  [ 'b', 'c' ]
]

For arr[0].length and arr.length, we get [ 'arr', 0, 'length' ] and [ 'arr', 'length' ], respectively.

Is that what you had in mind?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I mean maybe we can drop nesting. And treat them as different references. I guess what exactly is inside ’[]’ is not important, we only know that array/map is accessed. Assuming figure out its value when called is not feasible for static analysis, then ppl won’t use that information effectively. Here’s my simplified approach

variables

[ 'a', 'b' ]
fullVariables

[
'a[].d',
'b.c'
]`
variableSegments

[
[ 'a', [ 'd' ] ],
[ 'b', 'c' ]
]

note the last one use nested array to represent entering into array, not nested index.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. I think your approach should be in addition to variableSegments, as separate method/s. Then users have the option of working with nested variables and array indexes, if they're useful, or your more convenient representation if they're not.

Perhaps normalizedSegments would be a good name?

It seems like we're loosing potentially valuable information if we drop nested variables altogether.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just another idea to simplify the implementation. If you still think dynamic index info is important, we can keep your current implementation. No need to compromise for my opinion.

src/liquid.ts Show resolved Hide resolved
src/liquid.ts Show resolved Hide resolved
@jg-rp
Copy link
Contributor Author

jg-rp commented Dec 22, 2024

Can you think of better names for fullVariables, globals, locals, and/or segments?

At the moment:

  • The name "global" or "global variables" is inspired by Python's built-in globals() function. In our case it means names added to a template's scope by application developers at render time. Global variables are available to the root template and any templates that it includes, including those rendered with the {% render %} tag.

  • The name "locals" or "template local variables" is inspired by Python's built-in locals() function. Here it means names that have been added to a template's scope from an {% assign %}, {% capture %}, {% increment %}, etc. tag.

  • "Segments" is a term used by RFC 9535. One can think of Liquid variables as paths, where each path is made up of one or more segments.

  • A "fullVariable" is roughly equivalent to a "normalized path" in RFC 9535.


expect(analysis).toStrictEqual({
variables: { 'd[a[b.c]]': [d], 'a[b.c]': [a], 'b.c': [bc] },
globals: { 'd[a[b.c]]': [d], 'a[b.c]': [a], 'b.c': [bc] },
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you consider my proposal in another comment? I assume dynamic index is not important as users won't figure out its value in runtime (not sure, do you actually have such a use case that I'm missing out?). Thus we can simplize the situation. For references:

a[b.c].d
a[0].c

We return

a[].d
b.c
a.0.c or a[0].c // considering `a.0.c` can be confusion when 0 is a string (which can contain dots), latter maybe better as you have a `variableSegments` for easier consumption anyway, this is only for inspection I guess

Note: static index like 0 or quoted strings can still be treated statically, like property access.

src/liquid.ts Show resolved Hide resolved
@@ -0,0 +1,286 @@
---
Copy link
Owner

@harttle harttle Dec 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you also need change sidebar and en.yml, to make this file visible on sidebar.

]
```

Or use `Liquid.variableSegments(template)` to get an array of strings and numbers that make up each variable's path.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I mean maybe we can drop nesting. And treat them as different references. I guess what exactly is inside ’[]’ is not important, we only know that array/map is accessed. Assuming figure out its value when called is not feasible for static analysis, then ppl won’t use that information effectively. Here’s my simplified approach

variables

[ 'a', 'b' ]
fullVariables

[
'a[].d',
'b.c'
]`
variableSegments

[
[ 'a', [ 'd' ] ],
[ 'b', 'c' ]
]

note the last one use nested array to represent entering into array, not nested index.

@harttle
Copy link
Owner

harttle commented Dec 23, 2024

your current naming is OK for me. Sorry some comments are not submitted yesterday, causing confusion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants