Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FIX: SimpleSQLGrammar quote parsing regression #5700

Merged

Conversation

jasonlyle88
Copy link
Contributor

Impact

  • Bug fix (non-breaking change which fixes expected existing functionality)
  • Enhancement/New feature (adds functionality without impacting existing logic)
  • Breaking change (fix or feature that would cause existing functionality to change)

Description

Fixes #5674

PR #5108 made a modification to the grammar used to parse strings. It expanded functionality for string identification with escaped quotes and characters.

However, it broke identification of other strings. This PR aims to correct that.

The issue was that non-escape characters that followed a \ character would pre-maturely end the token regex looking for character literals. I fixed this by not looking for specific escape sequences but allowing for any character to follow a \ character without issue.

I have also updated the associated test file to add more test cases. All existing test cases pass un-modified while the new test cases pass with these new modifications.

Things to be aware of

Things to worry about

Additional Context

The changes to the identifier tokens (S_IDENTIFIER, SPECIAL_CHARS, etc) are functionally equivalent to what they were before. However, in the process of debugging the grammar, I was unsure where to start and started from the top down. The identifier grammar was a little unclear, so the changes are just to make the grammar more explicit in this area.

@MalloD12 MalloD12 self-assigned this Mar 15, 2024
@jasonlyle88
Copy link
Contributor Author

Hey @MalloD12 ! Thanks for picking this up for review. I think this is all set, let me know if you need anything. I think this is a pretty big deal, so would love to try to get this in for the next release if possible!

@MalloD12
Copy link
Collaborator

Hi @jasonlyle88,

I started looking into this on Friday. I'm not a grammar expert, but your rule changes look good to me. Also, previous tests added in #5108 keep working as expected. For debugging the different rules for example, S_IDENTIFIER, SPECIAL_CHARS, etc; did you use SimpleSqlGrammarTest? Before moving ahead with this I would like to make sure we are not breaking any of these rules.

Thanks,
Daniel.

@jasonlyle88
Copy link
Contributor Author

I did use the SimpleSqlGrammarTest.groovy file while developing this, and all the tests are still in that file unmodified (other than a formatting change that was bugging me that doesn't change the functionality of the test scenario). I also added more test cases to cover the issues I was seeing.

Additionally, I added some output to the SimpleSqlGrammarTest.groovy that outputs a separator, the current test string, and then for each token: the type of identifier and the string that is identified for that token. This helped me make sure the identifiers didn't change across my changes: all existing strings stayed the same kind of identifier as far as I could find

@jasonlyle88
Copy link
Contributor Author

Made me second guess myself since I just did a manual checking of the output across tests for the token type and text for each token. Added my output code to my local master, compiled, captured and saved the output. Then switched to this branch, compiled, captured and saved the output. Did a diff of the two files and the results are identical (other than my branch had the additional tests). Attaching the files here in case anyone wants to see the results
lb_grammar_test_results_branch.txt
lb_grammar_test_results_master.txt

@MalloD12
Copy link
Collaborator

Made me second guess myself since I just did a manual checking of the output across tests for the token type and text for each token. Added my output code to my local master, compiled, captured and saved the output. Then switched to this branch, compiled, captured and saved the output. Did a diff of the two files and the results are identical (other than my branch had the additional tests). Attaching the files here in case anyone wants to see the results lb_grammar_test_results_branch.txt lb_grammar_test_results_master.txt

Hey @jasonlyle88,

Nice, I liked that manual testing check. I didn't get how you generated those output files. Could you please detail those steps you mentioned for me? That could be very useful for me when having to review any other grammar issue next time.

Other than that, this PR looks good to me. I just made a formatting change to the test class and added two of the scenarios you listed in the description of #5674, I think they are already covered by the ones you already added but like of those random mixed symbols 😄.

@MalloD12 MalloD12 self-requested a review March 18, 2024 18:58
@jasonlyle88
Copy link
Contributor Author

Awesome, sounds good! Thanks.

As for how I generated them: I'm using IntilliJ Idea. When I have the SimpleSqlGrammarTest.groovy file open, I have icons in my line gutter allowing me to run an individual test or all tests in the current class (only 1 in the case of this file). When I run the test, it open a result panel with a tree of the test results (class results at the root, test results at the next level node, and individual test cases at the next level node). By selecting the root (or the individual test) node, I get to see all the output from the test. I just copied and pasted that output to make the file.

I'm sure you can do this when running the test through the CLI as well... but I don't know how to run individual tests via the CLI (letting my Java noob show through here 🤣).

Copy link
Collaborator

@filipelautert filipelautert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the fix @jasonlyle88 !

@jasonlyle88
Copy link
Contributor Author

No problem, always happy to contribute!

@MalloD12
Copy link
Collaborator

Awesome, sounds good! Thanks.

As for how I generated them: I'm using IntilliJ Idea. When I have the SimpleSqlGrammarTest.groovy file open, I have icons in my line gutter allowing me to run an individual test or all tests in the current class (only 1 in the case of this file). When I run the test, it open a result panel with a tree of the test results (class results at the root, test results at the next level node, and individual test cases at the next level node). By selecting the root (or the individual test) node, I get to see all the output from the test. I just copied and pasted that output to make the file.

I'm sure you can do this when running the test through the CLI as well... but I don't know how to run individual tests via the CLI (letting my Java noob show through here 🤣).

Thanks @jasonlyle88! I thought you had a trick to run these tests and grab the output from all of them automatically.

@filipelautert filipelautert merged commit 6f50035 into liquibase:master Mar 21, 2024
31 of 33 checks passed
@filipelautert filipelautert added this to the 1NEXT milestone Mar 21, 2024
@jasonlyle88 jasonlyle88 deleted the fix-quote-parsing-regression branch March 21, 2024 14:57
@jasonlyle88
Copy link
Contributor Author

Thanks everyone for the quick work here! Appreciate it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

BUG: Quoting not always identified properly, causes issues in end delimiter identification
4 participants