-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Abstract away optional AST traversals (first step) #1426
[core] Abstract away optional AST traversals (first step) #1426
Conversation
da64520
to
847f294
Compare
Generated by 🚫 Danger |
Some general thoughts:
Fully agree. We have also some unit tests, that manually call the parser and if they need typeres, they then manually need the different facades - and they need to call them in the correct order (e.g. qname before typeres). And it is definitely a language implementation detail.
Do we need this detail? If the language handler knows all the stages, it should also know the order, in which they are executed. If an AstProcessingStage depends on another stage, maybe we shouldn't have it separated at all?
This would allow to have maybe some performance improvements if (and only if) you are only executing rules, that don't need all stages. I would argue for removing this feature:
The changes you proposed make sense. I'll look into the code now. Thanks! |
* @since 7.0.0 | ||
*/ | ||
@Experimental | ||
default boolean dependsOn(AstProcessingStage<?> stage) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As described in my general comment, I'd argue for not introducing this feature to enable/disable specific stages...
// basically: | ||
// 1. make the union of all stage dependencies of each rule, by language, for the Rulesets | ||
// 2. order them by dependency | ||
// 3. run them and time them if needed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be simplified to just: "run them all and time them"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is enticing in theory, but I think we should first address the performance issues, and then enable it by default
// The approach I'd like to take is either | ||
// * to create a new RunnableRulesets class which is immutable, and performs all these preliminary | ||
// computations upon construction. | ||
// * or to modify Ruleset and Rulesets to be immutable. This IMO is a better option because it makes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think, immutability is the way to go. Ruleset is already immutable, isn't it? So, we would just need make Rulesets immutable - this would be then the configured rules, that are to be executed.
Since executing rules runs in multiple threads and rules do not need to be thread-safe, we'll still need to be able to clone a complete Rulesets instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd make RuleSets
immutable. Deprecate mutation methods and move fordward
* | ||
* @return VisitorStarter | ||
*/ | ||
// TODO should we deprecate? Not much use to it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DumpFacade: It's more a tool to help during development. We could deprecated it and point to the Designer, that should be used instead to get a look at the AST....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
definitely!
* after parsing is done. Each of these stages implicitly | ||
* depends on the parser stage. | ||
* | ||
* <p>An analysis on a file goes through the following stages: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where would you see the multifile analysis? Is this a stage as well?
For multifile analysis, we probably can only run the rules, that need it, after all files have been fully parsed. So, the collection of the data could be done in a stage, that is executed individually for each file, but I think, the rules, that make use of it, need to be executed separately, after rulechain rules and other rules have been executed... Wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think multifile wouldn't fit neatly into this framework. I'd like it more to be something transparent to the rules.
I imagine we could have a separate class of visitors (not rules) that run after parsing and before rule application. These are declared by rules, and each gather some information relevant to the rule that declares them. This could be a stage, but the actual visitors being applied depends on the contents of the ruleset, so it probably needs special treatment...
Upon rule application the mutlifile rules query the contents of a global index, knowing what keys their data collection visitors registered and being able to access that info for all files of the analysis. Multifile rules wouldn't need an AST to run, so they should also be treated very specially.
I read a bit about IntelliJ's way of doing it, which is very different although very powerful. We could apply some of the things they do to our codebase. We can perhaps talk about it in the next meeting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Multifile rules wouldn't need an AST to run, so they should also be treated very specially.
Right, multifile rules would be very different to "normal" rules...
public RuleViolationFactory getRuleViolationFactory() { | ||
throw new UnsupportedOperationException("getRuleViolationFactory() is not supported for C++"); | ||
protected String getLanguageName() { | ||
return "C++"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably use here net.sourceforge.pmd.lang.cpp.CppLanguageModule.NAME
. Somewhere I've read your question about why we have separate CPD and PMD language modules...
Well, for now, we use the java.util.ServiceLoader<Language>
facility to locate the available languages on the classpath - we have a CPD language class net.sourceforge.pmd.cpd.Language
and a PMD language class net.sourceforge.pmd.lang.Language
. We could unify them and move into the class the information, whether it's CPD only or PMD only.
There was once also the discussion, whether CPD should be part of PMD or a completely separate tool. If we unify them, we merge the two tools even closer together...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The complete CppHandler class is unnecessary and can be removed with 7.0.0. See #1481.
First PR for 7.0.0! 😄 🤘
Basically this is a first step to simplify the way we handle optional AST traversals like typeresolution or symbol table. It's necessary to detect XPath dependencies automatically, and globally will make our life easier.
Why we need a change
Type resolution, DFA, etc. all use a visitor that runs on the AST, wrapped in not-so-useful "façades" and provided by a LanguageVersionHandler. At the moment:
AST processing stages all have in common that:
This forms a basis for abstracting them.
What's in this PR
dependsOn(AstProcessingStage)
, that returns whether the rule depends on the stage or not.What's not in this PR
The actual way a rule declares a dependency on an stage is still unclear. Probably there will be a method on the Rule interface that returns the AstProcessingStages the rule depends on. This can then be implemented by abstract rule classes.E.g. Java rules may use annotations and XPath rules may use some analysis of the XPath expression ([core] Use annotations to resolve rule dependencies to frameworks #437)This is left for future work, so is kind of WIP in this PR.