[core] Abstract away optional AST traversals (first step) #1426

oowekyala · 2018-11-02T07:01:36Z

First PR for 7.0.0! 😄 🤘

Basically this is a first step to simplify the way we handle optional AST traversals like type resolution or the symbol table. It's necessary to detect XPath dependencies automatically, and overall it will make our lives easier.

Why we need a change

Type resolution, DFA, etc. all use a visitor that runs on the AST, wrapped in not-so-useful "façades" and provided by a LanguageVersionHandler. At the moment:

- Every language must have all of them, even if it's just a dummy. This is bad because these stages are ultimately language-specific, e.g. not all languages will implement their type resolution with a preliminary qname resolution like Java does.
  - Each language should be able to define for itself the visits it wants to schedule, independently of how other languages do it.
- They are all executed in the same order for all languages (the logic is in SourceCodeProcessor), even though the order is an implementation detail which should rest... with the language implementation.
- Adding a new processing stage which only some rules activate is useful, yet requires a change to the LanguageVersionHandler interface, to its abstract class, to the SourceCodeProcessor, to the Rule interface, etc. The change must cascade from Rule to RuleSet to RuleSets. It also requires a change to the ruleset schema. Hmm spaghetti 🍝
  - These optional AST visits should be handled uniformly by pmd-core, and adding or removing them should only entail changes to the language implementation, not to pmd-core.
- The current timing logic and dependency check are duplicated for DFA, typeres, multifile, etc.

AST processing stages all have in common that:

- They take an AST plus other config (e.g. a classloader) and perform side effects on the AST (e.g. to populate qnames).
- They all run after the parsing stage and before rule application.

This forms a basis for abstracting them.

What's in this PR

- Processing stages are reified under the interface AstProcessingStage.
- LanguageVersionHandlers know the full set of AstProcessingStages a language version supports.
- AstProcessingStages may declare dependencies on other stages. E.g. Java typeres depends on qname resolution.
- AstProcessingStages are implemented by an enum for each language that supports some. This is quite natural for ordering reasons. Adding or removing AST visits is easy, and so is specifying dependencies.
- The Rule interface has a new method dependsOn(AstProcessingStage) that returns whether the rule depends on the stage or not.
  - Having it there instead of on the AstProcessingStage allows us to implement it differently in XPath rules and Java rules.
  - E.g. Java rules may use annotations and XPath rules may use some analysis of the XPath expression ([core] Use annotations to resolve rule dependencies to frameworks #437)
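To make the shape of this design easier to picture, here is a minimal, self-contained sketch. It is not the code from this PR: `DemoStage`, `DemoJavaStage`, `DemoAst`, `DemoRule` and `DemoTypeAwareRule` are hypothetical names, the stage interface is simplified to be non-generic, and each stage body just records that it ran.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for an AST root; stages "decorate" it via side effects.
final class DemoAst {
    final List<String> sideEffects = new ArrayList<>();
}

// Hypothetical, simplified shape of a processing stage (assumption, not the PR's API).
interface DemoStage {
    String displayName();

    Set<DemoStage> dependencies();

    void processAst(DemoAst root);
}

// One enum per language: constants are declared in a sensible execution order.
enum DemoJavaStage implements DemoStage {
    QNAME_RESOLUTION("Qualified name resolution"),
    TYPE_RESOLUTION("Type resolution", QNAME_RESOLUTION),
    DFA("Data flow analysis");

    private final String displayName;
    private final Set<DemoStage> dependencies;

    DemoJavaStage(String displayName, DemoStage... dependencies) {
        this.displayName = displayName;
        // A plain set is used here; EnumSet cannot be built for an enum
        // while that enum's own constants are still being constructed.
        this.dependencies = Collections.unmodifiableSet(
                new LinkedHashSet<>(Arrays.asList(dependencies)));
    }

    @Override
    public String displayName() {
        return displayName;
    }

    @Override
    public Set<DemoStage> dependencies() {
        return dependencies;
    }

    @Override
    public void processAst(DemoAst root) {
        // A real stage would run a visitor; this one only leaves a trace.
        root.sideEffects.add(displayName);
    }
}

// Hypothetical rule interface: by default a rule depends on no stage.
interface DemoRule {
    default boolean dependsOn(DemoStage stage) {
        return false;
    }
}

// A rule that needs type information overrides the default.
final class DemoTypeAwareRule implements DemoRule {
    @Override
    public boolean dependsOn(DemoStage stage) {
        return stage == DemoJavaStage.TYPE_RESOLUTION;
    }
}
```

In this sketch, a constant's constructor arguments can only name constants declared before it, so the declaration order of the enum is already a valid dependency order.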

What's not in this PR

- The actual way a rule declares a dependency on a stage is still unclear. Probably there will be a method on the Rule interface that returns the AstProcessingStages the rule depends on. This can then be implemented by abstract rule classes.
  - ~~E.g. Java rules may use annotations and XPath rules may use some analysis of the XPath expression ([core] Use annotations to resolve rule dependencies to frameworks #437)~~
  - ~~This is left for future work, so is kind of WIP in this PR.~~
- The processing stages are not wired into SourceCodeProcessor for now.
  - TimedOperationCategory needs to be refactored to a class to be extensible. This is probably the next PR.
  - The fact that RuleSets is mutable makes it hard to avoid unnecessary computation. Either we make it immutable, or we have a new immutable class that takes care of precomputing all that can be done in advance. I'd really like to make them immutable, because it will be much simpler to build them and reason about them, but this will change the API significantly. Wdyt?
  - This is further explained in a comment in SourceCodeProcessor.
- Since it's not wired in, I haven't removed the former API, just deprecated it. The deprecations will need to be ported to the master branch.

pmd-test · 2018-11-02T07:50:58Z

1 Warning
⚠️ Running pmdtester failed, this message is mainly used to remind the maintainers of PMD.

Generated by 🚫 Danger

adangel · 2018-11-20T07:58:34Z

Some general thoughts:

> They are all executed in the same order for all languages (logic is in SourceCodeProcessor), even though the order is an implementation detail which should rest... with the language implementation.

Fully agree. We also have some unit tests that call the parser manually; if they need typeres, they then have to set up the different facades by hand and call them in the correct order (e.g. qname before typeres). And it is definitely a language implementation detail.

> AstProcessingStages may declare dependencies on other stages. E.g. Java typeres depends on qname resolution.

Do we need this detail? If the language handler knows all the stages, it should also know the order in which they are executed. If an AstProcessingStage depends on another stage, maybe we shouldn't have it separated at all?
I'm just trying to avoid the logic of determining the correct ordering of the AstProcessingStages based on their dependencies. It would be much easier if the LanguageHandler implementation just defined the correct order.
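For comparison, the simpler alternative suggested here (the handler just lists its stages in execution order, with no dependency resolution) could look roughly like the following. All names (`FixedOrderStage`, `FixedOrderHandler`, `FixedOrderJavaHandler`, `FixedOrderRunner`) are hypothetical and only illustrate the idea.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stages for one language.
enum FixedOrderStage { QNAME_RESOLUTION, TYPE_RESOLUTION, DFA }

// Hypothetical handler contract: the language implementation owns the order.
interface FixedOrderHandler {
    List<FixedOrderStage> stagesInExecutionOrder();
}

final class FixedOrderJavaHandler implements FixedOrderHandler {
    @Override
    public List<FixedOrderStage> stagesInExecutionOrder() {
        // The order is written down once, by the language implementation.
        return Collections.unmodifiableList(Arrays.asList(
                FixedOrderStage.QNAME_RESOLUTION,
                FixedOrderStage.TYPE_RESOLUTION,
                FixedOrderStage.DFA));
    }
}

final class FixedOrderRunner {
    // Runs every stage in the order given by the handler and returns a trace.
    static List<String> runAll(FixedOrderHandler handler) {
        List<String> trace = new ArrayList<>();
        for (FixedOrderStage stage : handler.stagesInExecutionOrder()) {
            trace.add(stage.name()); // a real stage would visit the AST here
        }
        return trace;
    }
}
```

The trade-off is that pmd-core stays trivial, but it can no longer skip a stage that no rule needs.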

> The Rule interface has a new method dependsOn(AstProcessingStage), that returns whether the rule depends on the stage or not.

This might allow some performance improvements if (and only if) you are only executing rules that don't need all stages. I would argue for removing this feature:

- Most likely, you are executing more than one rule. The more rules you have in the ruleset, the higher the chances are that you need all stages. So the performance benefit would only be visible in unit tests, where you might execute rule by rule.
- It's error-prone to declare for each rule which stages are needed. And as you pointed out, adding a new stage might result in changing the ruleset schema (if we want some type safety down to this level and not just strings).
- You only get the full power of analysis if you make use of all stages, so I'd rather always enable all stages by default. If we have performance issues, we should work on improving the processing rather than removing some stages that a specific rule might not need, when the stage would be enabled anyway because a different rule in the ruleset needs it...

The changes you proposed make sense. I'll look into the code now. Thanks!

adangel · 2018-11-20T07:59:54Z

pmd-core/src/main/java/net/sourceforge/pmd/Rule.java

+     * @since 7.0.0
+     */
+    @Experimental
+    default boolean dependsOn(AstProcessingStage<?> stage) {


As described in my general comment, I'd argue for not introducing this feature to enable/disable specific stages...

adangel · 2018-11-20T08:00:59Z

pmd-core/src/main/java/net/sourceforge/pmd/SourceCodeProcessor.java

+        // basically:
+        // 1. make the union of all stage dependencies of each rule, by language, for the Rulesets
+        // 2. order them by dependency
+        // 3. run them and time them if needed


This would be simplified to just: "run them all and time them"

This is enticing in theory, but I think we should first address the performance issues, and then enable it by default
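The three steps from the code comment under discussion (union of the stage dependencies of all rules, ordering, timed execution) can be sketched as follows. This is only an illustration under assumptions: `PipelineStage`, `PipelineRule` and `StagePlanner` are made-up names, stages are modelled as a single enum, and declaration order is used as the dependency order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stages; a constant may only name earlier constants as dependencies.
enum PipelineStage {
    QNAME_RESOLUTION,
    TYPE_RESOLUTION(QNAME_RESOLUTION),
    DFA;

    private final List<PipelineStage> dependencies;

    PipelineStage(PipelineStage... dependencies) {
        this.dependencies = Collections.unmodifiableList(Arrays.asList(dependencies));
    }

    List<PipelineStage> dependencies() {
        return dependencies;
    }
}

// Hypothetical rule contract, mirroring the dependsOn idea.
interface PipelineRule {
    boolean dependsOn(PipelineStage stage);
}

final class StagePlanner {
    // Steps 1 and 2: collect every stage some rule needs, plus transitive
    // dependencies. EnumSet iterates in declaration order, which is a valid
    // dependency order in this sketch.
    static List<PipelineStage> plan(Collection<? extends PipelineRule> rules) {
        Set<PipelineStage> needed = EnumSet.noneOf(PipelineStage.class);
        for (PipelineStage stage : PipelineStage.values()) {
            for (PipelineRule rule : rules) {
                if (rule.dependsOn(stage)) {
                    addWithDependencies(stage, needed);
                    break;
                }
            }
        }
        return new ArrayList<>(needed);
    }

    private static void addWithDependencies(PipelineStage stage, Set<PipelineStage> acc) {
        if (acc.add(stage)) {
            for (PipelineStage dependency : stage.dependencies()) {
                addWithDependencies(dependency, acc);
            }
        }
    }

    // Step 3: run the planned stages and record how long each one took.
    static Map<PipelineStage, Long> runAndTime(List<PipelineStage> plan, List<String> astTrace) {
        Map<PipelineStage, Long> nanos = new EnumMap<>(PipelineStage.class);
        for (PipelineStage stage : plan) {
            long start = System.nanoTime();
            astTrace.add(stage.name()); // placeholder for the real AST visit
            nanos.put(stage, System.nanoTime() - start);
        }
        return nanos;
    }
}
```

With a single rule that depends on `TYPE_RESOLUTION`, the plan contains `QNAME_RESOLUTION` followed by `TYPE_RESOLUTION`, and `DFA` is skipped. Running every stage unconditionally would amount to planning with `PipelineStage.values()` directly.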

adangel · 2018-11-20T08:04:19Z

pmd-core/src/main/java/net/sourceforge/pmd/SourceCodeProcessor.java

+        // The approach I'd like to take is either
+        // * to create a new RunnableRulesets class which is immutable, and performs all these preliminary
+        //   computations upon construction.
+        // * or to modify Ruleset and Rulesets to be immutable. This IMO is a better option because it makes


Yes, I think immutability is the way to go. Ruleset is already immutable, isn't it? So we would just need to make Rulesets immutable. This would then be the configured rules that are to be executed.

Since executing rules runs in multiple threads and rules do not need to be thread-safe, we'll still need to be able to clone a complete Rulesets instance.

I'd make RuleSets immutable. Deprecate mutation methods and move forward.
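As a rough illustration of what is being discussed (an immutable container that precomputes what it can at construction, while still allowing a full copy per thread because rules are not thread-safe), consider the sketch below. `FrozenRuleSets`, `CopyableRule` and `CountingRule` are hypothetical names and do not correspond to existing PMD classes; stages are represented as plain strings for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical rule contract: rules carry mutable state, so they must be copyable.
interface CopyableRule {
    CopyableRule deepCopy();

    Set<String> requiredStages();
}

// Example rule with per-run mutable state (not thread-safe on purpose).
final class CountingRule implements CopyableRule {
    int violations;

    @Override
    public CopyableRule deepCopy() {
        return new CountingRule(); // fresh state for another thread
    }

    @Override
    public Set<String> requiredStages() {
        return Collections.singleton("TYPE_RESOLUTION");
    }
}

// Immutable container: the rule list and derived data are fixed at construction.
final class FrozenRuleSets {
    private final List<CopyableRule> rules;
    private final Set<String> requiredStages;

    FrozenRuleSets(List<? extends CopyableRule> rules) {
        this.rules = Collections.unmodifiableList(new ArrayList<CopyableRule>(rules));
        // Precompute the union of required stages once, instead of per file.
        Set<String> stages = new TreeSet<>();
        for (CopyableRule rule : this.rules) {
            stages.addAll(rule.requiredStages());
        }
        this.requiredStages = Collections.unmodifiableSet(stages);
    }

    List<CopyableRule> rules() {
        return rules;
    }

    Set<String> requiredStages() {
        return requiredStages;
    }

    // Each worker thread gets its own rule instances.
    FrozenRuleSets copyForThread() {
        List<CopyableRule> copies = new ArrayList<>();
        for (CopyableRule rule : rules) {
            copies.add(rule.deepCopy());
        }
        return new FrozenRuleSets(copies);
    }
}
```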

adangel · 2018-11-20T08:07:18Z

pmd-core/src/main/java/net/sourceforge/pmd/lang/LanguageVersionHandler.java

+     *
+     * @return VisitorStarter
+     */
+    // TODO should we deprecate? Not much use to it.


DumpFacade: It's more a tool to help during development. We could deprecate it and point to the Designer, which should be used instead to get a look at the AST...

definitely!

adangel · 2018-11-20T08:13:03Z

pmd-core/src/main/java/net/sourceforge/pmd/lang/ast/AstProcessingStage.java

+ * after parsing is done. Each of these stages implicitly
+ * depends on the parser stage.
+ *
+ * <p>An analysis on a file goes through the following stages:


Where would you see the multifile analysis? Is this a stage as well?
For multifile analysis, we can probably only run the rules that need it after all files have been fully parsed. So the collection of the data could be done in a stage that is executed individually for each file, but I think the rules that make use of it need to be executed separately, after rulechain rules and other rules have been executed... Wdyt?

I think multifile wouldn't fit neatly into this framework. I'd like it more to be something transparent to the rules.

I imagine we could have a separate class of visitors (not rules) that run after parsing and before rule application. These are declared by rules, and each gather some information relevant to the rule that declares them. This could be a stage, but the actual visitors being applied depends on the contents of the ruleset, so it probably needs special treatment...

Upon rule application, the multifile rules query the contents of a global index, knowing what keys their data collection visitors registered and being able to access that info for all files of the analysis. Multifile rules wouldn't need an AST to run, so they should also be treated very specially.

I read a bit about IntelliJ's way of doing it, which is very different although very powerful. We could apply some of the things they do to our codebase. We can perhaps talk about it in the next meeting.

> Multifile rules wouldn't need an AST to run, so they should also be treated very specially.

Right, multifile rules would be very different to "normal" rules...
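The "global index" idea floated in this thread could be pictured along these lines. This is purely a hypothetical sketch (nothing like `GlobalIndexSketch` exists in the PR): per-file collectors record facts under a key, and a multifile rule later queries that key across all files without touching any AST.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical index: key -> file name -> facts collected for that file.
final class GlobalIndexSketch {
    private final Map<String, Map<String, List<String>>> data = new ConcurrentHashMap<>();

    // Called by a per-file data collection visitor, possibly from several threads.
    void record(String key, String fileName, String fact) {
        data.computeIfAbsent(key, k -> new ConcurrentHashMap<>())
            .computeIfAbsent(fileName, f -> new CopyOnWriteArrayList<>())
            .add(fact);
    }

    // Called by a multifile rule once every file has been processed.
    Map<String, List<String>> query(String key) {
        Map<String, List<String>> perFile = data.get(key);
        if (perFile == null) {
            return Collections.<String, List<String>>emptyMap();
        }
        return Collections.unmodifiableMap(perFile);
    }
}
```

A rule would then only need to know the keys its own collectors registered, which matches the idea that multifile rules run after all files have been parsed.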

adangel · 2018-11-20T08:25:59Z

pmd-cpp/src/main/java/net/sourceforge/pmd/lang/cpp/CppHandler.java

-    public RuleViolationFactory getRuleViolationFactory() {
-        throw new UnsupportedOperationException("getRuleViolationFactory() is not supported for C++");
+    protected String getLanguageName() {
+        return "C++";


We should probably use net.sourceforge.pmd.lang.cpp.CppLanguageModule.NAME here. Somewhere I've read your question about why we have separate CPD and PMD language modules...
Well, for now, we use the java.util.ServiceLoader<Language> facility to locate the available languages on the classpath. We have a CPD language class net.sourceforge.pmd.cpd.Language and a PMD language class net.sourceforge.pmd.lang.Language. We could unify them and move the information about whether a language is CPD-only or PMD-only into the class.
There was once also the discussion, whether CPD should be part of PMD or a completely separate tool. If we unify them, we merge the two tools even closer together...

The complete CppHandler class is unnecessary and can be removed with 7.0.0. See #1481.

Create system to declare processing stages in an extensible fashion

5eb1d23

oowekyala added the in:pmd-internals and is:waiting-for-review labels Nov 2, 2018

oowekyala added this to the 7.0.0 milestone Nov 2, 2018

Checkstyle

847f294

oowekyala force-pushed the extensible-ast-processing-stages branch from da64520 to 847f294 November 2, 2018 07:40

oowekyala mentioned this pull request Nov 2, 2018

[RFC] [POC] - report missing classes #1371

Closed

oowekyala added 2 commits November 9, 2018 15:50

Add method to Rule interface

dd27837

fix message

938efd1

adangel reviewed Nov 20, 2018

adangel self-assigned this Nov 26, 2018

adangel removed the is:waiting-for-review label Nov 26, 2018

adangel merged commit 938efd1 into pmd:pmd/7.0.x Nov 26, 2018

oowekyala deleted the extensible-ast-processing-stages branch November 26, 2018 20:54

oowekyala mentioned this pull request Dec 3, 2018

[core] Port deprecations from the AST processing stages PR #1498

Merged

jsotuyod reviewed Jan 13, 2019

pmd-core/src/main/java/net/sourceforge/pmd/lang/ast/AstAnalysisConfiguration.java

jsotuyod reviewed Jan 13, 2019

pmd-java/src/main/java/net/sourceforge/pmd/lang/java/JavaProcessingStage.java

jsotuyod reviewed Jan 21, 2019

pmd-core/src/main/java/net/sourceforge/pmd/SourceCodeProcessor.java

oowekyala mentioned this pull request Apr 26, 2019

[core] Wire processing stages into SourceCodeProcessor #1796

Merged

oowekyala mentioned this pull request Jul 23, 2019

[apex] Apex should only have a single RootNode #1937

Closed

oowekyala mentioned this pull request Sep 30, 2019

[core] Remove incomplete language modules #2035

Merged

2 tasks

adangel mentioned this pull request Nov 24, 2019

Data Flow code is outdated? #2124

Closed

oowekyala mentioned this pull request May 21, 2020

[core] Language properties #2518

Closed

rsoesemann mentioned this pull request Jul 28, 2020

[apex] Integrate nawforce/ApexLink to build robust Unused rule #2667

Closed

oowekyala mentioned this pull request Oct 18, 2020

[apex] Remove Apex ProjectMirror #2836

Merged

4 tasks

oowekyala mentioned this pull request Feb 25, 2022

[core] Remove processing stages #3810

Merged

4 tasks

adangel mentioned this pull request Jan 23, 2023

PMD 7 Tracking Issue #3898

Closed

55 tasks
