Feature/distribute simplification #13509

mpoeter · 2021-02-08T13:33:33Z

Scope & Purpose

Enterprise companion PR: arangodb/enterprise#647

Simplify the DistributeExecutor and avoid implicit modification of its input variable.

Previously, the DistributeExecutor sometimes modified its input variable in place, which led to unexpected results, as in this example:

FOR i IN 1..3 LET x = { test: 1 } INSERT x INTO testi RETURN x

Execution plan:
 Id   NodeType            Site  Est.   Comment
  1   SingletonNode       COOR     1   * ROOT
  2   CalculationNode     COOR     1     - LET #3 = 1 .. 3   /* range */   /* simple expression */
  4   CalculationNode     COOR     1     - LET x = { "test" : 1 }   /* json expression */   /* const assignment */
  3   EnumerateListNode   COOR     3     - FOR i IN #3   /* list iteration */
  9   DistributeNode      COOR     3       - DISTRIBUTE  /* create keys: true, variable: x */
 10   RemoteNode          DBS      3       - REMOTE
  6   InsertNode          DBS      3       - INSERT x IN testi 
 11   RemoteNode          COOR     3       - REMOTE
 12   GatherNode          COOR     3       - GATHER   /* unsorted */
  8   ReturnNode          COOR     3       - RETURN x

This query should return {test: 1} three times. Instead it returns:

[ 
  { "test" : 1, "_key" : "2010078" }, 
  { "test" : 1, "_key" : "2010079" }, 
  { "test" : 1, "_key" : "2010080" } 
]
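The difference between the old and the new behaviour can be sketched with a toy model. This is not ArangoDB code: `Document`, `distributeInPlace` and `makeDistributeInputWithKey` are hypothetical stand-ins, and a document is modelled as a plain string map. The sketch only shows why writing the generated `_key` into the shared variable leaks into `RETURN x`, while deriving a fresh value does not.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical toy model: a document is a map from attribute name to value.
using Document = std::map<std::string, std::string>;

// Old behaviour (sketch): the generated key is written into the caller's
// value, so any later reader of the same variable (e.g. RETURN x) sees it.
void distributeInPlace(Document& input, const std::string& key) {
  input["_key"] = key;
}

// New behaviour (sketch): a separate step derives a fresh value for the
// distribute node and leaves the original variable untouched.
Document makeDistributeInputWithKey(const Document& input, const std::string& key) {
  Document copy = input;  // copy first, then add the generated key
  copy["_key"] = key;
  return copy;
}
```

With the in-place variant, `x` itself ends up carrying `_key`; with the copying variant, only the value handed to the distribute step does.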

This PR moves the modification logic from the DistributeExecutor into three new internal AQL functions (MAKE_DISTRIBUTE_INPUT, MAKE_DISTRIBUTE_INPUT_WITH_KEY_CREATION, MAKE_DISTRIBUTE_GRAPH_INPUT). As a post-processing step after optimization, we insert a new calculation node with the corresponding function call for each distribute node in the plan (if necessary). Doing this only after optimization ensures that the calculation node does not interfere with any optimization rules that operate on the distribute node and its variable.

This change not only simplifies the DistributeExecutor, but also makes any additional calculation (if necessary) explicit in the execution plan and avoids unexpected results like the one in the previous example.
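The post-processing step described above can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the actual optimizer code: the plan is modelled as a flat `std::vector` of hypothetical `Node` structs (the real plan is a node graph), the mapping from the `createKeys`/`fixupGraphInput` flags to the three function names is inferred from their names, the single-argument call text and the `#distributeInputN` variable names are made up, and the "if necessary" check is omitted, so a calculation is inserted for every distribute node.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, heavily simplified plan node (not ArangoDB's ExecutionNode).
struct Node {
  std::string type;         // "Distribute", "Calculation", "Remote", ...
  std::string inVariable;   // variable the node reads
  std::string outVariable;  // variable the node produces (Calculation only)
  std::string expression;   // function call text (Calculation only)
  bool createKeys = false;
  bool fixupGraphInput = false;
};

// Assumed mapping from distribute flags to the new internal AQL functions.
std::string distributeFunctionFor(const Node& n) {
  if (n.fixupGraphInput) return "MAKE_DISTRIBUTE_GRAPH_INPUT";
  if (n.createKeys) return "MAKE_DISTRIBUTE_INPUT_WITH_KEY_CREATION";
  return "MAKE_DISTRIBUTE_INPUT";
}

// Runs after all optimizer rules: in front of each distribute node, insert a
// calculation that produces a fresh variable, and rewire the distribute node
// to read that variable, so the original variable is never modified.
void insertDistributeInputCalculations(std::vector<Node>& plan) {
  int counter = 0;
  for (std::size_t i = 0; i < plan.size(); ++i) {
    if (plan[i].type != "Distribute") continue;
    Node calc;
    calc.type = "Calculation";
    calc.inVariable = plan[i].inVariable;
    calc.outVariable = "#distributeInput" + std::to_string(counter++);
    calc.expression = distributeFunctionFor(plan[i]) + "(" + plan[i].inVariable + ")";
    plan[i].inVariable = calc.outVariable;  // distribute now reads the new value
    plan.insert(plan.begin() + static_cast<std::ptrdiff_t>(i), calc);
    ++i;  // skip the distribute node, which moved one slot to the right
  }
}
```

Because the rewrite happens after optimization, rules that pattern-match on a distribute node and its variable never see the extra calculation node.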

💩 Bugfix (requires CHANGELOG entry)
🔨 Refactoring/simplification
📖 CHANGELOG entry made

Backports:

No backports required

Related Information

Main repository PR: arangodb/enterprise#647

Testing & Verification

This change is already covered by existing tests, such as shell_server_aql.
This PR adds tests that were used to verify all changes:
- Added new integration tests to shell_server_aql

mpoeter · 2021-02-08T16:39:01Z

http://172.16.10.101:8080/view/PR/job/arangodb-matrix-pr/13897/

mpoeter · 2021-02-08T18:23:35Z

http://172.16.10.101:8080/view/PR/job/arangodb-matrix-pr/13900/

mpoeter · 2021-02-09T09:12:53Z

http://172.16.10.101:8080/view/PR/job/arangodb-matrix-pr/13902/

mpoeter · 2021-02-09T11:27:31Z

Tests blue

CHANGELOG

jsteemann · 2021-02-09T12:01:01Z

arangod/Aql/ClusterNodes.cpp

-  builder.add("createKeys", VPackValue(_createKeys));
-  builder.add("allowKeyConversionToObject", VPackValue(_allowKeyConversionToObject));
-  builder.add("fixupGraphInput", VPackValue(_fixupGraphInput));
  builder.add(VPackValue("variable"));
-  _variable->toVelocyPack(builder);
-  builder.add(VPackValue("alternativeVariable"));
-  _alternativeVariable->toVelocyPack(builder);
+  _variable->toVelocyPack(builder);;


I guess this doesn't break rolling upgrades in the cluster, as coordinators are updated last.
However, it could break an older version of the explainer, which may check some of these attributes on a DistributeNode. So we should prepare the explainer in 3.7 to handle the new DistributeNode format gracefully.
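The "handle gracefully" suggestion amounts to a tolerant reader: only use an attribute when it is present, and fall back to a default otherwise. The real explainer is JavaScript (explainer.js); the following is only a C++ rendition of the pattern, with a hypothetical `Attributes` map standing in for the parsed node and hypothetical helpers `getBoolOr` and `describeDistribute`. The comment text mirrors the `create keys` annotation in the plan output above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a parsed node description: attribute -> flag.
using Attributes = std::map<std::string, bool>;

// Read a boolean attribute, falling back to a default when it is absent
// (e.g. because a newer server no longer serializes it).
bool getBoolOr(const Attributes& attrs, const std::string& name, bool fallback) {
  auto it = attrs.find(name);
  return it == attrs.end() ? fallback : it->second;
}

// Build the DISTRIBUTE comment without assuming removed attributes exist:
// only mention "create keys" when the node actually carries that attribute.
std::string describeDistribute(const Attributes& attrs) {
  std::string text = "DISTRIBUTE";
  if (attrs.count("createKeys") != 0) {
    const bool createKeys = getBoolOr(attrs, "createKeys", false);
    text += createKeys ? "  /* create keys: true */" : "  /* create keys: false */";
  }
  return text;
}
```

An explainer written this way renders both the old and the new serialization instead of failing on a missing attribute.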

arangod/Aql/DistributeExecutor.cpp

arangod/Aql/Functions.cpp

jsteemann · 2021-02-09T12:13:03Z

arangod/Aql/Functions.cpp

+  // TODO - use ignoreErrors in all error cases?
+


Do we need to do anything here? That is unclear to me as a reader of the code.

jsteemann · 2021-02-09T12:20:51Z

arangod/Aql/OptimizerRules.cpp

+          setInVariable = [updateReplaceNode](Variable* var) {
+            updateReplaceNode->setInDocVariable(var);
+          };


As far as I can tell, the value assigned to setInVariable is the same in the if and the else branches, so the assignment can be moved after them.

No: in one case we call setInKeyVariable, in the other setInDocVariable.
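Why the assignment cannot be hoisted can be shown with a minimal sketch. `Variable`, `UpdateReplaceNode`, `makeSetter` and the `distributeOnKey` condition are hypothetical stand-ins (the actual branch condition in OptimizerRules.cpp is not shown here); the point is only that each branch binds a different setter into the same `std::function`.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical stand-ins for a plan variable and an UPDATE/REPLACE node.
struct Variable {
  std::string name;
};

struct UpdateReplaceNode {
  Variable* inKey = nullptr;
  Variable* inDoc = nullptr;
  void setInKeyVariable(Variable* var) { inKey = var; }
  void setInDocVariable(Variable* var) { inDoc = var; }
};

// The branch decides WHICH setter the callback forwards to, so the two
// lambdas differ and a single assignment after the if/else would not do.
std::function<void(Variable*)> makeSetter(UpdateReplaceNode* node, bool distributeOnKey) {
  std::function<void(Variable*)> setInVariable;
  if (distributeOnKey) {
    setInVariable = [node](Variable* var) { node->setInKeyVariable(var); };
  } else {
    setInVariable = [node](Variable* var) { node->setInDocVariable(var); };
  }
  return setInVariable;
}
```

The caller can then rewire the node to the newly computed variable without knowing which input it replaces.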

js/common/modules/@arangodb/aql/explainer.js

mpoeter · 2021-02-09T13:41:26Z

http://172.16.10.101:8080/view/PR/job/arangodb-matrix-pr/13904/

arangod/Aql/Functions.cpp

Co-authored-by: jsteemann <jan@arangodb.com> Co-authored-by: Jan <jsteemann@users.noreply.github.com>

jsteemann and others added 11 commits January 28, 2021 01:35

experimental refactoring

c72b0be

Add internal MAKE_DISTRIBUTE_INPUT functions.

73fa8d6

Merge remote-tracking branch 'origin/devel' into feature/distribute-s…

d5b2ac8

…implification # Conflicts: # arangod/Aql/ClusterNodes.h

Merge remote-tracking branch 'origin/devel' into feature/distribute-s…

b6269be

…implification # Conflicts: # arangod/Aql/AqlFunctionFeature.cpp

Simplify DistributeExecutor and instead insert MAKE_DISTRIBUTE_INPUT …

05bf094

…function call to prepare the input.

Introduce targetNodeId.

bf60207

Refactor and simplify.

6990867

Merge remote-tracking branch 'origin/devel' into feature/distribute-s…

d84ecbe

…implification

Add tests.

a853bf8

Merge remote-tracking branch 'origin/devel' into feature/distribute-s…

506a521

…implification

Update CHANGELOG.

b8f918b

mpoeter added this to the devel milestone Feb 8, 2021

mpoeter marked this pull request as ready for review February 8, 2021 15:58

mpoeter requested review from mchacki and jsteemann February 8, 2021 16:39

mpoeter added 2 commits February 8, 2021 19:20

Fix jslint errors.

19e68ca

Merge remote-tracking branch 'origin/devel' into feature/distribute-s…

26fb3cf

…implification # Conflicts: # CHANGELOG

mpoeter added 2 commits February 9, 2021 10:10

Remove unused variable.

504bcf4

Merge remote-tracking branch 'origin/devel' into feature/distribute-s…

b8092ce

…implification # Conflicts: # CHANGELOG