Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/distribute simplification #13509

Merged
merged 18 commits into from
Feb 9, 2021
Merged

Conversation

mpoeter
Copy link
Contributor

@mpoeter mpoeter commented Feb 8, 2021

Scope & Purpose

Enterprise companion PR: arangodb/enterprise#647

Simplify the DistributeExecutor and avoid implicit modification of its input variable.

Previously the DistributeExecutor sometimes updated the input variable in-place, leading to unexpected results like in this example:

FOR i IN 1..3 LET x = { test: 1 } INSERT x INTO testi RETURN x

Execution plan:
 Id   NodeType            Site  Est.   Comment
  1   SingletonNode       COOR     1   * ROOT
  2   CalculationNode     COOR     1     - LET #3 = 1 .. 3   /* range */   /* simple expression */
  4   CalculationNode     COOR     1     - LET x = { "test" : 1 }   /* json expression */   /* const assignment */
  3   EnumerateListNode   COOR     3     - FOR i IN #3   /* list iteration */
  9   DistributeNode      COOR     3       - DISTRIBUTE  /* create keys: true, variable: x */
 10   RemoteNode          DBS      3       - REMOTE
  6   InsertNode          DBS      3       - INSERT x IN testi 
 11   RemoteNode          COOR     3       - REMOTE
 12   GatherNode          COOR     3       - GATHER   /* unsorted */
  8   ReturnNode          COOR     3       - RETURN x

This query should return {test: 1} three times. Instead it returns:

[ 
  { "test" : 1, "_key" : "2010078" }, 
  { "test" : 1, "_key" : "2010079" }, 
  { "test" : 1, "_key" : "2010080" } 
]

This PR moves the modification logic from the DistributeExecutor into three new internal AQL functions (MAKE_DISTRIBUTE_INPUT, MAKE_DISTRIBUTE_INPUT_WITH_KEY_CREATION, MAKE_DISTRIBUTE_GRAPH_INPUT). As a post-processing step after the optimization, we insert a new calculation node with the corresponding function call for each distribute node in the plan (if necessary). This is done in a post-processing step so that the calculation node does not interfere with any optimization rules operating on the distribute node and its variable,

This change not only simplifies the DistributeExecutor, but also makes any additional calculation (if necessary) explicit and avoids unexpected results like in the previous example.

  • 💩 Bugfix (requires CHANGELOG entry)
  • 🔨 Refactoring/simplification
  • 📖 CHANGELOG entry made

Backports:

  • No backports required

Related Information

  • Main repository PR: arangodb/enterprise#647

Testing & Verification

  • This change is already covered by existing tests, such as shell_server_aql.
  • This PR adds tests that were used to verify all changes:
    • Added new integration tests shell_server_aql

@mpoeter mpoeter added this to the devel milestone Feb 8, 2021
@mpoeter mpoeter marked this pull request as ready for review February 8, 2021 15:58
@mpoeter
Copy link
Contributor Author

mpoeter commented Feb 8, 2021

@mpoeter mpoeter requested review from mchacki and jsteemann February 8, 2021 16:39
@mpoeter
Copy link
Contributor Author

mpoeter commented Feb 8, 2021

@mpoeter
Copy link
Contributor Author

mpoeter commented Feb 9, 2021

@mpoeter
Copy link
Contributor Author

mpoeter commented Feb 9, 2021

Tests blue

CHANGELOG Show resolved Hide resolved
Comment on lines -347 to +328
builder.add("createKeys", VPackValue(_createKeys));
builder.add("allowKeyConversionToObject", VPackValue(_allowKeyConversionToObject));
builder.add("fixupGraphInput", VPackValue(_fixupGraphInput));
builder.add(VPackValue("variable"));
_variable->toVelocyPack(builder);
builder.add(VPackValue("alternativeVariable"));
_alternativeVariable->toVelocyPack(builder);
_variable->toVelocyPack(builder);;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this doesn't break in case of rolling upgrades in the cluster, as coordinators are updated last.
However, it could break an older version of the explainer, which may check any of these attributes on a DistributeNode. So we should prepare the explainer in 3.7 to handle the new DistributeNode type gracefully.

arangod/Aql/Functions.cpp Outdated Show resolved Hide resolved
Comment on lines 8773 to 8774
// TODO - use ignoreErrors in all error cases?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do anything here? That is unclear to me as a reader of the code.

Comment on lines +8326 to +8328
setInVariable = [updateReplaceNode](Variable* var) {
updateReplaceNode->setInDocVariable(var);
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell, the value of setInVariable is the same for the if and the else parts, so it can be moved behind them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, in one case we call setInKeyVariable, in the other setInDocVariable.

@mpoeter
Copy link
Contributor Author

mpoeter commented Feb 9, 2021

arangod/Aql/Functions.cpp Outdated Show resolved Hide resolved
@jsteemann jsteemann merged commit 846ce5f into devel Feb 9, 2021
@jsteemann jsteemann deleted the feature/distribute-simplification branch February 9, 2021 15:37
elfringham pushed a commit to elfringham/arangodb that referenced this pull request Apr 20, 2021
Co-authored-by: jsteemann <jan@arangodb.com>
Co-authored-by: Jan <jsteemann@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants