
RocksDB transactions memory tracking #19313

Merged: 131 commits merged into devel from feature/fuse-664-666 on Sep 5, 2023

Conversation

@cpjulia (Contributor) commented Jun 26, 2023

Scope & Purpose

Enterprise companion PR: https://github.com/arangodb/enterprise/pull/1311

  • Track memory usage for write operations in AQL queries.
    The memory usage is reported in the query's peakMemoryUsage stats
    value and is also tracked against the global metric
    arangodb_aql_global_memory_usage. Previously, the memory used by write
    operations was neither included in an AQL query's own memory usage nor
    counted against per-query or global memory limits. It is now properly
    tracked, including the in-memory write batches and the locked keys in
    RocksDB (used for conflict checking).
    As a consequence, the memory usage reported for AQL queries that perform
    write operations may be significantly higher than before, but the
    reported values reflect reality much more closely. Previously working
    AQL write queries can therefore now fail because their actual memory
    usage hits the configured per-query or global memory limit. In these
    cases it is useful to revisit the per-query and/or global memory limit
    settings, turn on intermediate commits for the query, or adjust the
    query so that it uses less memory (see the first sketch after this list).

  • Track memory usage for write transactions that are not AQL queries.
    The memory usage of non-AQL write operations is tracked in the new
    metrics arangodb_transactions_rest_memory_usage and
    arangodb_transactions_internal_memory_usage.
    The former metric contains the memory usage of user-initiated write
    operations, e.g. single or multi-document writes via the document API,
    JavaScript transactions, streaming transactions, or index creation. The
    latter metric contains the memory usage of internal operations, such as
    background statistics gathering, TTL index maintenance, replication, etc.
    Memory usage updates are reported to these metrics whenever a write
    operation increases or decreases its memory usage by 8 kb or more. This
    buffering avoids a large number of small updates to the metric values,
    which would otherwise be heavily contended (see the buffering sketch
    after this list).

  • Track memory usage of index creation operations. Writes to indexes are
    performed in in-memory write batches until they are written to storage. The
    memory usage of these in-memory write batches was previously untracked.

  • To make the changes more testable, each write operation in the code is
    tagged with a transaction::OperationOrigin object, which classifies a
    write operation as either AQL, REST, or internal. The OperationOrigin
    object is created when a transaction::Context is first created. The
    context is responsible for returning the original OperationOrigin even
    for nested operations: for example, if an AQL query (OperationOrigin:
    AQL) is started inside a streaming transaction (OperationOrigin: REST),
    the origin returned for the nested AQL query is still REST (see the
    origin sketch after this list).
    There is an internal API (neither publicly announced nor supported) for
    fetching the metadata of the most recent write operations, e.g. their
    origins and their memory usage. This API is used for testing only; it is
    only available in maintainer mode and deactivated in release mode. There
    is no plan to turn this API into a supported feature.
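
As a quick illustration of the first bullet point, here is a minimal arangosh sketch (collection name, document count, and the limit values are invented for the example): it runs a write query under a per-query memory limit and reads the peakMemoryUsage stat, which with this change also covers write-operation memory:

var db = require("@arangodb").db;
db._create("c");

var res = db._query(
  "FOR i IN 1..100000 INSERT { value: i } INTO c",
  {},                                // no bind variables
  {
    memoryLimit: 64 * 1024 * 1024,   // abort the query if it uses more than 64 MiB
    intermediateCommitCount: 10000   // optionally cap memory via intermediate commits
  }
);
print(res.getExtra().stats.peakMemoryUsage);  // now includes write batches and locked keys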
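
The 8 kb buffering described in the second bullet point can be sketched as follows (a hedged JavaScript illustration only; the real implementation lives in the C++ transaction code, and the metric.add interface is hypothetical). Each write operation accumulates its memory delta locally and publishes it to the shared gauge only once the accumulated delta reaches the threshold:

function MemoryUsageBuffer(metric, threshold) {
  this.metric = metric;        // shared gauge, e.g. arangodb_transactions_rest_memory_usage
  this.threshold = threshold;  // e.g. 8 * 1024 bytes
  this.pending = 0;            // locally buffered delta, not yet published
}

MemoryUsageBuffer.prototype.adjust = function (delta) {
  this.pending += delta;
  if (Math.abs(this.pending) >= this.threshold) {
    this.metric.add(this.pending);  // the single, potentially contended update
    this.pending = 0;
  }
};

// presumably any remainder is flushed when the transaction ends
MemoryUsageBuffer.prototype.flush = function () {
  this.metric.add(this.pending);
  this.pending = 0;
};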
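
The nested-origin rule from the last bullet point boils down to: the origin recorded when the outermost transaction::Context is created wins. A tiny hedged sketch (this Context constructor is illustrative, not the real C++ class):

function Context(origin, parent) {
  // a nested context inherits the outermost context's origin
  this.origin = parent !== null ? parent.origin : origin;
}

var streamingTrx = new Context("REST", null);          // streaming transaction started via REST
var nestedAqlQuery = new Context("AQL", streamingTrx); // AQL query inside that transaction
print(nestedAqlQuery.origin);                          // prints "REST", not "AQL"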

  • 💩 Bugfix
  • 🍕 New feature
  • 🔥 Performance improvement
  • 🔨 Refactoring/simplification

Checklist

  • Tests
    • Regression tests
    • C++ Unit tests
    • integration tests
    • resilience tests
  • 📖 CHANGELOG entry made
  • 📚 documentation written (release notes, API changes, ...)
  • Backports
    • Backport for 3.11: -
    • Backport for 3.10: -
    • Backport for 3.9: -

Related Information

@cla-bot cla-bot bot added the cla-signed label Jun 26, 2023
@cpjulia cpjulia added this to the devel milestone Jun 26, 2023
@cla-bot (bot) commented Aug 28, 2023

Fresh avocado detected! Welcome and thank you for your contribution @cpjulia. My avocado-loving overlords have decreed a signed CLA is required for PRs. Please see https://github.com/arangodb/arangodb/blob/devel/CONTRIBUTING.md file to learn more or ask @cw00dw0rd if you have issues.

@neunhoef (Member) left a comment

LGTM so far. See one more comment.
I will now do another experiment and compare the new metrics with the memory profiling.

@neunhoef (Member) commented Sep 4, 2023

I have done a quick check using the following tortureTrx.js script:

var db = require("@arangodb").db;   // db is implicit in arangosh; spelled out here
var internal = require("internal");

var nrTrx = 100;
var c = db._create("c");
// build a batch of 1000 documents with ~100 bytes of random payload each
var l = [];
for (let i = 0; i < 1000; ++i) {
  l.push({Hallo: i, s: internal.genRandomAlphaNumbers(100)});
}
// open nrTrx streaming transactions that all declare write access to "c"
var t = [];
var cc = [];
for (let i = 0; i < nrTrx; ++i) {
  t.push(db._createTransaction({collections: {write: ["c"]}}));
  cc.push(t[i].collection(c));
  print(i, "generated", new Date());
}
// insert 15 batches into every open transaction; the data stays in the
// transactions' in-memory write batches until commit
for (let j = 0; j < 15; ++j) {
  for (let i = 0; i < nrTrx; ++i) {
    cc[i].insert(l);
    print("j=", j, "i=", i, new Date());
  }
}
print("Preparing to commit...10s...");
internal.wait(10);
print("Committing...");
// committing releases the write batches and locked keys
for (let i = 0; i < nrTrx; ++i) {
  t[i].commit();
  print(i, new Date());
}

This opens a large number of transactions and keeps them busy for a while. I then checked the output of the new metrics and compared it with a heap difference profile taken before and after. So far, there still seems to be a discrepancy. We will investigate this tomorrow.
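
For reference, one way to eyeball the new metrics from arangosh while the script above is holding its transactions open (a sketch assuming a build that contains this PR; arango.GET_RAW returns the raw HTTP response of the metrics endpoint):

var res = arango.GET_RAW("/_admin/metrics");
res.body.split("\n")
   .filter(function (line) { return line.indexOf("arangodb_transactions_") === 0; })
   .forEach(function (line) { print(line); });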

@neunhoef (Member) left a comment

LGTM.
I ran examples with large and with many concurrent transactions and found the following:

When I compared the actual memory usage (heap profiling) to the usage reported by the new transaction memory metric, there were a few discrepancies:

  1. The exact memory accounting is still a bit off (not surprising, because we rely on estimates of the exact behaviour of the RocksDB-internal data structures). In particular, the memory used by the PointLockManager is still underestimated.

  2. Furthermore, I found that the PointLockManager does not return all of its memory after a huge number of points were locked (for some large transactions). This is because an absl::flat_hash_map is used, which does not automatically shrink, even when all locks are released. We should fix this behaviour, since it can lead to a large overuse of RAM.

However, I suggest that we fine-tune this in a separate PR to get this one out of the way.
The rest is only fine-tuning.

@MBkkt (Contributor) commented Sep 5, 2023

@neunhoef

This is because an absl::flat_hash_map is used, which does not automatically shrink

std::unordered_map does not shrink automatically either.
If we need the memory for nodes to be released automatically (as with std::unordered_map), we could use absl::node_hash_map.

* Added startup option `--transaction.streaming-max-transaction-size` to make
  the maximum size of streaming transactions configurable.
  Previously the maximum size of streaming transactions was hard-coded to
  128 MiB. This value is also used as default value for the option now.

@neunhoef (Member) commented Sep 5, 2023

Latest changes: LGTM.

@neunhoef neunhoef merged commit a0c9ade into devel Sep 5, 2023
@neunhoef neunhoef deleted the feature/fuse-664-666 branch September 5, 2023 13:52
CryptoNinjaGeek added a commit that referenced this pull request Sep 12, 2023
* Added transaction memory tracking with rocksdb memory tracking

* Intermediate commit (not compiling) for transaction classification in all places

* Added distinction to transaction types everywhere

* Removed git markers

* Removed DECLARE_GAUGE unused

* Updated transaction classification for arangod

* Intermediate commit for tests and documentation

* Added extension to files

* Intermediate commit for separating transaction type hints and classification in tests

* Intermediate commit for transaction classification in tests

* Intermediate commit for transaction classification in tests

* Addressed suggestions and intermediate commit for tests

* wire memory tracking to AQL queries

* write CHANGELOG entry

* Added INTERNAL classification for transactions in tests

* make arangodbtests compile

* refactor

* make PR surface smaller

* reduce PR surface

* fix metrics documentation

* update branch and track memory usage

* added metrics descriptions

* slightly simplify test

* fix compilation errors

* fix transaction memory usage stats

* refactor code

* implement buffering for AQL transaction metrics updates

* simplify PR

* clean up metrics

* add memory accounting for copied write batches

* larger refactoring

* small refactoring

* bigger refactoring

* remove unknowns

* remove unknown origin

* expose memory usage for testing

* refactoring and test

* fix compilation of tests

* lint

* refactoring

* fix test

* refactoring

* fix tests

* remove LOG_DEVELs

* updated CHANGELOG

* added code comments

* add some tests

* add test

* added tests for memory usage tracking

* track memory usage during index creation

* fix linting

* track indexing memory usage for unique indexes

* fix compile error

* Added startup option `--transaction.streaming-max-transaction-size`

* Added startup option `--transaction.streaming-max-transaction-size` to make
  the maximum size of streaming transactions configurable.
  Previously the maximum size of streaming transactions was hard-coded to
  128 MiB. This value is also used as default value for the option now.

* fix CHANGELOG discrepancy

---------

Co-authored-by: jsteemann <jsteemann@users.noreply.github.com>
Co-authored-by: CryptoNinjaGeek <86222482+CryptoNinjaGeek@users.noreply.github.com>